Abstract
Machine learning models for water quality prediction are often limited by insufficient data and uneven spatial and temporal coverage. To address these issues, we introduce a framework that combines machine learning, numerical modeling, and remote sensing imagery to predict coastal water turbidity, a key proxy for water quality. We tested the approach in the Great Lakes region, specifically in Cleveland Harbor, Lake Erie. Models were trained on observed data and on synthetic data generated by 3D numerical models, then tested against in situ measurements and remote sensing data from PlanetLabs’ Dove satellites. High-resolution (HR) data improved prediction accuracy, yielding RMSE values of 0.154 and 0.146 log10(FNU) and \(R^2\) values of 0.92 and 0.93 for the validation and test datasets, respectively. Our study also highlights the importance of unified turbidity measures for making datasets comparable. Through transfer learning, the machine learning model showed skill in predicting turbidity, indicating that it may be applicable in diverse, data-scarce regions. This approach can enhance decision support systems for coastal environments by providing accurate, timely predictions of water quality variables. More broadly, the methodology offers robust strategies for turbidity and water quality monitoring, with significant potential to improve the quality of input data for numerical models and to support the development of predictive models from remote sensing data.
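The accuracy figures above are expressed in log10(FNU), i.e. errors are measured on log-transformed turbidity. The snippet below is a minimal sketch of how such metrics could be computed. It assumes the conventional definitions: RMSE of the log10-transformed values, and \(R^2\) as the coefficient of determination, 1 - SS_res/SS_tot. The study may compute them differently, for example as a squared correlation. The function name `log10_metrics` and the toy inputs are hypothetical and are not taken from the study.

```python
import numpy as np


def log10_metrics(observed_fnu, predicted_fnu):
    """Return (RMSE, R^2) computed on log10-transformed turbidity (FNU).

    Hypothetical helper for illustration. R^2 is the coefficient of
    determination 1 - SS_res / SS_tot on the log10 scale.
    """
    obs_raw = np.asarray(observed_fnu, dtype=float)
    pred_raw = np.asarray(predicted_fnu, dtype=float)
    if obs_raw.shape != pred_raw.shape:
        raise ValueError("observed and predicted must have the same shape")
    if obs_raw.size < 2:
        raise ValueError("need at least two samples")
    # log10 is undefined for zero or negative turbidity readings
    if np.any(obs_raw <= 0) or np.any(pred_raw <= 0):
        raise ValueError("turbidity values must be strictly positive")

    obs = np.log10(obs_raw)
    pred = np.log10(pred_raw)
    residuals = obs - pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))  # in log10(FNU) units
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((obs - obs.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


if __name__ == "__main__":
    # Synthetic toy values, not data from the study
    print(log10_metrics([1.0, 10.0, 100.0], [10.0, 10.0, 10.0]))  # approx (0.8165, 0.0)
```

Because the error is measured in log space, exponentiating the RMSE gives a rough multiplicative error factor: 10^0.154 ≈ 1.43 and 10^0.146 ≈ 1.40, so typical predictions fall within roughly a factor of 1.4 of the observed turbidity.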