Applying big data beyond small problems in climate research
The commercial success of big data has led to speculation that big-data-like reasoning could partly replace theory-based approaches in science. Big data has typically been applied to ‘small problems’: well-structured cases characterized by the repeated evaluation of predictions. Here, we show that climate research contains intermediate categories between classical domain science and big data, and that big-data elements have also been applied in cases where repeated evaluation is not possible. Big-data elements can be useful for climate research beyond small problems if they are combined with more traditional approaches based on domain-specific knowledge. The biggest potential for big-data elements, we argue, lies in socioeconomic climate research.
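To make the notion of ‘repeated evaluation of predictions’ concrete, the following minimal Python sketch scores forecasts in a walk-forward loop. Each prediction is checked against the outcome as soon as that outcome appears in the record. This is an illustration only and not a method from the article: the function name `walk_forward_errors`, the moving-average ‘model’, the `horizon` parameter and the toy series are all hypothetical. Its point is the contrast. A short lead time yields many scored predictions. A lead time longer than the observational record, as with centennial climate projections, yields none.

```python
from statistics import mean


def walk_forward_errors(series, window=3, horizon=1):
    """Repeatedly score forecasts against observed outcomes (hypothetical sketch).

    The 'model' is simply the mean of the last `window` observations.
    `horizon` is the lead time: the forecast issued at step t targets
    the value at step t + horizon - 1. Only forecasts whose target lies
    inside the record can be evaluated.
    """
    errors = []
    for t in range(window, len(series) - horizon + 1):
        forecast = mean(series[t - window:t])    # predict from past data only
        observed = series[t + horizon - 1]       # outcome available in the record
        errors.append(abs(forecast - observed))  # one evaluation per prediction
    return errors


# Synthetic values, made up for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

short_lead = walk_forward_errors(obs)               # 'small problem': many checks
long_lead = walk_forward_errors(obs, horizon=100)   # target lies beyond the record

print(len(short_lead), mean(short_lead))  # → 3 2.0
print(len(long_lead))                     # → 0
```

In the short-lead case every forecast can be compared with what actually happened, so the approach can be tuned and trusted on its track record. In the long-lead case the loop never runs, and confidence has to come from elsewhere, for example from domain-specific knowledge.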
Acknowledgements
We thank C. Beisbart, A. Merrifield, S. Sippel, R. McMahon and J. Lilliestam for discussions and comments that have improved the quality of this manuscript. The research was supported by the Swiss National Science Foundation, National Research Programme 75 Big Data, project no. 167215.
Author information
Authors and Affiliations
- Institute for Environmental Decisions, ETH Zurich, Switzerland Benedikt Knüsel, Marius Zumwald, Christoph Baumberger, Gertrude Hirsch Hadorn & David N. Bresch
- Institute for Atmospheric and Climate Science, ETH Zurich, Switzerland Benedikt Knüsel, Marius Zumwald, Erich M. Fischer & Reto Knutti
- Federal Office of Meteorology and Climatology MeteoSwiss, Zurich, Switzerland David N. Bresch