Applying big data beyond small problems in climate research

The commercial success of big data has led to speculation that big-data-like reasoning could partly replace theory-based approaches in science. Big data has typically been applied to ‘small problems’: well-structured cases in which predictions can be evaluated repeatedly. Here, we show that in climate research, intermediate categories exist between classical domain science and big data, and that big-data elements have also been applied where repeated evaluation is not possible. Big-data elements can be useful for climate research beyond small problems if they are combined with more traditional approaches based on domain-specific knowledge. The biggest potential for big-data elements, we argue, lies in socioeconomic climate research.

Acknowledgements

We thank C. Beisbart, A. Merrifield, S. Sippel, R. McMahon and J. Lilliestam for discussions and comments that have improved the quality of this manuscript. The research was supported by the Swiss National Science Foundation, National Research Programme 75 Big Data, project no. 167215.

Author information

Authors and Affiliations

  1. Institute for Environmental Decisions, ETH Zurich, Switzerland: Benedikt Knüsel, Marius Zumwald, Christoph Baumberger, Gertrude Hirsch Hadorn & David N. Bresch
  2. Institute for Atmospheric and Climate Science, ETH Zurich, Switzerland: Benedikt Knüsel, Marius Zumwald, Erich M. Fischer & Reto Knutti
  3. Federal Office of Meteorology and Climatology MeteoSwiss, Zurich, Switzerland: David N. Bresch