Comparison and evaluation of different clustering algorithms and methods to define weather event profiles and their characteristics

Authors

  • Julian Erath, DHBW Stuttgart

Keywords:

Cluster analysis, Cluster models, Weather data, Meteorology, Clustering of weather data, Weather event profiles

Abstract

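This thesis applies cluster analysis with four algorithms, k-means, hierarchical agglomerative clustering (HAC), Gaussian mixture models (GMM), and DBSCAN, to weather data from Ontario, Canada, following the Design Science Research (DSR) methodology. The goal is to identify weather event profiles and their characteristics. Such profiles could be used in weather forecasts, in dashboards, and for anomaly detection. IBM provides seven years of historical weather data, which also offers potential for future research.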
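As a rough illustration of the kind of comparison described in the abstract, the following sketch runs the four named algorithms on the same feature matrix and reports the number of clusters and a silhouette score for each. It is a minimal sketch under stated assumptions, not the thesis's pipeline: the data are synthetic blobs standing in for weather features, and the choice of scikit-learn, four clusters, the DBSCAN parameters, and the silhouette score as the comparison metric are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for weather features (e.g. temperature,
# wind speed, precipitation). This is NOT the Ontario data set.
X, _ = make_blobs(n_samples=600, centers=4, n_features=3,
                  cluster_std=0.8, random_state=0)

# Scale features so that no single variable dominates the distances.
X = StandardScaler().fit_transform(X)

# ASSUMPTION: k=4 and the DBSCAN parameters are illustrative values only.
models = {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "HAC": AgglomerativeClustering(n_clusters=4, linkage="ward"),
    "GMM": GaussianMixture(n_components=4, random_state=0),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # DBSCAN marks noise points with -1; exclude them from the score.
    mask = labels != -1
    n_clusters = len(set(labels[mask]))
    # The silhouette score is only defined for at least two clusters.
    if n_clusters > 1:
        score = silhouette_score(X[mask], labels[mask])
    else:
        score = float("nan")
    results[name] = (n_clusters, score)
    print(f"{name:7s} clusters={n_clusters} silhouette={score:.3f}")
```

On real weather data, the number of clusters and the DBSCAN neighbourhood radius would have to be chosen from the data, and a single internal metric would rarely be enough to rank the algorithms.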

Published

2023-12-27

Section

Theses