Comparison and evaluation of different clustering algorithms and methods to define weather event profiles and their characteristics
DOI:
https://doi.org/10.26034/lu.akwi.2023.4557
Keywords:
Cluster analysis, Cluster models, Weather data, Meteorology, Clustering of weather data, Weather event profiles
Abstract
This work applies cluster analysis with the KMeans, HAC, GMM and DBSCAN algorithms to weather data from Ontario, Canada, following a Design Science Research (DSR) approach. The goal is to identify weather event profiles. These profiles could be used in weather forecasts, in dashboards and for anomaly detection. IBM provides seven years of historical weather data, which offers potential for future research.
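As a rough illustration of the kind of comparison described in the abstract, the following minimal sketch runs the four named algorithms on synthetic data with scikit-learn and compares them by silhouette score. It is not the author's pipeline: the data are generated blobs standing in for scaled weather features, and the cluster count, `eps`, `min_samples` and the helper names `cluster_all` and `silhouette_or_nan` are assumptions chosen only for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for weather features (assumption: NOT the Ontario data).
X, _ = make_blobs(n_samples=600, centers=4, n_features=3,
                  cluster_std=0.8, random_state=0)
# Standardize so that no single feature dominates the distance computations.
X = StandardScaler().fit_transform(X)


def cluster_all(X, n_clusters=4, eps=0.5, min_samples=10):
    """Fit the four algorithms and return a dict of label arrays.

    All parameter values are illustrative assumptions, not tuned settings.
    """
    return {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=0).fit_predict(X),
        "HAC": AgglomerativeClustering(n_clusters=n_clusters,
                                       linkage="ward").fit_predict(X),
        "GMM": GaussianMixture(n_components=n_clusters,
                               random_state=0).fit_predict(X),
        # DBSCAN infers the number of clusters itself; label -1 marks noise.
        "DBSCAN": DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X),
    }


def silhouette_or_nan(X, labels):
    """Silhouette score over non-noise points, or NaN if undefined."""
    mask = labels != -1
    if len(set(labels[mask])) < 2:
        return float("nan")
    return float(silhouette_score(X[mask], labels[mask]))


results = cluster_all(X)
scores = {name: silhouette_or_nan(X, labels) for name, labels in results.items()}

for name, value in scores.items():
    n_found = len(set(results[name]) - {-1})
    print(f"{name}: {n_found} clusters, silhouette = {value:.3f}")
```

The silhouette score is only one possible evaluation criterion. DBSCAN noise points are excluded before scoring here, so its value is not directly comparable with the other three when many points are labelled as noise.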
Copyright (c) 2023 Julian Erath
This work is licensed under a Creative Commons Attribution 4.0 International License.