Application of Artificial Intelligence to improve Customer Understanding: Transformer-based Topic Modeling in practice
For the past few years, pre-trained Language Models (PLMs) have been considered state of the art in the field of Natural Language Processing (NLP) and are therefore being applied widely and successfully. In addition to traditional supervised Machine Learning (ML) tasks such as spam email or customer churn classification, this technology opens up advanced approaches to unsupervised learning and to data analytics in general. One approach of particular interest is the automatic identification of latent topics within a large collection of texts, known as Topic Modeling (TM). Such modeling approaches offer great potential, especially in industrial environments and the consumer goods market, for exploring growing volumes of data from diverse and constantly expanding data sources. As a holistic concept, TM can be applied in a targeted and efficient manner through an optimized combination of Artificial Intelligence (AI) and Cloud Computing (CC) systems.
Copyright (c) 2022 Prof. Dr. Frank Morelli, Nils Blessing
This work is licensed under a Creative Commons Attribution 4.0 International License.