Enhancing IVR Systems in Mobile Banking with Emotion Analysis for Adaptive Dialogue Flows and Seamless Transition to Human Assistance

Alper Ozpinar

Istanbul Commerce University

https://orcid.org/0000-0003-1250-5949

Ersin Alpan

Softtech

https://orcid.org/0009-0007-9798-1253

Taner Celik

Softtech

https://orcid.org/0009-0005-1978-0822

DOI: https://doi.org/10.56038/oprd.v3i1.382

Keywords: Emotion Analysis, CNN, NLP, Adaptive Dialogs, IVR Systems


Abstract

This study introduces an approach to improving Interactive Voice Response (IVR) systems for mobile banking by integrating emotion analysis with a fusion of specialized datasets. Drawing on the RAVDESS, CREMA-D, TESS, and SAVEE datasets, the research uses a diverse array of emotional speech and song samples to analyze customer sentiment in call center interactions. Together, these datasets provide a multi-modal emotional context that enriches the IVR experience.
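To make the fusion step concrete, the following Python sketch normalizes the four corpora's differing emotion-label spellings into one canonical vocabulary. The canonical set and the fold of RAVDESS's 'calm' class into 'neutral' are illustrative assumptions, not the paper's documented mapping.

# Hedged sketch: unify per-corpus emotion labels into one canonical set.
CANONICAL = ["neutral", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

LABEL_MAP = {
    "calm": "neutral",    # RAVDESS extra class (assumed fold into neutral)
    "fear": "fearful",    # TESS/SAVEE spelling
    "ps": "surprised",    # TESS 'pleasant surprise'
    "ANG": "angry", "DIS": "disgust", "FEA": "fearful",  # CREMA-D codes
    "HAP": "happy", "NEU": "neutral", "SAD": "sad",
}

def canonical_label(raw: str) -> str:
    """Normalize a dataset-specific label to the canonical vocabulary."""
    label = LABEL_MAP.get(raw.strip(), raw.strip().lower())
    if label not in CANONICAL:
        raise ValueError(f"unmapped emotion label: {raw!r}")
    return label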

The cornerstone of our methodology is Mel-Frequency Cepstral Coefficient (MFCC) extraction. The MFCCs extracted from each audio input form a 2D array in which the time and cepstral-coefficient axes create a structure that closely resembles an image. This format is particularly suitable for Convolutional Neural Networks (CNNs), which excel at interpreting such 'image-like' data for emotion recognition, enhancing the system's responsiveness to emotional cues.
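A minimal sketch of this feature pipeline and classifier in Python, assuming librosa for MFCC extraction and Keras for the CNN; the sample rate, the 40-coefficient count, the fixed 174-frame width, and the network depth are illustrative choices rather than the paper's exact configuration.

import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

N_MFCC = 40        # cepstral coefficients per frame (illustrative)
MAX_FRAMES = 174   # fixed time axis so every clip yields the same shape

def extract_mfcc(path: str) -> np.ndarray:
    """Return an (N_MFCC, MAX_FRAMES) 'image-like' MFCC array."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate the time axis to the fixed width.
    if mfcc.shape[1] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :MAX_FRAMES]
    return mfcc

# A small 2D CNN treating the MFCC array as a one-channel image;
# 8 outputs matches RAVDESS's eight emotion classes.
model = keras.Sequential([
    layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(8, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])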

The proposed system's architecture is designed to modify dialogue flows dynamically, informed by the emotional tone of customer interactions. This not only improves customer engagement but also ensures a seamless handover to human operators when the situation calls for a personal touch, optimizing the balance between automated efficiency and human empathy.
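As an illustration of how such emotion-informed routing might work per dialogue turn, the Python sketch below applies a simple escalation rule; the negative-emotion set, the 0.6 confidence threshold, the two-turn streak, and the action names are hypothetical choices for clarity, not the deployed system's parameters.

from dataclasses import dataclass

NEGATIVE = {"angry", "fearful", "disgust", "sad"}

@dataclass
class TurnState:
    negative_streak: int = 0  # consecutive confident negative turns

def route(state: TurnState, emotion: str, confidence: float) -> str:
    """Return the next dialogue action for one IVR turn."""
    if emotion in NEGATIVE and confidence >= 0.6:
        state.negative_streak += 1
    else:
        state.negative_streak = 0
    if state.negative_streak >= 2:
        return "handover_to_agent"     # sustained frustration -> human operator
    if emotion in NEGATIVE:
        return "empathetic_reprompt"   # soften wording, simplify the menu
    return "continue_flow"             # standard self-service path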

The results of this research demonstrate the potential of emotion-aware IVR systems to anticipate and meet customer needs more effectively, paving the way for a new standard in user-centric banking services.


References

R. Alt, R. Beck, and M. T. Smits, “FinTech and the transformation of the financial industry,” Electronic Markets, vol. 28, pp. 235–243, 2018. DOI: https://doi.org/10.1007/s12525-018-0310-9

P. Manatsa, “An analysis of the impact of implementing a new interactive voice response system (IVR) on client experience in the Canadian Banking Industry,” 2019.

R. A. Feinberg, L. Hokama, R. Kadam, and I. Kim, “Operational determinants of caller satisfaction in the banking/financial services call center,” International Journal of Bank Marketing, vol. 20, no. 4, pp. 174–180, 2002. DOI: https://doi.org/10.1108/02652320210432954

S. M. Yacoub, S. J. Simske, X. Lin, and J. Burns, “Recognition of emotions in interactive voice response systems,” in Interspeech, 2003. DOI: https://doi.org/10.21437/Eurospeech.2003-307

L. E. Rocha, D. M. R. Glina, M. de Fátima Marinho, and D. Nakasato, “Risk factors for musculoskeletal symptoms among call center operators of a bank in São Paulo, Brazil,” Industrial Health, vol. 43, no. 4, pp. 637–646, 2005. DOI: https://doi.org/10.2486/indhealth.43.637

L. Muda, M. Begam, and I. Elamvazuthi, “Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques,” arXiv preprint arXiv:1003.4083, 2010.

B. Logan, “Mel frequency cepstral coefficients for music modeling,” in ISMIR, Plymouth, MA, 2000, p. 11.

S. A. Khayam, “The discrete cosine transform (DCT): theory and application,” Michigan State University, vol. 114, no. 1, p. 31, 2003.

M. A. Hossan, S. Memon, and M. A. Gregory, “A novel approach for MFCC feature extraction,” in 2010 4th International Conference on Signal Processing and Communication Systems, IEEE, 2010, pp. 1–5. DOI: https://doi.org/10.1109/ICSPCS.2010.5709752

F. Jiang, H. Li, Z. Zhang, and X. Zhang, “An event recognition method for fiber distributed acoustic sensing systems based on the combination of MFCC and CNN,” in 2017 International Conference on Optical Instruments and Technology: Advanced Optical Sensors and Applications, SPIE, 2018, pp. 15–21.

K. Mridha, S. Sarkar, and D. Kumar, “Respiratory disease classification by CNN using MFCC,” in 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA), IEEE, 2021, pp. 517–523. DOI: https://doi.org/10.1109/ICCCA52192.2021.9666346

S. Jin, X. Wang, L. Du, and D. He, “Evaluation and modeling of automotive transmission whine noise quality based on MFCC and CNN,” Applied Acoustics, vol. 172, p. 107562, 2021. DOI: https://doi.org/10.1016/j.apacoust.2020.107562

A. Chowdhury and A. Ross, “Fusing MFCC and LPC features using 1D triplet CNN for speaker recognition in severely degraded audio signals,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1616–1629, 2019. DOI: https://doi.org/10.1109/TIFS.2019.2941773

G. Petmezas et al., “Automated lung sound classification using a hybrid CNN-LSTM network and focal loss function,” Sensors, vol. 22, no. 3, p. 1232, 2022. DOI: https://doi.org/10.3390/s22031232

H.-A. Rashid, A. N. Mazumder, U. P. K. Niyogi, and T. Mohsenin, “CoughNet: A flexible low power CNN-LSTM processor for cough sound detection,” in 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), IEEE, 2021, pp. 1–4. DOI: https://doi.org/10.1109/AICAS51828.2021.9458509

E. Åberg and Y. Khati, “Artificial Intelligence in Customer Service: A Study on Customers’ Perceptions regarding IVR Services in the Banking Industry,” 2018.

“General Data Protection Regulation (GDPR),” Intersoft Consulting, accessed October 2018.

E. Arfelt, D. Basin, and S. Debois, “Monitoring the GDPR,” in Computer Security–ESORICS 2019: 24th European Symposium on Research in Computer Security, Luxembourg, September 23–27, 2019, Proceedings, Part I 24, Springer, 2019, pp. 681–699. DOI: https://doi.org/10.1007/978-3-030-29959-0_33

C. Tankard, “What the GDPR means for businesses,” Network Security, vol. 2016, no. 6, pp. 5–8, 2016. DOI: https://doi.org/10.1016/S1353-4858(16)30056-3

M. R. Ahmed, S. Islam, A. K. M. M. Islam, and S. Shatabda, “An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition,” Expert Systems with Applications, vol. 218, p. 119633, 2023. DOI: https://doi.org/10.1016/j.eswa.2023.119633

S. Ullah, Q. A. Sahib, S. Ullah, I. U. Haq, and I. Ullah, “Speech Emotion Recognition Using Deep Neural Networks,” in 2022 International Conference on IT and Industrial Technologies (ICIT), IEEE, 2022, pp. 1–6. DOI: https://doi.org/10.1109/ICIT56493.2022.9989197

M. Zielonka, A. Piastowski, A. Czyżewski, P. Nadachowski, M. Operlejn, and K. Kaczor, “Recognition of Emotions in Speech Using Convolutional Neural Networks on Different Datasets,” Electronics, vol. 11, no. 22, p. 3831, 2022. DOI: https://doi.org/10.3390/electronics11223831

H. Dolka, A. X. VM, and S. Juliet, “Speech emotion recognition using ANN on MFCC features,” in 2021 3rd International Conference on Signal Processing and Communication (ICSPC), IEEE, 2021, pp. 431–435. DOI: https://doi.org/10.1109/ICSPC51351.2021.9451810

N. Chitre, N. Bhorade, P. Topale, J. Ramteke, and C. R. Gajbhiye, “Speech Emotion Recognition to assist Autistic Children,” in 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), IEEE, 2022, pp. 983–990. DOI: https://doi.org/10.1109/ICAAIC53929.2022.9792663

S. R. Livingstone and F. A. Russo, “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English,” PLoS One, vol. 13, no. 5, p. e0196391, 2018. DOI: https://doi.org/10.1371/journal.pone.0196391

H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma, “CREMA-D: Crowd-sourced emotional multimodal actors dataset,” IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 377–390, 2014. DOI: https://doi.org/10.1109/TAFFC.2014.2336244

K. Dupuis and M. K. Pichora-Fuller, “Toronto Emotional Speech Set (TESS),” University of Toronto, 2010.

P. Jackson and S. Haq, “Surrey Audio-Visual Expressed Emotion (SAVEE) database,” University of Surrey: Guildford, UK, 2014.
