<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2518-1092</journal-id><journal-title-group><journal-title>Research result. Information technologies</journal-title></journal-title-group><issn pub-type="epub">2518-1092</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2518-1092-2026-11-1-0-4</article-id><article-id pub-id-type="publisher-id">4098</article-id><article-categories><subj-group subj-group-type="heading"><subject>ARTIFICIAL INTELLIGENCE AND DECISION MAKING</subject></subj-group></article-categories><title-group><article-title>&lt;strong&gt;ANALYSIS OF PROSODIC PARAMETERS&amp;nbsp;OF EMOTIONALLY COLORED SPEECH&lt;/strong&gt;</article-title><trans-title-group xml:lang="en"><trans-title>&lt;strong&gt;ANALYSIS OF PROSODIC PARAMETERS&amp;nbsp;OF EMOTIONALLY COLORED SPEECH&lt;/strong&gt;</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Balabanova</surname><given-names>Tatiana Nikolaevna</given-names></name><name xml:lang="en"><surname>Balabanova</surname><given-names>Tatiana Nikolaevna</given-names></name></name-alternatives><email>sozonova@bsuedu.ru</email></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Belov</surname><given-names>Alexander Sergeevich</given-names></name><name xml:lang="en"><surname>Belov</surname><given-names>Alexander Sergeevich</given-names></name></name-alternatives><email>belov_as@bsu.edu.ru</email></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Pashkov</surname><given-names>Alexander Sergeevich</given-names></name><name xml:lang="en"><surname>Pashkov</surname><given-names>Alexander Sergeevich</given-names></name></name-alternatives><email>Pogosad@yandex.ru</email></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Mamatov</surname><given-names>Evgeny Mikhailovich</given-names></name><name xml:lang="en"><surname>Mamatov</surname><given-names>Evgeny Mikhailovich</given-names></name></name-alternatives><email>mamatov@bsuedu.ru</email></contrib></contrib-group><pub-date pub-type="epub"><year>2026</year></pub-date><volume>11</volume><issue>1</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/information/2026/1/НР.ИТ_11.1_4.pdf" /><abstract xml:lang="ru"><p>This paper presents a study of prosodic parameters of emotionally colored speech in the Russian language. The aim of the study is to identify the most informative acoustic features that allow distinguishing the emotional state of a speaker. The experimental data consisted of audio recordings from the Dusha emotional speech dataset, including four emotional states: anger, joy, sadness, and neutral speech. In total, 240 audio recordings of both male and female speakers were analyzed.

The study focused on extracting and analyzing prosodic characteristics of speech signals, including pitch-related, energy, temporal, and phonation features. A combination of statistical analysis and machine learning methods was applied, including correlation analysis, feature importance estimation using the Random Forest algorithm, and Principal Component Analysis (PCA).

The experimental results demonstrate that energy and pitch-related characteristics of speech are the most informative features for emotion recognition. In particular, mean signal energy, variability of the fundamental frequency, speech rate, and mean F0 made the largest contribution to emotion classification. The analysis made it possible to identify a compact feature space and to reveal characteristic acoustic profiles for the different emotional states. The results obtained can be used in the development of automatic speech emotion recognition systems and intelligent speech-based human–computer interaction technologies.</p></abstract><trans-abstract xml:lang="en"><p>This paper presents a study of the prosodic parameters of emotionally colored speech in the Russian language. The aim of the study is to identify the acoustic features that are most informative for distinguishing the emotional state of a speaker. The experimental data consisted of audio recordings from the Dusha emotional speech dataset, covering four emotional states: anger, joy, sadness, and neutral speech. In total, 240 audio recordings of both male and female speakers were analyzed.

The study focused on extracting and analyzing prosodic characteristics of speech signals, including pitch-related, energy, temporal, and phonation features. A combination of statistical analysis and machine learning methods was applied, including correlation analysis, feature importance estimation using the Random Forest algorithm, and Principal Component Analysis (PCA).

The experimental results demonstrate that energy and pitch-related characteristics of speech are the most informative features for emotion recognition. In particular, mean signal energy, variability of the fundamental frequency, speech rate, and mean F0 made the largest contribution to emotion classification. The analysis made it possible to identify a compact feature space and to reveal characteristic acoustic profiles for the different emotional states. The results obtained can be used in the development of automatic speech emotion recognition systems and intelligent speech-based human–computer interaction technologies.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>prosodic parameters</kwd><kwd>emotional speech</kwd><kwd>speech emotion recognition</kwd><kwd>speech signal analysis</kwd><kwd>fundamental frequency</kwd><kwd>machine learning</kwd><kwd>Random Forest</kwd><kwd>principal component analysis</kwd><kwd>acoustic features</kwd></kwd-group><kwd-group xml:lang="en"><kwd>prosodic parameters</kwd><kwd>emotional speech</kwd><kwd>speech emotion recognition</kwd><kwd>speech signal analysis</kwd><kwd>fundamental frequency</kwd><kwd>machine learning</kwd><kwd>Random Forest</kwd><kwd>principal component analysis</kwd><kwd>acoustic features</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title>
<ref id="B1"><mixed-citation>1. Scherer K.R. Vocal communication of emotion: A review of research paradigms // Speech Communication. – 2003. – Vol. 40, № 1–2. – P. 227–256.</mixed-citation></ref>
<ref id="B2"><mixed-citation>2. Scherer K.R., Wallbott H.G. Evidence for universality and cultural variation of emotional expression in voice // Journal of Cross-Cultural Psychology. – 1994. – Vol. 25, № 1. – P. 92–110.</mixed-citation></ref>
<ref id="B3"><mixed-citation>3. Bänziger T., Scherer K.R. The role of intonation in emotional expressions // Speech Communication. – 2005. – Vol. 46. – P. 252–267.</mixed-citation></ref>
<ref id="B4"><mixed-citation>4. Schuller B., Batliner A. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. – Chichester: Wiley, 2014. – 324 p.</mixed-citation></ref>
<ref id="B5"><mixed-citation>5. Schröder M. Emotional speech synthesis: A review // Proceedings of the European Conference on Speech Communication and Technology. – Geneva, 2003. – P. 561–564.</mixed-citation></ref>
<ref id="B6"><mixed-citation>6. Busso C., Bulut M., Narayanan S. Toward effective automatic recognition systems of emotion in speech // IEEE Transactions on Audio, Speech, and Language Processing. – 2009. – Vol. 17, № 5. – P. 846–859.</mixed-citation></ref>
<ref id="B7"><mixed-citation>7. Narayanan S., Busso C. Analysis of emotional speech: A review // IEEE Signal Processing Magazine. – 2011. – Vol. 28, № 5. – P. 98–112.</mixed-citation></ref>
<ref id="B8"><mixed-citation>8. Ekman P. An argument for basic emotions // Cognition and Emotion. – 1992. – Vol. 6, № 3–4. – P. 169–200.</mixed-citation></ref>
<ref id="B9"><mixed-citation>9. Cowie R., Douglas-Cowie E. Emotion recognition in human-computer interaction // IEEE Signal Processing Magazine. – 2001. – Vol. 18, № 1. – P. 32–80.</mixed-citation></ref>
<ref id="B10"><mixed-citation>10. Ververidis D., Kotropoulos C. Emotional speech recognition: Resources, features and methods // Speech Communication. – 2006. – Vol. 48, № 9. – P. 1162–1181.</mixed-citation></ref>
<ref id="B11"><mixed-citation>11. Rabiner L., Juang B.-H. Fundamentals of Speech Recognition. – New Jersey: Prentice Hall, 1993. – 507 p.</mixed-citation></ref>
<ref id="B12"><mixed-citation>12. Murray I., Arnott J. Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion // Journal of the Acoustical Society of America. – 1993. – Vol. 93, № 2. – P. 1097–1108.</mixed-citation></ref>
<ref id="B13"><mixed-citation>13. Banse R., Scherer K.R. Acoustic profiles in vocal emotion expression // Journal of Personality and Social Psychology. – 1996. – Vol. 70, № 3. – P. 614–636.</mixed-citation></ref>
<ref id="B14"><mixed-citation>14. El Ayadi M., Kamel M., Karray F. Survey on speech emotion recognition: Features, classification schemes and databases // Pattern Recognition. – 2011. – Vol. 44, № 3. – P. 572–587.</mixed-citation></ref>
<ref id="B15"><mixed-citation>15. Schuller B. Speech emotion recognition: Two decades in a nutshell, benchmarks and ongoing trends // Communications of the ACM. – 2018. – Vol. 61, № 5. – P. 90–99.</mixed-citation></ref>
<ref id="B16"><mixed-citation>16. Latif S., Rana R., Qadir J., Epps J. Speech emotion recognition: State-of-the-art review // IEEE Access. – 2021. – Vol. 9. – P. 114509–114539.</mixed-citation></ref>
<ref id="B17"><mixed-citation>17. Neumann M., Vu N.T. Improving speech emotion recognition with unsupervised representation learning on unlabeled speech // IEEE/ACM Transactions on Audio, Speech, and Language Processing. – 2021. – Vol. 29. – P. 2388–2399.</mixed-citation></ref>
<ref id="B18"><mixed-citation>18. Pepino L., Riera P., Ferrer L. Emotion recognition from speech using wav2vec 2.0 embeddings // Proceedings of the Interspeech Conference. – 2021. – P. 3400–3404.</mixed-citation></ref>
<ref id="B19"><mixed-citation>19. Wagner J., Triantafyllopoulos A., Schuller B. Deep learning in paralinguistics: Recent trends and perspectives // IEEE Signal Processing Magazine. – 2023. – Vol. 40, № 3. – P. 104–118.</mixed-citation></ref>
<ref id="B20"><mixed-citation>20. Zhang Z., Deng J., Schuller B. Advances in speech emotion recognition: A survey // IEEE Transactions on Affective Computing. – 2024. – Vol. 15, № 1. – P. 123–139.</mixed-citation></ref>
</ref-list></back></article>