<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2518-1092</journal-id><journal-title-group><journal-title>Научный результат. Информационные технологии</journal-title></journal-title-group><issn pub-type="epub">2518-1092</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2518-1092-2026-11-1-0-4</article-id><article-id pub-id-type="publisher-id">4098</article-id><article-categories><subj-group subj-group-type="heading"><subject>ИСКУССТВЕННЫЙ ИНТЕЛЛЕКТ И ПРИНЯТИЕ РЕШЕНИЙ</subject></subj-group></article-categories><title-group><article-title>АНАЛИЗ ПРОСОДИЧЕСКИХ ПАРАМЕТРОВ ЭМОЦИОНАЛЬНО ОКРАШЕННОЙ РЕЧИ</article-title><trans-title-group xml:lang="en"><trans-title>ANALYSIS OF PROSODIC PARAMETERS OF EMOTIONALLY COLORED SPEECH</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Балабанова</surname><given-names>Татьяна Николаевна</given-names></name><name xml:lang="en"><surname>Balabanova</surname><given-names>Tatiana Nikolaevna</given-names></name></name-alternatives><email>sozonova@bsuedu.ru</email></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Белов</surname><given-names>Александр Сергеевич</given-names></name><name xml:lang="en"><surname>Belov</surname><given-names>Alexander Sergeevich</given-names></name></name-alternatives><email>belov_as@bsu.edu.ru</email></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Пашков</surname><given-names>Александр Сергеевич</given-names></name><name xml:lang="en"><surname>Pashkov</surname><given-names>Alexander Sergeevich</given-names></name></name-alternatives><email>Pogosad@yandex.ru</email></contrib><contrib
contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Маматов</surname><given-names>Евгений Михайлович</given-names></name><name xml:lang="en"><surname>Mamatov</surname><given-names>Evgeny Mikhailovich</given-names></name></name-alternatives><email>mamatov@bsuedu.ru</email></contrib></contrib-group><pub-date pub-type="epub"><year>2026</year></pub-date><volume>11</volume><issue>1</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/information/2026/1/НР.ИТ_11.1_4.pdf" /><abstract xml:lang="ru"><p>В работе представлено исследование просодических параметров эмоционально окрашенной речи на русском языке. Целью исследования является выявление наиболее информативных акустических признаков, позволяющих различать эмоциональные состояния говорящего. В качестве экспериментальных данных использовались аудиозаписи из корпуса эмоциональной речи Dusha, включающие четыре эмоциональных состояния: злость, радость, грусть и нейтральную речь. Всего было проанализировано 240 аудиофайлов, содержащих записи мужской и женской речи.

В работе были извлечены и исследованы просодические характеристики речевого сигнала, включающие параметры высоты основного тона, энергетические, темпоральные и фонационные признаки. Для анализа данных применялся комплекс статистических методов и методов машинного обучения, включающий корреляционный анализ, оценку важности признаков с использованием алгоритма Random Forest, а также анализ главных компонент (Principal Component Analysis, PCA).

Результаты эксперимента показали, что наибольшую информативность для распознавания эмоций в речи имеют энергетические и интонационные характеристики сигнала, в частности средняя энергия речи, вариативность частоты основного тона, темп речи и среднее значение F0. Проведённый анализ позволил выделить компактное пространство признаков и выявить характерные акустические профили для различных эмоциональных состояний. Полученные результаты могут быть использованы при разработке систем автоматического распознавания эмоций в речевых сигналах и интеллектуальных речевых интерфейсов.</p></abstract><trans-abstract xml:lang="en"><p>This paper presents a study of prosodic parameters of emotionally colored speech in the Russian language. The aim of the study is to identify the most informative acoustic features that allow distinguishing the emotional state of a speaker. The experimental data consisted of audio recordings from the Dusha emotional speech dataset, including four emotional states: anger, joy, sadness, and neutral speech. In total, 240 audio recordings of both male and female speakers were analyzed.

The study focused on extracting and analyzing prosodic characteristics of speech signals, including pitch-related, energy, temporal, and phonation features. A combination of statistical analysis and machine learning methods was applied, including correlation analysis, feature importance estimation using the Random Forest algorithm, and Principal Component Analysis (PCA).

The experimental results demonstrate that energy and pitch-related characteristics of speech are the most informative features for emotion recognition. In particular, mean signal energy, variability of the fundamental frequency, speech rate, and mean F0 showed the highest contribution to emotion classification. The analysis made it possible to identify a compact feature space and to reveal characteristic acoustic profiles for different emotional states. The obtained results can be used in the development of automatic speech emotion recognition systems and intelligent speech-based human–computer interaction technologies.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>просодические параметры речи</kwd><kwd>эмоциональная речь</kwd><kwd>распознавание эмоций</kwd><kwd>анализ речевых сигналов</kwd><kwd>частота основного тона</kwd><kwd>машинное обучение</kwd><kwd>Random Forest</kwd><kwd>анализ главных компонент</kwd><kwd>акустические признаки</kwd></kwd-group><kwd-group xml:lang="en"><kwd>prosodic parameters</kwd><kwd>emotional speech</kwd><kwd>speech emotion recognition</kwd><kwd>speech signal analysis</kwd><kwd>fundamental frequency</kwd><kwd>machine learning</kwd><kwd>Random Forest</kwd><kwd>principal component analysis</kwd><kwd>acoustic features</kwd></kwd-group></article-meta></front><back><ref-list><title>Список литературы</title><ref id="B1"><mixed-citation>1. Scherer K.R. Vocal communication of emotion: A review of research paradigms // Speech Communication. – 2003. – Vol. 40, № 1–2. – P. 227-256.</mixed-citation></ref><ref id="B2"><mixed-citation>2. Scherer K.R., Wallbott H.G. Evidence for universality and cultural variation of emotional expression in voice // Journal of Cross-Cultural Psychology. – 1994. – Vol. 25, № 1. – P. 
92-110.</mixed-citation></ref><ref id="B3"><mixed-citation>3. Bänziger T., Scherer K.R. The role of intonation in emotional expressions // Speech Communication. – 2005. – Vol. 46. – P. 252-267.</mixed-citation></ref><ref id="B4"><mixed-citation>4. Schuller B., Batliner A. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing. – Chichester: Wiley, 2014. – 324 p.</mixed-citation></ref><ref id="B5"><mixed-citation>5. Schröder M. Emotional speech synthesis: A review // Proceedings of the European Conference on Speech Communication and Technology. – Geneva, 2003. – P. 561-564.</mixed-citation></ref><ref id="B6"><mixed-citation>6. Busso C., Bulut M., Narayanan S. Toward effective automatic recognition systems of emotion in speech // IEEE Transactions on Audio, Speech, and Language Processing. – 2009. – Vol. 17, № 5. – P. 846-859.</mixed-citation></ref><ref id="B7"><mixed-citation>7. Narayanan S., Busso C. Analysis of emotional speech: A review // IEEE Signal Processing Magazine. – 2011. – Vol. 28, № 5. – P. 98-112.</mixed-citation></ref><ref id="B8"><mixed-citation>8. Ekman P. An argument for basic emotions // Cognition and Emotion. – 1992. – Vol. 6, № 3–4. – P. 169-200.</mixed-citation></ref><ref id="B10"><mixed-citation>9. Cowie R., Douglas-Cowie E. Emotion recognition in human-computer interaction // IEEE Signal Processing Magazine. – 2001. – Vol. 18, № 1. – P. 
32-80.</mixed-citation></ref><ref id="B11"><mixed-citation>10. Ververidis D., Kotropoulos C. Emotional speech recognition: Resources, features and methods // Speech Communication. – 2006. – Vol. 48, № 9. – P. 1162-1181.</mixed-citation></ref><ref id="B12"><mixed-citation>11. Rabiner L., Juang B.-H. Fundamentals of Speech Recognition. – New Jersey: Prentice Hall, 1993. – 507 p.</mixed-citation></ref><ref id="B13"><mixed-citation>12. Murray I., Arnott J. Toward the simulation of emotion in synthetic speech: A review of the literature on human vocal emotion // Journal of the Acoustical Society of America. – 1993. – Vol. 93, № 2. – P. 1097-1108.</mixed-citation></ref><ref id="B14"><mixed-citation>13. Banse R., Scherer K.R. Acoustic profiles in vocal emotion expression // Journal of Personality and Social Psychology. – 1996. – Vol. 70, № 3. – P. 614-636.</mixed-citation></ref><ref id="B15"><mixed-citation>14. El Ayadi M., Kamel M., Karray F. Survey on speech emotion recognition: Features, classification schemes and databases // Pattern Recognition. – 2011. – Vol. 44, № 3. – P. 572-587.</mixed-citation></ref><ref id="B16"><mixed-citation>15. Schuller B. Speech emotion recognition: Two decades in a nutshell, benchmarks and ongoing trends // Communications of the ACM. – 2018. – Vol. 61, № 5. – P. 90-99.</mixed-citation></ref><ref id="B17"><mixed-citation>16. Latif S., Rana R., Qadir J., Epps J. Speech emotion recognition: State-of-the-art review // IEEE Access. – 2021. – Vol. 9. – P. 114509-114539.</mixed-citation></ref><ref id="B18"><mixed-citation>17. Neumann M., Vu N.T. 
Improving speech emotion recognition with unsupervised representation learning on unlabeled speech // IEEE/ACM Transactions on Audio, Speech, and Language Processing. – 2021. – Vol. 29. – P. 2388-2399.</mixed-citation></ref><ref id="B19"><mixed-citation>18. Pepino L., Riera P., Ferrer L. Emotion recognition from speech using wav2vec 2.0 embeddings // Proceedings of the Interspeech Conference. – 2021. – P. 3400-3404.</mixed-citation></ref><ref id="B20"><mixed-citation>19. Wagner J., Triantafyllopoulos A., Schuller B. Deep learning in paralinguistics: Recent trends and perspectives // IEEE Signal Processing Magazine. – 2023. – Vol. 40, № 3. – P. 104-118.</mixed-citation></ref><ref id="B21"><mixed-citation>20. Zhang Z., Deng J., Schuller B. Advances in speech emotion recognition: A survey // IEEE Transactions on Affective Computing. – 2024. – Vol. 15, № 1. – P. 123-139.</mixed-citation></ref></ref-list></back></article>