RECOGNITION OF AUDIO-VISUAL EMOTIONS USING VIDEO CLIPS

Main Authors: Pragya Singh Tomar* & Brahma Datta Shukla
Format: Journal Article
Published: 2018
Online Access: https://zenodo.org/record/5184752
Contents:
  • This research describes a multimodal emotion recognition system that combines audio and visual inputs. Mel-Frequency Cepstral Coefficients (MFCCs), filter bank energies, and prosodic features are extracted from the audio channel. Two techniques are investigated for the visual channel. First, geometric relationships between facial landmarks, such as distances and angles, are computed. Second, each emotion video is condensed into a small set of key-frames that can be used to visually distinguish between emotions; these key-frame summary videos are then fed into a convolutional neural network. Finally, in a late fusion (stacking) approach, the confidence outputs of all classifiers from all modalities are used to build a new feature space, on which a final classifier is trained to predict the emotion label. Experiments on the SAVEE, eNTERFACE'05, and RML databases show that the proposed solution significantly outperforms existing approaches, setting a new state of the art on all three databases.
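The abstract mentions geometric features computed from facial landmarks, such as distances and angles. The record does not say which landmark pairs or normalization the authors used. The following is a minimal sketch under stated assumptions: it takes every pair of 2-D landmarks, computes the Euclidean distance and the angle of the connecting segment, and scales distances by their maximum (a hypothetical normalization). The function name `geometric_features` is illustrative only.

```python
import itertools

import numpy as np


def geometric_features(landmarks: np.ndarray) -> np.ndarray:
    """Pairwise distances and segment angles between 2-D face landmarks.

    landmarks: array of shape (N, 2) holding (x, y) coordinates.
    Returns N*(N-1)/2 normalized distances followed by N*(N-1)/2 angles (radians).
    """
    pts = np.asarray(landmarks, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 2 or len(pts) < 2:
        raise ValueError("expected an (N, 2) array with N >= 2")
    # All unordered landmark pairs (i, j) with i < j.
    i, j = np.array(list(itertools.combinations(range(len(pts)), 2))).T
    diff = pts[j] - pts[i]
    dists = np.hypot(diff[:, 0], diff[:, 1])
    if dists.max() == 0:
        raise ValueError("all landmarks coincide")
    # Assumption: scale by the largest distance so features ignore face size.
    dists = dists / dists.max()
    # Angle of each segment relative to the image x-axis.
    angles = np.arctan2(diff[:, 1], diff[:, 0])
    return np.concatenate([dists, angles])
```

With real data, the input would come from a face-landmark detector applied to each frame, and the per-frame vectors would feed a classifier.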
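The abstract describes condensing each emotion video into a small set of key-frames, but the record does not give the selection rule. As a stand-in, the sketch below uses greedy farthest-point sampling on per-frame feature vectors. This is a swapped-in technique and not necessarily the authors' method. It starts from the frame farthest from the clip mean, which is an assumption that this frame is the most expressive one, and then repeatedly adds the frame least similar to those already chosen. The name `select_key_frames` is hypothetical.

```python
import numpy as np


def select_key_frames(frame_features: np.ndarray, k: int) -> list[int]:
    """Greedy farthest-point selection of k representative frames.

    frame_features: array of shape (T, D), one feature vector per frame.
    Returns the chosen frame indices in temporal order.
    """
    feats = np.asarray(frame_features, dtype=float)
    if not 0 < k <= len(feats):
        raise ValueError("k must be between 1 and the number of frames")
    # Assumption: the frame farthest from the clip mean is a good first key-frame.
    mean = feats.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(feats - mean, axis=1)))
    chosen = [first]
    # Distance from each frame to its nearest already-chosen key-frame.
    min_dist = np.linalg.norm(feats - feats[first], axis=1)
    min_dist[first] = -np.inf  # never pick the same frame twice
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(feats - feats[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return sorted(chosen)
```

The selected frames, kept in temporal order, would form the short summary clip that is passed to the CNN.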
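The late fusion (stacking) step described in the abstract builds a new feature space from the confidence outputs of every per-modality classifier. The sketch below shows that construction by concatenating each classifier's (samples × classes) confidence matrix column-wise. It then trains a toy nearest-centroid meta-classifier on the result. The meta-learner is a hypothetical placeholder, because the record does not name the one the authors used, and both `stack_confidences` and `NearestCentroidMeta` are illustrative names.

```python
import numpy as np


def stack_confidences(confidences: list[np.ndarray]) -> np.ndarray:
    """Concatenate per-classifier confidence matrices of shape (n_samples, n_classes)."""
    if len({c.shape[0] for c in confidences}) != 1:
        raise ValueError("all classifiers must score the same samples")
    return np.hstack(confidences)


class NearestCentroidMeta:
    """Toy meta-classifier: predicts the label whose mean stacked vector is closest."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.labels_ = np.unique(y)
        # One centroid per emotion label in the stacked confidence space.
        self.centroids_ = np.stack([X[y == lab].mean(axis=0) for lab in self.labels_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance from every sample to every centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[np.argmin(d, axis=1)]
```

In practice, the confidences used to train the meta-classifier should come from held-out (for example, cross-validated) predictions of the base classifiers. Otherwise the meta-learner sees overly optimistic scores.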