
Affective computing (sometimes called artificial emotional intelligence, or emotion AI) is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science. While the origins of the field can be traced back to early philosophical inquiries into emotion, the more modern branch of computer science originated with Rosalind Picard's 1995 paper on affective computing. A motivation for the research is the ability to simulate empathy: the machine should interpret the emotional state of humans and adapt its behavior to them, giving an appropriate response to those emotions.

The difference between sentiment analysis and affective analysis is that the latter detects a range of different emotions instead of identifying only the polarity of a phrase.





Areas of affective computing

Detecting and recognizing emotional information

Detecting emotional information begins with passive sensors that capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture, and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance.

Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities, such as speech recognition, natural language processing, or facial expression detection, and produce either labels (i.e. 'confused') or coordinates in a valence-arousal space.
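As a rough illustration of those two output conventions, the following is a minimal sketch (not taken from any cited system) that trains one model emitting a categorical label and one emitting valence-arousal coordinates. The feature layout, labels, and targets are synthetic placeholders, assumed only for demonstration.

```python
# Minimal sketch: categorical label vs. valence-arousal coordinates.
# All data below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                                  # hypothetical feature vectors
labels = rng.choice(["confused", "calm", "excited"], size=200)  # categorical targets
va = rng.uniform(-1.0, 1.0, size=(200, 2))                      # (valence, arousal) in [-1, 1]

clf = LogisticRegression(max_iter=1000).fit(X, labels)  # produces discrete labels
reg = Ridge().fit(X, va)                                # Ridge handles multi-output targets

x_new = rng.normal(size=(1, 8))
print("label:", clf.predict(x_new)[0])
print("valence, arousal:", reg.predict(x_new)[0])
```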

Emotion in machines

Another area within affective computing is the design of computational devices proposed to exhibit either innate emotional capabilities or to be capable of convincingly simulating emotions. A more practical approach, based on current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interactivity between human and machine. While human emotions are often associated with surges in hormones and other neuropeptides, emotions in machines might be associated with abstract states associated with progress (or lack of progress) in autonomous learning systems. In this view, affective emotional states correspond to time derivatives (perturbations) in the learning curve of an arbitrary learning system.

Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence, stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'"



Technology

In cognitive science and neuroscience, there have been two leading models describing how humans perceive and classify emotion: the continuous and the categorical model. The continuous model defines each facial expression of emotion as a feature vector in a face space. This model explains, for example, how expressions of emotion can be seen at different intensities. In contrast, the categorical model consists of C classifiers, each tuned to a specific emotion category. This model explains, among other findings, why the images in a morphing sequence between a happy and a surprised face are perceived as either happy or surprised, but not as something in between.

These approaches share one major weakness: they can only detect one emotion per image, generally by a winner-take-all method. However, in everyday life we may perceive more than one emotion category in a single image. Since neither the categorical nor the continuous model can identify multiple emotions, a newer way of modeling them is to consider new categories as the overlap of a small set of basic categories. A detailed study on this topic is given in "A model of the perception of facial expressions of emotion by humans: research overview and perspectives".

The following sections consider the possible features that can be used for emotional recognition tasks.

Emotional speech

Changes in the autonomic nervous system can indirectly alter a person's speech, and affective technologies can leverage this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes fast, loud, and precisely enunciated, with a higher and wider pitch range, whereas emotions such as tiredness, boredom, or sadness tend to generate slow, low-pitched, and slurred speech. Some emotions have been found to be more easily computationally identified, such as anger or approval.

Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and prosodic features such as pitch variables and speech rate can be analyzed through pattern recognition techniques.

Speech analysis is an effective method of identifying affective state, with reported average accuracies of 70 to 80% in recent research. These systems tend to outperform average human accuracy (approximately 60%) but are less accurate than systems that employ other modalities for emotion detection, such as physiological states or facial expressions. However, since many speech characteristics are independent of semantics or culture, this technique is considered a promising route for further research.

Algorithm

The process of speech/text affect detection requires the creation of a reliable database, knowledge base, or vector space model broad enough to fit every need of the application, as well as the selection of a successful classifier that allows for quick and accurate emotion identification.

Currently, the most frequently used classifiers are linear discriminant classifiers (LDC), k-nearest neighbor (k-NN), Gaussian mixture models (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms, and hidden Markov models (HMM). Various studies have shown that choosing the appropriate classifier can significantly enhance the overall performance of the system. The list below gives a brief description of each algorithm (a minimal training sketch follows the list):

  • LDC - Classification happens based on the value obtained from a linear combination of the feature values, which are usually provided in the form of a feature vector.
  • k-NN - Classification happens by locating the object in the feature space and comparing it with its k nearest neighbors (training examples). The majority vote decides the classification.
  • GMM - a probabilistic model used to represent the existence of sub-populations within the overall population. Each sub-population is described using a mixture distribution, which allows observations to be classified into the sub-populations.
  • SVM - a type of (usually binary) linear classifier that decides in which of the two (or more) possible classes each input may fall.
  • ANN - a mathematical model, inspired by biological neural networks, that can better grasp possible non-linearities of the feature space.
  • Decision tree algorithms - work by following a decision tree in which the leaves represent the classification outcome and the branches represent the conjunctions of subsequent features that lead to the classification.
  • HMMs - statistical Markov models in which the states and state transitions are not directly observable. Instead, the series of outputs dependent on the states is visible. In the case of affect recognition, the outputs represent the sequence of speech feature vectors, which allow the deduction of the sequence of states the model passed through. The states can consist of various intermediate steps in the expression of an emotion, and each of them has a probability distribution over the possible output vectors. The state sequences allow us to predict the affective state we are trying to classify, and this is one of the most commonly used techniques within the area of speech affect detection.
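The sketch below illustrates how a few of the classifiers listed above might be trained and compared on speech feature vectors using scikit-learn. It is a minimal example under stated assumptions: the feature dimensionality, emotion labels, and data are synthetic placeholders, not values from any cited study.

```python
# Minimal sketch: comparing some of the listed classifiers on synthetic
# "speech feature" vectors. Feature count and labels are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))                       # e.g. pitch/energy/MFCC statistics
y = rng.choice(["anger", "sadness", "joy"], size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [("k-NN", KNeighborsClassifier(n_neighbors=5)),
                    ("SVM (RBF)", SVC(kernel="rbf")),
                    ("Decision tree", DecisionTreeClassifier())]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```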

It is proven that, given enough acoustic evidence, a person's emotional state can be classified by a set of majority-voting classifiers. One proposed set of classifiers is based on three main classifiers: kNN, C4.5, and an SVM with an RBF kernel. This set achieves better performance than each basic classifier taken separately. It has been compared with two other sets of classifiers: a one-against-all (OAA) multiclass SVM with hybrid kernels, and a set consisting of the two basic classifiers C5.0 and a neural network. The proposed variant achieves better performance than both of the other sets of classifiers (a hedged majority-voting sketch follows).
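The following sketch shows the general majority-voting idea described above, not the specific system evaluated in the cited work. Note that scikit-learn ships CART rather than C4.5, so a plain decision tree stands in for it here; the data is synthetic.

```python
# Sketch of a hard (majority-vote) ensemble of kNN, a decision tree
# (standing in for C4.5), and an RBF-kernel SVM on synthetic features.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))                          # synthetic speech features
y = rng.choice(["anger", "sadness", "joy"], size=300)

ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5)),
                ("tree", DecisionTreeClassifier()),
                ("svm_rbf", SVC(kernel="rbf"))],
    voting="hard",                                      # majority vote over class labels
)
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))
```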

Databases

The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions based on speech, as it implicates choosing an appropriate database to train the classifier. Most of the currently possessed data was obtained from actors and thus represents archetypal emotions. Those so-called acted databases are usually based on the Basic Emotions theory (by Paul Ekman), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), the others simply being a mix of the former ones. Nevertheless, these still offer high-quality audio and balanced (though often too few) classes, which contribute to high success rates in recognizing emotions.

However, for real-life application, naturalistic data is preferred. A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such a database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real-life implementation, because it describes states naturally occurring during human-computer interaction (HCI).

Despite the numerous advantages that naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surroundings noise and the distance of the subjects from the microphone. The first attempt to produce such a database was the FAU Aibo Emotion Corpus for CEICES, which was developed based on a realistic context of children (ages 10-13) playing with Sony's Aibo robot pet. Likewise, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems.

Speech descriptors

The complexity of the affect recognition process increases with the number of classes (affects) and speech descriptors used within the classifier. It is therefore crucial to select only the most relevant features in order to assure the model's ability to identify emotions successfully, as well as to increase performance, which is particularly significant for real-time detection. The range of possible choices is vast, with some studies mentioning the use of over 200 distinct features. It is crucial to identify those that are redundant or undesirable in order to optimize the system and increase the success rate of correct emotion detection. The most common speech characteristics are categorized into the groups below (a feature-extraction sketch follows the list).

  1. Frequency characteristics:
    • Accent shape - affected by the rate of change of the fundamental frequency.
    • Average pitch - a description of how high or low the speaker speaks relative to normal speech.
    • Contour slope - describes the tendency of the frequency change over time; it can be rising, falling, or level.
    • Final lowering - the amount by which the frequency falls at the end of an utterance.
    • Pitch range - measures the spread between the maximum and minimum frequency of an utterance.
  2. Time-related features:
    • Speech rate - describes the rate of words or syllables uttered over a unit of time.
    • Stress frequency - measures the rate of occurrence of pitch-accented utterances.
  3. Voice quality parameters and energy descriptors:
    • Breathiness - measures the aspiration noise in speech.
    • Brilliance - describes the dominance of high or low frequencies in the speech.
    • Loudness - measures the amplitude of the speech waveform, translating to the energy of an utterance.
    • Pause discontinuity - describes the transitions between sound and silence.
    • Pitch discontinuity - describes the transitions of the fundamental frequency.
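As a rough sketch of how a few of these descriptors might be computed in practice, the example below uses the librosa library (an assumption, not mentioned in the article) to estimate average pitch, pitch range, a loudness proxy, and a crude pause ratio from an audio file whose path is hypothetical.

```python
# Sketch: extracting a few of the descriptors above with librosa.
# Library choice, thresholds, and the file path are illustrative assumptions.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)          # hypothetical audio file

# Fundamental frequency track (frequency-related features).
f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                  fmax=librosa.note_to_hz("C7"), sr=sr)
f0_voiced = f0[~np.isnan(f0)]
average_pitch = f0_voiced.mean() if f0_voiced.size else 0.0
pitch_range = f0_voiced.max() - f0_voiced.min() if f0_voiced.size else 0.0

# Energy / loudness proxy via frame-wise RMS.
rms = librosa.feature.rms(y=y)[0]
loudness = rms.mean()

# Crude pause descriptor: fraction of low-energy frames.
pause_ratio = float(np.mean(rms < 0.1 * rms.max()))

print(dict(average_pitch=average_pitch, pitch_range=pitch_range,
           loudness=loudness, pause_ratio=pause_ratio))
```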

Facial affect detection

The detection and processing of facial expression are achieved through various methods such as optical flow, hidden Markov models, neural network processing, or active appearance models. More than one modality can be combined or fused (multimodal recognition, e.g. facial expressions and speech prosody, facial expressions and hand gestures, or facial expressions with speech and text for multimodal data and metadata analysis) to provide a more robust estimation of the subject's emotional state.
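One simple fusion scheme is weighted late fusion, in which the class probabilities produced independently by each modality are combined. The sketch below illustrates that idea only; the classes, probabilities, and weights are illustrative assumptions, not the method of any particular cited system.

```python
# Sketch of weighted late fusion: per-modality class probabilities are averaged.
import numpy as np

classes = ["anger", "joy", "sadness", "surprise"]

p_face   = np.array([0.10, 0.60, 0.20, 0.10])   # from a face-expression model (assumed)
p_speech = np.array([0.20, 0.50, 0.25, 0.05])   # from a speech-prosody model (assumed)

weights = np.array([0.6, 0.4])                  # assumed modality reliabilities
fused = weights[0] * p_face + weights[1] * p_speech
fused /= fused.sum()                            # renormalize to a probability vector

print(classes[int(np.argmax(fused))], fused)
```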

Facial expression databases

Creating an emotion database is a difficult and time-consuming task. However, database creation is an essential step in the creation of a system that will recognize human emotions. Most of the publicly available emotion databases include posed facial expressions only. In posed expression databases, the participants are asked to display different basic emotional expressions, while in spontaneous expression databases, the expressions are natural. Spontaneous emotion elicitation requires significant effort in the selection of proper stimuli which can lead to a rich display of intended emotions. Second, the process involves manual tagging of emotions by trained individuals, which makes the databases highly reliable. Since perception of expressions and their intensity is subjective in nature, annotation by experts is essential for the purpose of validation.

Researchers work with three types of databases: databases of peak expression images only, databases of image sequences portraying an emotion from neutral to its peak, and video clips with emotional annotations. Many facial expression databases have been created and made public for the purpose of expression recognition. Two of the most widely used databases are CK and JAFFE.

Emotional Classification

By doing cross-cultural research in Papua New Guinea, on the Fore Tribe, at the end of the 1960s, Paul Ekman proposed the idea that facial expressions of emotion are not culturally determined, but universal. Thus, he suggested that they are biological in origin and can therefore be safely and correctly categorized. He therefore officially put forward six basic emotions in 1972:

  • Anger
  • Disgust
  • Fear
  • Happiness
  • Sadness
  • Surprise

However, in the 1990s Ekman expanded his list of basic emotions, including a range of positive and negative emotions, not all of which are encoded in facial muscles. The newly included emotions are:

  1. Amusement
  2. Contempt
  3. Contentment
  4. Embarrassment
  5. Excitement
  6. Guilt
  7. Pride in achievement
  8. Relief
  9. Satisfaction
  10. Sensory pleasure
  11. Shame

Facial Action Coding System

A system has been conceived to formally categorize the physical expression of emotions by defining expressions in terms of muscle actions. The central concept of the Facial Action Coding System, or FACS, created by Paul Ekman and Wallace V. Friesen in 1978, is the action unit (AU). Action units are, in effect, contractions or relaxations of one or more muscles. However simple this concept may seem, it is enough to form the basis of a complex and interpretation-free emotion identification system.

By identifying different facial cues, scientists are able to map them to their corresponding action unit codes. Consequently, they have proposed a classification of the six basic emotions according to combinations of their action units (a hedged mapping sketch is given below).
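The sketch below encodes commonly cited prototype AU combinations for the six basic emotions; the exact sets vary somewhat between sources, so these values are approximate and given only for illustration.

```python
# Approximate prototype action-unit combinations for the six basic emotions
# (illustrative; exact AU sets differ between sources).
PROTOTYPE_AUS = {
    "happiness": [6, 12],
    "sadness":   [1, 4, 15],
    "surprise":  [1, 2, 5, 26],
    "fear":      [1, 2, 4, 5, 7, 20, 26],
    "anger":     [4, 5, 7, 23],
    "disgust":   [9, 15, 16],
}

def match_emotions(active_aus):
    """Return emotions whose prototype AU set is fully contained in the detected AUs."""
    active = set(active_aus)
    return [emotion for emotion, aus in PROTOTYPE_AUS.items() if set(aus) <= active]

print(match_emotions([6, 12, 25]))   # -> ['happiness']
```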

Challenges in face detection

As with every computational practice, in affect detection by facial processing some obstacles need to be surpassed in order to fully unlock the hidden potential of the overall algorithm or method employed. The accuracy of modeling and tracking has been an issue, especially in the incipient stages of affective computing. As hardware evolves, as new discoveries are made, and as new practices are introduced, this lack of accuracy fades, leaving behind noise issues. Methods for noise removal exist, however, including neighborhood averaging, linear Gaussian smoothing, median filtering, and newer methods such as the Bacterial Foraging Optimization Algorithm.

It is generally known that the degree of accuracy in facial recognition (not affective state recognition) has not been brought to a level high enough to permit its widespread efficient use worldwide (there have been many attempts, especially by law enforcement, that have failed to identify criminals). Without improving the accuracy of the hardware and software used to scan faces, progress is greatly slowed down.

Other challenges include:

  • The fact that posed expressions, as used by most subjects in various studies, are not natural, and therefore algorithms trained on them may not be fully accurate.
  • The lack of rotational freedom. Affect detection works very well with frontal use, but upon rotating the head more than 20 degrees, "there have been problems".

Body gesture

Gestures can be efficiently used as a means of detecting a particular emotional state of the user, especially when used in conjunction with speech and face recognition. Depending on the specific action, gestures can be simple reflexive responses, like lifting your shoulders when you don't know the answer to a question, or they can be complex and meaningful, as when communicating with sign language. Without making use of any object or surrounding environment, we can wave our hands, clap, or beckon. On the other hand, when using objects, we can point at them, move them, touch them, or handle them. A computer should be able to recognize these, analyze the context, and respond in a meaningful way, in order to be efficiently used for human-computer interaction.

There are many proposed methods to detect body gesture. Some literature differentiates two different approaches to gesture recognition: a 3D-model-based and an appearance-based one. The foremost method makes use of 3D information about key elements of the body parts in order to obtain several important parameters, such as palm position or joint angles. On the other hand, appearance-based systems use images or videos for direct interpretation. Hand gestures have been a common focus of body gesture detection; both appearance-based and 3D-modeling methods have traditionally been used.

Physiological monitoring

This can also be used to detect a user's affective state by monitoring and analyzing their physiological signs. These signs range from the pulse and heart rate to the minute contractions of the facial muscles. This area of research is still in relative infancy, as there seems to have been more of a drive toward affect recognition through facial inputs. Nevertheless, this area is gaining momentum and we are now seeing real products that implement the techniques. The three main physiological signs that can be analyzed are blood volume pulse, galvanic skin response, and facial electromyography.

Blood volume pulse

Overview

A subject's blood volume pulse (BVP) can be measured by a process called photoplethysmography, which produces a graph indicating blood flow through the extremities. The peaks of the waves indicate a cardiac cycle in which the heart has pumped blood to the extremities. If the subject experiences fear or is startled, their heart usually 'jumps' and beats quickly for some time, causing the amplitude of the cardiac cycle to increase. This can clearly be seen on a photoplethysmograph as the distance between the trough and the crest of the wave increases. As the subject calms down, and as the body's inner core expands, allowing more blood to flow back to the extremities, the cycle returns to normal.

Methodology

Infrared light is shone on the skin by special sensor hardware, and the amount of light reflected is measured. The amount of reflected and transmitted light correlates with the BVP, as light is absorbed by hemoglobin, which is found richly in the bloodstream.
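A common downstream step is to detect the cardiac-cycle peaks in the sampled BVP signal and derive heart rate from the inter-beat intervals. The sketch below shows this on a synthetic signal; the sampling rate, peak-detection thresholds, and waveform are assumptions for illustration only.

```python
# Sketch: estimating heart rate from a (synthetic) blood volume pulse signal
# by detecting cardiac-cycle peaks with SciPy.
import numpy as np
from scipy.signal import find_peaks

fs = 100.0                                   # assumed sampling rate in Hz
t = np.arange(0, 30, 1 / fs)                 # 30 s of signal
bvp = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

# Require peaks at least 0.4 s apart (roughly bounds heart rate at ~150 bpm).
peaks, _ = find_peaks(bvp, distance=int(0.4 * fs), prominence=0.5)

ibi = np.diff(peaks) / fs                    # inter-beat intervals in seconds
print("mean heart rate (bpm):", 60.0 / ibi.mean())
```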

Disadvantages

It can be cumbersome to ensure that the sensor shining an infrared light and monitoring the reflected light is always pointing at the same extremity, especially since subjects often stretch and readjust their position while using a computer. There are other factors that can affect one's blood volume pulse. As it is a measure of blood flow through the extremities, if the subject feels hot, or particularly cold, their body may allow more, or less, blood to flow to the extremities, all of this regardless of the subject's emotional state.

Facial electromyography

Facial electromyography is a technique used to measure the electrical activity of the facial muscles by amplifying the tiny electrical impulses that are generated by muscle fibers when they contract. The face expresses a great deal of emotion; however, there are two main facial muscle groups that are usually studied to detect emotion. The corrugator supercilii muscle, also known as the 'frowning' muscle, draws the brow down into a frown and is therefore the best test for negative, unpleasant emotional responses. The zygomaticus major muscle is responsible for pulling the corners of the mouth back when you smile and is therefore the muscle used to test for a positive emotional response.
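A typical processing chain for such EMG signals is to band-pass filter the raw recording, rectify it, and smooth it into an activation envelope. The sketch below illustrates that general chain on a synthetic signal; the sampling rate, cut-off frequencies, and data are assumptions rather than values from the article.

```python
# Sketch of a common EMG processing chain for corrugator/zygomaticus signals:
# band-pass filter, full-wave rectify, then low-pass smooth to get an envelope.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                           # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
emg = rng.normal(scale=1e-5, size=int(5 * fs))        # 5 s of synthetic raw EMG (volts)

b, a = butter(4, [20, 450], btype="bandpass", fs=fs)  # keep a typical EMG band
filtered = filtfilt(b, a, emg)

rectified = np.abs(filtered)                          # full-wave rectification

b_lp, a_lp = butter(4, 5, btype="lowpass", fs=fs)     # smooth to a ~5 Hz envelope
envelope = filtfilt(b_lp, a_lp, rectified)

print("mean activation (a.u.):", envelope.mean())
```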

Galvanic skin response

Galvanic skin response (GSR) is a measure of skin conductivity, which depends on how moist the skin is. As the sweat glands produce this moisture and the glands are controlled by the body's nervous system, there is a correlation between GSR and the arousal state of the body. The more aroused a subject is, the greater the skin conductivity and GSR reading.

It can be measured using two small silver chloride electrodes placed somewhere on the skin, with a small voltage applied between them. The conductance is measured by a sensor. To maximize comfort and reduce irritation, the electrodes can be placed on the feet, which leaves the hands fully free to interface with the keyboard and mouse.
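Since conductance is the measured current divided by the applied voltage, a simple analysis pipeline can derive skin conductance from the raw measurement, smooth it, and flag skin conductance responses as transient peaks. The sketch below shows this idea with synthetic data; the sampling rate, applied voltage, smoothing window, and peak threshold are all assumptions for illustration.

```python
# Sketch: deriving skin conductance from applied voltage and measured current
# (Ohm's law), then flagging skin conductance responses (SCRs) as peaks.
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.signal import find_peaks

fs = 10.0                                        # assumed sampling rate (Hz)
v_applied = 0.5                                  # assumed constant voltage (V)
rng = np.random.default_rng(0)
current = 2e-6 + 5e-7 * rng.random(int(60 * fs)) # synthetic measured current (A)

conductance_uS = current / v_applied * 1e6       # conductance in microsiemens
smoothed = uniform_filter1d(conductance_uS, size=int(2 * fs))  # 2 s moving average

# SCRs: relatively fast rises above the tonic (baseline) level.
scr_peaks, _ = find_peaks(smoothed, prominence=0.05)
print("tonic level (uS):", smoothed.mean(), "| SCRs detected:", scr_peaks.size)
```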

Visual aesthetics

Aesthetics, in the world of art and photography, refers to the principles of the nature and appreciation of beauty. Judging beauty and other aesthetic qualities is a highly subjective task. Computer scientists at Penn State treat the challenge of automatically inferring the aesthetic quality of pictures using their visual content as a machine learning problem, with a peer-rated online photo sharing website as a data source. They extract certain visual features based on the intuition that they can discriminate between aesthetically pleasing and displeasing images.



Potential applications

In e-learning applications, affective computing can be used to adjust the presentation style of a computerized tutor when a learner is bored, interested, frustrated, or pleased. Psychological health services, i.e. counseling, benefit from affective computing applications when determining a client's emotional state.

Robotic systems capable of processing affective information show greater flexibility while one is working in an uncertain or complex environment. Companion devices, such as digital pets, use affective computing skills to enhance realism and provide higher levels of autonomy.

Social robots, as well as the increasing number of robots used in health care benefit from emotional awareness because they can better assess the emotional state of users and patients and change their actions/programming appropriately. This is especially important in countries with an aging population and/or a lack of younger workers to meet their needs.

Other potential applications are centered around social monitoring. For example, a car can monitor the emotions of all passengers and engage in additional security measures, such as alerting another vehicle if it detects an angry driver. Affective computing has potential applications in human-computer interactions, such as affective mirrors that allow the user to see how he performs; an emotional monitoring agent sends a warning before someone sends out an angry email; or even a music player choosing a track by mood.

One idea put forward by the Romanian researcher Nicu Sebe in an interview is the analysis of a person's face while they are using a certain product (he mentioned ice cream as an example). Companies would then be able to use such analysis to infer whether their product will or will not be well received by the respective market.

One could also use affective state recognition in order to judge the impact of a TV advertisement through a real-time video recording of the viewer and through subsequent study of his or her facial expression. Averaging the results obtained on a large group of subjects, one can tell whether that commercial (or movie) has the desired effect and which elements interest the watcher most.

Affective computing is also applied to the development of communicative technology for use by people with autism.

Video games

Affective video games can access their players' emotional states through biofeedback devices. A particularly simple form of biofeedback is available through gamepads that measure the pressure with which a button is pressed: this has been shown to correlate strongly with the players' level of arousal; at the other end of the scale are brain-computer interfaces. Affective games have been used in medical research to support the emotional development of autistic children.



Cognitivist vs. interactional approach

In the field of human-computer interaction, Rosalind Picard's cognitivist or "information model" concept of emotion has been criticized by and contrasted with the "post-cognitivist" or "interactional" pragmatist approach taken by Kirsten Boehner and others, which views emotion as inherently social.

Picard's focus is human-computer interaction, and her goal for affective computing is to "give computers the ability to recognize, express, and in some cases, 'have' emotions". In contrast, the interactional approach seeks to help "people to understand and experience their own emotions" and to improve computer-mediated interpersonal communication. It does not necessarily seek to map emotion into an objective mathematical model for machine interpretation, but rather lets humans make sense of each other's emotional expressions in open-ended ways that might be ambiguous, subjective, and sensitive to context.

Picard's critics describe her concept of emotion as "objective, internal, private, and mechanistic". They say it reduces emotion to a discrete psychological signal occurring inside the body that can be measured and which is an input to cognition, undercutting the complexity of emotional experience.

The interactional approach asserts that though emotion has biophysical aspects, it is "culturally grounded, dynamically experienced, and to some degree constructed in action and interaction". Put another way, it considers "emotion as a social and cultural product experienced through our interactions".










External links

  • The Affective Computing Research Group at MIT Media Laboratory
  • Emotion Computing Group at USC
  • Emotional Computing Group at the University of Memphis
  • The 2011 International Conference on Affective Computing and Intelligent Interaction
  • Brain, Body and Bytes: User Interaction Psychophysiology CHI 2010 Workshop (10-15, April 2010)
  • International Journal of Synthetic Emotions
  • IEEE Transactions on Affective Computing (TAC)
  • Renu Nagpal, Pooja Nagpal and Sumeet Kaur, "Hybrid Techniques for Human Face Emotion Detection" International Journal of Advanced Computer Science and Applications (IJACSA), 1 (6), 2010
  • openSMILE: popular state-of-the-art open-source toolkit for large-scale feature extraction for affect recognition and computational paralinguistics

