Hello everyone! This is my first blog for Makehuman, so I'll try to give you the 'big picture' of why I began to post three years ago on the forum.
I believe, and I'm quite sure Looxis does also, that Makehuman has the potential to be a revolutionary mobile app; for example, the first open-source mocap avatar, or a free 'talking head'.
But with any mobile app there have to be design constraints, especially in terms of CPU footprint and screen resolution. So, I've re-created my Ellie model to function as a mocap avatar. UV texture mapping, sub-surface scattering and subtle lighting had to go. Next, a simple 'thumbs-only' interface would need to be designed that could lip-sync at the end of a phrase and add facial expressions. If no one else has coined the term, I'll call them 'expressicons'.
And here we run into a major design constraint: which emotions, and who decides which ones? I'll answer quite simply: "We used the same system that the Fortune 500 companies have settled upon, namely Paul Ekman's Facial Action Coding System, or FACS." So, it seemed like an easy task to match the FACS to the Makehuman expression interface.
Unfortunately, not so.
As of nightly build 2701, there are 56 separate expressions in the Blender 2.58a export, each with at least three and at most six sliders.
We will assume binary values; that is, each slider is either 0 or 1. If we also conservatively estimate that only four sliders will be used (central tendency theory applied here), then we get a permutation value of
P(56,4) = 56! / (56 - 4)! = 8814960
That's over 8.8 million possible combinations, even with severe limits on the settings. Clearly, for a real-world application such as I am suggesting, there will need to be some design constraints.
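For anyone who wants to check the P(56,4) figure, here is a small Python sketch. It assumes, as the calculation above does, that 4 of the 56 expressions are picked and that order matters; the variable names are my own, and the unordered count is only there for comparison.

```python
import math

EXPRESSIONS = 56  # expressions in the export, per the count above
SLOTS = 4         # expressions assumed to be in use at once

# Ordered selections (permutations): 56 * 55 * 54 * 53
ordered = math.perm(EXPRESSIONS, SLOTS)

# If order did not matter, the count would be the combinations C(56, 4)
unordered = math.comb(EXPRESSIONS, SLOTS)

print(ordered)    # 8814960
print(unordered)  # 367290
```

Either way the number is far too large to expose on a thumbs-only interface, which is the point of the design constraints.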
The goal of this discussion is to see how to manage scenarios of Makehuman characters interacting with real people, using motion capture. And here we can apply the core of Ekman's findings: there is no single simple human emotion in any verbal exchange; it is always a sequence of combinations of expressions.
I'm going to use artistic license to decide what combinations of emotions to use; however, to keep things simple, I'll use only three at a time. The sequence will be: initial emotional expression, opposing emotional expression, resolving emotional expression.
I'll use the same sentence in different contexts to indicate the power of Makehuman. For this blog, I'll do the lip sync first, then the emotional expression sequence.
(1) You (assumed male protagonist) are at a party, doing your best to impress a young female (the Makehuman reactor) that you just met. You tell an outlandish story of an adventure you've just had, and as she waits for the punch line, you finish with: "Oh, it never really happened. I was just trying to impress you."
Her response is: "But, I really wanted to believe you." The espeak parameters are --ven+f4 -k10 -s 200.
For purposes of matching the Makehuman mocap avatar to the FACS, I'll use exaggerated positive expressions.
The emotional expression sequence is:
Ekman surprise = Makehuman Surprised 1.00 + Excited .25 at frame 20;
Ekman fearful = Makehuman Scared .75 + .25 Puzzled at frame 60;
Ekman true Smile = Makehuman Smile 1.00 + .5 Laughing.
The video link is: http://www.youtube.com/watch?v=mkGD33-S1M4
(2) You (assumed male protagonist) are on a first date with the Makehuman female character, and you park at the edge of a very large lot. Out in Western Canada, the example we always use is the West Edmonton Mall. When you leave the Mall, you can't find your car, although you assured her when you went in that you had an unerring sense of navigation. Frustrated, she replies with: "But, I really wanted to believe you." The espeak parameters are --ven+f5 -k20 -s 150.
Now that we've got an Ekman-to-Makehuman baseline, let's just go with Makehuman settings.
The emotional expression sequence is:
Makehuman skeptical 1.00 at frame 20;
Makehuman upset 1.00 at frame 60;
Makehuman grumpy 1.00 at frame 80.
The video link is: http://www.youtube.com/watch?v=ysB_HUOsAwo
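The scenario (2) sequence can be written down as plain keyframe data. This is only a sketch in Python: the expression names, frames and weights come from the list above, but the data layout and the linear cross-fade between keyframes are my own assumptions, not anything provided by Makehuman or Blender.

```python
# Scenario 2 as (frame, expression name, weight) keyframes.
# Names, frames and weights are copied from the post; the layout and the
# blending rule below are hypothetical, not a Makehuman or Blender API.
KEYFRAMES = [
    (20, "skeptical", 1.0),
    (60, "upset", 1.0),
    (80, "grumpy", 1.0),
]

def weights_at(frame, keyframes=KEYFRAMES):
    """Return {expression: weight} at a frame, cross-fading linearly
    between neighbouring keyframes (an assumed blending rule)."""
    first_frame, first_name, first_w = keyframes[0]
    if frame <= first_frame:
        return {first_name: first_w}
    for (f0, n0, w0), (f1, n1, w1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return {n0: w0 * (1 - t), n1: w1 * t}
    last_frame, last_name, last_w = keyframes[-1]
    return {last_name: last_w}

print(weights_at(40))  # halfway between "skeptical" and "upset"
```

Keeping the sequence as data like this means the same blend function could be reused for the other scenarios by swapping in a different keyframe list.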
And finally,
(3) You (assumed male protagonist) have just been accused of cheating on your girlfriend (Makehuman reactor) by her best girlfriend. You plead innocence, but she cuts you off with: "But, I really wanted to believe you."
The espeak parameters are: --ven+f5 -k30 -s 100. Note how I'm raising pitch and slowing down the statement.
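The three espeak settings can be collected in one place. The Python sketch below only builds the command lines; it does not run espeak. I am assuming that the `--ven+f4` style written in this post corresponds to espeak's standard `-v` voice option, alongside its `-k` and `-s` options; the function and dictionary names are hypothetical.

```python
import shlex

LINE = "But, I really wanted to believe you."

# (voice, -k value, -s speed) for scenarios 1 to 3, as listed in the post
SETTINGS = {
    1: ("en+f4", 10, 200),
    2: ("en+f5", 20, 150),
    3: ("en+f5", 30, 100),
}

def espeak_command(scenario, text=LINE):
    """Build the argument list for one scenario (assumes the -v/-k/-s form)."""
    voice, k, speed = SETTINGS[scenario]
    return ["espeak", "-v", voice, "-k", str(k), "-s", str(speed), text]

# Print each command in a form that can be pasted into a shell
for n in SETTINGS:
    print(shlex.join(espeak_command(n)))
```

If espeak is installed, each list could be passed to `subprocess.run` to hear the three readings back to back.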
The emotional expression sequence is:
Makehuman distress 1.00 at frame 20;
Makehuman disappointment 1.00 at frame 60;
Makehuman sadness 1.00 at frame 80.
The video link is: http://www.youtube.com/watch?v=myn_gjQSKsg&NR=1
Why did we need to call upon Paul Ekman's work? His research has demonstrated that negative emotions are more difficult to read than positive ones, and therefore more easily misunderstood. To communicate through avatars/talking heads we would need to agree on which emotional expressions mean what, and that requires a mapping between Ekman's Facial Action Coding System and the Makehuman expression interface.
