Interactive storytelling has been a buzzword at many gaming conferences lately, and we have Chris Crawford to thank for that. Unfortunately, despite some developers’ claims of getting closer to that holy grail of interactivity, progress in the right direction has been negligible, in part because the term “interactive storytelling” is so vaguely defined. For example, it is obvious that there cannot be a story without “agents making intelligent choices”: if all our agents chose their next actions at random, no story would develop. On the other hand, how “intelligent” do our story characters really need to be? This is where opinions start to diverge greatly, and the answer really depends on the type of game in which interactive storytelling is to be embedded.
In this article I will offer my view of what interactive storytelling is and how it ties into the modeling of human-like AI agents, and I will discuss the three main approaches to human behavior modeling that are being researched these days.
Introduction
I will start by offering a definition of Interactive Storytelling as it pertains to video games. True drama in the context of storytelling is not a series of physical events, but rather the emotional result of the interplay of:
1. temporally ordered engaging events and states,
2. human (or human-like) actors with aspirations and motives in different emotional/mental/physical states, developing in an interesting and meaningful way, and
3. an artful narrative style superimposed on the above.
In order to achieve true interactive drama for games, the concept of interactivity needs to be added to all constituents of the above definition. That may sound like a scary thing to contemplate, especially given my convoluted definition of drama, but it can be achieved, and that is what I am advocating in this article.
Within the computer game field, interaction has traditionally been restricted to a small number of objects. Normally these objects are either simple animated items or a number/state attached to the player. In the most interactive of games, those states in turn affect a number describing the emotional state or “faction” of some of the NPCs, but that is really the apex of interactivity in today’s games. For example, if the player decides to burn down a destructible house, the state of that house will be changed to “destroyed”, and later that day the inhabitants of that house might act angrily towards the player. The player then loses “faction” with those NPCs, and all is back to normal. To gain true interactivity, however, all aspects of the “drama package” need to respond correctly to the dynamic requests of the player. This means:
- All events should be interactive, i.e. the player should be able to do anything they want with any object/actor in the game. They will need a broad selection of “verbs”. For example, if there is a watch on the table, the player should be able to smash it, pick up a single gear from the debris and eat it, and get a bad stomachache as a result! Visually and physically, this is extremely hard to simulate with today’s technology.
- Giving the player a huge amount of freedom means that they can virtually destroy the entire temporal and logical order of the plot by inserting one of their own events at the wrong time, in the wrong place! This would require the designers to impose artificial limits, and the necessary limits are hard to foresee if the possibility space of player actions is huge. Nowadays, many designers simply restrain players with a large number of no-no’s, even though they never really give them that much freedom of action to begin with.
- Any event in a real drama causes many state changes. For example, killing a man will cause the state of his wife to change to a widow. Probably a sad widow. The man’s income will vanish, and the widow might turn poor as a result. She might have to sell the house, and the house will have a new owner. And shortly thereafter the decoration of the house will probably have changed. How can all this be achieved? Not only will it require intense knowledge of human life, behavior, physics, motivations, etc. on the part of the game, it will also require a gargantuan quantity of computation and art assets.
- If a game is to provide believable, human-like behavior for its NPCs, each with their own mental and emotional problems, motivations, aspirations and beliefs, we will have to put a huge research effort into NPC AI. We know that human-like agents have not been created so far, at least not in the sense that they could act and talk meaningfully in the context of a story. As of today, an AI-controlled NPC with a character arc is something to behold in awe. We humans don’t understand nearly enough about ourselves to be able to create a good model of our mental, emotional or even physical behavior.
- An interactive “narrative style” would respond to the way the player and all other actors are feeling/thinking. If the player has killed one actor’s child, that actor would respond vehemently, and definitely not in sweet prose. Although games such as The Sims bypass this challenge with simple animations and emoticons, this is not enough to add dramatic style to the game. Real drama is as much about artful dialog as it is about deep characters.
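The chain of state changes described in the list above (a killing makes a widow, removes an income, forces a house sale, and so on) can be illustrated with a very small rule-driven sketch. The event names and the rule table below are hypothetical and hand-written purely for illustration; a real game would need vastly more rules, which is exactly the problem the list points out.

```python
from collections import deque

# Hypothetical, hand-written rules: each event maps to the events it triggers.
RULES = {
    "husband_killed": ["wife_widowed", "income_lost"],
    "wife_widowed": ["wife_grieving"],
    "income_lost": ["household_poor"],
    "household_poor": ["house_sold"],
    "house_sold": ["house_owner_changed"],
    "house_owner_changed": ["house_redecorated"],
}


def propagate(initial_event, rules):
    """Return every event triggered by initial_event, in breadth-first order."""
    seen = [initial_event]
    queue = deque([initial_event])
    while queue:
        event = queue.popleft()
        for consequence in rules.get(event, []):
            if consequence not in seen:  # never fire the same event twice
                seen.append(consequence)
                queue.append(consequence)
    return seen


consequences = propagate("husband_killed", RULES)
for step in consequences:
    print(step)
```

One player action fans out into seven follow-up events here, and every one of them would need its own behavior, dialog and art assets.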
Adding a dramatic touch to character dialog is actually the simplest problem to tackle. If enough different characters are supplied with enough lines of speech, it really only becomes a matter of how big a database you can prepare, and with context-sensitive text analysis and information mining systems, this shouldn’t be hard to achieve. Developing truly dynamic dialog generation mechanisms for these NPCs, however, is a rather more complex issue, not only because it would have to be done in real time, but also because the world in which the NPC “lives” will change all the time, and the NPC must be able to respond to all external events without any noticeable “lag”.
If the challenges listed above have not discouraged you from aspiring to play the hero of a truly interactive dramatic storyline in a computer game, you must be either crazy or Chris Crawford, or both!
Central to the problem of interactive drama, as you will have noticed, is the simulation of life-like human behavior. The simulation we are talking about necessarily covers a wide range of actions (“behaviors”). There are three main approaches to simulating human-like behavior on computers:
- The reductionistic approach, which works bottom-up and aims at modeling the physics, biology and chemistry of the human,
- The behavioral (functional) approach, which works top-down and aims at modeling the human mind, and
- The black-box approach, which tries to emulate human behavior by using an adaptable “black-box” that will “learn” from real humans.
Below we will discuss the benefits and problems associated with each.
1. Reductionistic View of Human-like AI Simulation
The reductionistic approach to modeling humans, one that is certainly achievable in the near future, starts out with the creation of a physical simulation layer for the human body. In this context, joints and bones need to be defined along with shapes, muscle powers, forces, weight distributions, etc. Now, this might sound like we are overly complicating things, but you will thank yourself for having implemented this layer of parameters when your simulated 300-pound NPC decides to stay in bed rather than hunt the dragon that stole his half-sister, the princess!
Stage two is a physiological simulation, covering the thermodynamic energy balance within a human being. Food ingestion, the endocrine system, energy burned in muscle and tissue cells, death by injury or failure of organs, and various physiological behaviors of the human body must be simulated on this level. Again, if you think this is overkill, I need to remind you of the wonderful effects that adrenaline and testosterone have had on human history, and within drama.
Stage three of our simulation would cover simple actions that are common among all animals, such as searching for food and feeding, resting, nesting, etc. These can be derived from the previous two levels of abstraction, due to the close relationship between hormones and animal physiology on the one hand and instinctive behaviors on the other. Manipulation of the surrounding world by simple single-command actions (e.g. “Lift food to your mouth!”), as well as the generation and recognition of sounds, must be added to the code for this stage.
Stage four of our simulation will turn towards the actual “human” attributes of our NPC. It will combine the control of long-term physiological goals (such as mating) with social aspects of human behavior. Swarming, leadership, learning from others by proximity and attention, using simple tools to achieve goals for the actions mentioned in stage three, and family and herd formation are among the goals we could set for this stage. Building on the previous stage, the creation of complex physical objects and oral communication skills for our simulated NPCs also fit within the framework of this stage.
Stage five of the simulation is dedicated to cultural learning and behavior and is a descendant of the previous stage. Here, personal goals and cultural value systems, along with interpersonal politics come into the equation. This is probably the hardest stage to emulate due to how little we really know about the parameters affecting societies and group dynamics. In addition, a single simulated NPC is of no use here, and we will require a huge number of them to run concurrently, just to train one single NPC.
Stage six of the simulation starts to turn the attention of our agent inward for the first time. Our NPC must be given the capability to simulate possible actions and reactions of the prior stages within his “brain” and make a guess of what the outcome of each simulation will be, before taking actions based on those results. This process is known as “dreaming”, “thinking”, “planning”, etc. in different contexts and is the beginning of awareness.
Stage seven. There is no stage seven! For any and all drama simulation purposes, our simulated NPC is a perfect replica of a human. Now add a physical environment, lots of other simulated NPCs, beasts and plants to the mix, and we can start to watch artificial drama played out all over the place!
Will it ever be possible to construct such a complex layered simulation? In my opinion, we will be able to achieve this in the next two decades or so. Even if we are not planning to fully implement any one stage, we can at least have a very simple representation of each stage, with simple rules and limited breadth. The depth of this six-stage system alone is enough to ensure meaningful “dramatic” behavior from our NPCs.
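To make the idea of “a very simple representation of each stage” a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name `Agent`, the attributes, and the made-up constants are not taken from any existing engine, and each stage is reduced to a single line of arithmetic. Stage five appears only as one number (`duty_weight`), which says a lot about how much is being left out.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    body_mass_kg: float                          # stage 1: physical layer
    energy: float = 1.0                          # stage 2: physiology, 0.0 to 1.0
    allies: list = field(default_factory=list)   # stage 4: social layer
    duty_weight: float = 0.5                     # stage 5: cultural value of duty

    def effort(self, distance_km):
        # Stage 1: heavier bodies burn more energy per kilometre (made-up constant).
        return distance_km * self.body_mass_kg / 1000.0

    def instinct(self):
        # Stage 3: instinctive behavior derived directly from physiology.
        return "rest" if self.energy < 0.3 else "forage"

    def simulate(self, action, distance_km):
        # Stage 6: imagine the outcome of an action before committing to it.
        if action == "hunt_dragon":
            remaining = self.energy - self.effort(distance_km)
            support = 0.1 * len(self.allies)     # stage 4: allies make the quest safer
            return remaining + support + self.duty_weight
        return self.energy                       # staying in bed costs nothing

    def decide(self, distance_km):
        options = ["hunt_dragon", "stay_in_bed"]
        return max(options, key=lambda action: self.simulate(action, distance_km))


# 136 kg is roughly 300 pounds.
couch_knight = Agent("Couch Knight", body_mass_kg=136.0, energy=0.6, duty_weight=0.2)
eager_knight = Agent("Eager Knight", body_mass_kg=70.0, energy=0.9,
                     allies=["squire", "bard"], duty_weight=0.6)

print(couch_knight.decide(distance_km=10))  # stay_in_bed
print(eager_knight.decide(distance_km=10))  # hunt_dragon
```

Even in this toy form, the decision to stay in bed falls out of the lower layers (body mass and energy) rather than being scripted, which is the point of the layered design.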
2. Behavioral Approach to Human-like Behavior Simulation
The behavioral approach identifies the human mind/brain as the most important component in modeling human behavior, and focuses on analyzing the functionality of the human mind with the goal of compressing it into a mathematical model. The reasoning here is that drama, being a narration of the interactions of human minds with each other and with the environment around them, can only be fully understood if we can describe the human “mind” in terms of information and processes.
The first problem we have to tackle is that there are many ways to look at the human “mind”, because the term means so many different things to different people. To computer scientists, it is equivalent to the task it performs: learning, or adapting to new outside conditions. To philosophers, it is the center of reason (notwithstanding the fact that so few “minds” actually reason!). To psychologists, the mind is the center of consciousness (i.e. the capability of being aware of one’s existence, whatever that may be!). To biologists, the term “mind” is equivalent to the animal brain and whatever results from the chemical and electrical processes going on within that organ, a view both despised and embraced by most of us commoners these days.
Except for the philosophers, the people interested in understanding the human mind can be grouped under the general umbrella of “cognitive scientists”. Because scientists from fields as varied as computer science and electrical engineering, psychology and psychotherapy, medicine and biology, and chemistry and physics gather under this umbrella, each of them naturally brings their own point of view to the discussion.
Interestingly, most simplistic AI emulations of human behavior so far have concentrated on this kind of modeling. The main points of interest in these models are the identification of motivators (i.e. internal and external stimuli which lead to actions by the agent), the modeling of internal states, which might consist of basic human emotions (e.g. fear, happiness, etc.) or other more abstract states, and learning, which is generally defined as the capability to generalize and adapt to new sets of stimuli from the environment, with the goal of fulfilling basic needs. Seeing that a huge amount of research is being invested in this paradigm, it seems that the short-term solution to human behavior modeling will come from this school of researchers.
Unfortunately, the emergent properties and the complexity of the human mind, along with the extremely complex interactions possible between multiple humans, will put a hard limit on the degree of realism that can be achieved by these mainly analytical methods.
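The three ingredients named above (motivators, internal states and learning) can be sketched in a few lines of Python. This is only an illustration under assumed names and numbers: the class `BehavioralAgent`, the actions “forage”, “beg” and “flee”, and all thresholds are invented for this example and do not come from any particular research system.

```python
class BehavioralAgent:
    def __init__(self):
        self.hunger = 0.0   # internal motivator, grows with every tick
        self.fear = 0.0     # internal emotional state, driven by external stimuli
        # Learned estimate of how well each action satisfies hunger.
        self.food_value = {"forage": 0.5, "beg": 0.5}

    def perceive(self, threat_nearby):
        """Update internal states from one tick of external stimuli."""
        self.hunger = min(1.0, self.hunger + 0.2)
        self.fear = 1.0 if threat_nearby else max(0.0, self.fear - 0.5)

    def choose(self):
        """Pick the action serving the currently strongest motivator."""
        if self.fear > self.hunger:
            return "flee"
        if self.hunger >= 0.3:
            return max(self.food_value, key=self.food_value.get)
        return "idle"

    def learn(self, action, reward, rate=0.5):
        """Move the estimate for an action towards the reward just observed."""
        if action in self.food_value:
            old = self.food_value[action]
            self.food_value[action] = old + rate * (reward - old)


agent = BehavioralAgent()
agent.perceive(threat_nearby=False)
agent.perceive(threat_nearby=False)
first_choice = agent.choose()          # hungry, no threat: tries "forage"
agent.learn(first_choice, reward=0.0)  # foraging found nothing
second_choice = agent.choose()         # now prefers "beg"
agent.perceive(threat_nearby=True)
third_choice = agent.choose()          # fear outweighs hunger: "flee"
print(first_choice, second_choice, third_choice)
```

Every rule here is hand-written by the designer, which is where the realism limit discussed above comes from: the model can only be as rich as the analysis behind it.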
3. The Black-box Approach to Human Behavior Modeling
A number of scientists, who seem to have a deep respect for the complexity of the problem of modeling human behavior, have proposed to create adaptive mathematical models (universal approximators) that have the capability of modeling any input/output set, and to train/fine-tune those systems with real-life data from real humans. Though this approach does not necessarily give us a deep understanding of the way these behaviors originate in the test subjects, it does allow us to create fairly good replicas of this behavior. Artificial Neural Nets and Hidden Markov Models, along with a growing number of similar paradigms, have shown some promise in this respect.
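As a rough illustration of the black-box idea, the sketch below “learns” behavior purely from recorded action sequences, without any model of why people act as they do. To keep it short it uses a plain first-order Markov chain rather than a Hidden Markov Model or a neural net, and the logged sequences are invented sample data, not real measurements.

```python
from collections import Counter, defaultdict


def train(sequences):
    """Count how often each action is followed by each other action."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return counts


def predict(model, action):
    """Return the most frequently observed follow-up, or None if never seen."""
    followers = model.get(action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical logs of what human players did, one list per encounter.
observed = [
    ["greet", "chat", "trade"],
    ["greet", "chat", "leave"],
    ["greet", "trade"],
    ["greet", "chat", "trade"],
    ["insult", "fight"],
]

model = train(observed)
print(predict(model, "greet"))   # chat
print(predict(model, "chat"))    # trade
print(predict(model, "sleep"))   # None: never observed, so nothing to imitate
```

The last line shows the characteristic weakness of the approach: the box replicates what it has seen and has nothing to say about situations missing from its training data.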
To understand why these methods are still totally inadequate for any realistic simulation, consider that the human nervous system can search vast information spaces in the blink of an eye (e.g. when recognizing an acquaintance’s face), while the same task took computers that were operating millions of times faster than any given biological neuron a lot of time, and even then, the pattern recognition rates were extremely poor compared to those of the human brain. Obviously there had to be more to a biological brain than just simple computations.
Following findings in human physiology, aided by the study of numerous diseases that affected only one part of patients’ brains, scientists started to attribute the amazing power of brains to the vast number of “processors” in the brain. At that time it made sense: a computer, though running instructions a lot faster than the brain, had only one processor, whereas the brain could theoretically execute roughly 10,000 additions/multiplications on a single cell alone within a hundredth of a second (not considering the recovery time for a neuron and the fact that high-valued inputs to a neuron cause multiple signal pulses at the output). With an estimated 100 billion (100,000 million) neurons in the brain alone, that would amount to a processing power of approximately 10^17 operations/second.
The most recent supercomputers (as of 2008) are attempting to break the 1000 Teraflop barrier (1000 Teraflops = 10^15 operations/second), and even though one flop cannot be directly associated with one pulse, or a series of pulses, from any given neuron, we see that supercomputers are slowly encroaching on the capabilities of the human brain within the realm of processing power. So why aren’t we starting to see even remotely intelligent behavior on these supercomputers? Why aren’t they starting to act like humans or even simple animals? If joined together, all the computers in the world would have a lot more processing power than many humans (and they are joined, e.g. through the internet). So why aren’t we hearing from this global mega-computer how pissed off it gets when we turn our PC off?
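The estimate above is simple enough to check by hand. The short snippet below just redoes the arithmetic with the figures given in the text; the variable names are mine, and the figures themselves are the rough estimates quoted above, not measurements.

```python
# Figures quoted in the text (rough estimates).
ops_per_neuron_per_burst = 10_000   # additions/multiplications per cell
bursts_per_second = 100             # one burst per hundredth of a second
neurons = 100 * 10**9               # 100 billion neurons

ops_per_neuron_per_second = ops_per_neuron_per_burst * bursts_per_second
brain_ops_per_second = ops_per_neuron_per_second * neurons

print(ops_per_neuron_per_second)          # 1000000, i.e. 10^6 per neuron
print(brain_ops_per_second == 10**17)     # True
```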
The main reason, as we will see, is the lack of unified software running on this incredible silicon overmind. Let us have a look at the global silicon overmind. Its body is spread thin over the entire globe. It is clumpy and can barely move. Each part of its body is trying hard to communicate with other parts. The communication lines themselves have very low bandwidths compared to the processing power. Every single part is trying to achieve whatever it is told to do by the human sitting behind it, and is in no way helping the super-organism. There is a myriad of different and often conflicting instructions and commands running on each cell within this silicon overmind, etc. You get the idea: it is a processing mess. The software running on the human brain, however, has been evolving for billions of years. (In a way, the instruction set and the hardware are one and the same in biological brains; only the variables change in the short term.) I say billions, not millions as some suggest, because we humans are ultimately the result of the evolution of the first DNA/RNA molecules, which came into being roughly 4.2 billion years ago, if they did not already exist on other planets before that.
Now, it is true that the ultimate goal of nature has never been the creation of man, or of his brain for that matter, but we are indeed the most complex creature aware of its own individuality. This incredible time span and the speed at which mutations occur, along with the huge variety in the physical environment in which this genetic algorithm is running, allow for an extremely fine-tuned and complex set of instructions to be hard-coded into our brains and bodies. The variables in our brains which can be updated, and which we humans like to gather under the umbrella term “ego”, are an additional capability that allows us to learn and adapt to the changing environment. Now you might think that the genetic algorithm is “rather dumb”, meaning that it takes a lot of iterations to achieve any concrete goals, but that is only partially true: once the initial iterations are done, convergence to a “good” solution is rather fast.
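For readers who have not met the artificial version of a genetic algorithm, here is a minimal sketch of the design/test/redesign loop. It is a toy: the fitness function (count the 1 bits in a genome), the population size and the mutation rate are arbitrary choices made for this example, and it is in no way a model of biological evolution.

```python
import random


def fitness(genome):
    """Toy survival score: the number of 1 bits in the genome."""
    return sum(genome)


def evolve(generations=50, population_size=30, genome_length=20,
           mutation_rate=0.02, seed=0):
    """Run a tiny genetic algorithm and return the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)        # testing
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]    # selection
        children = [survivors[0][:]]                      # keep the best unchanged
        while len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]           # crossover
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]                   # random mutation
            children.append(child)
        population = children                             # redesign
    history.append(max(fitness(genome) for genome in population))
    return history


history = evolve()
print(history[0], history[-1])
```

Because the best genome of each generation is carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.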
Can a search algorithm based on stochastic processes, pushed along by simple operators and random mutations, beat a human programmer in the design of the ultimate adapting machine? Apparently yes. Let us not forget that the genetic algorithm running on the surface of Earth created simple replicating molecules first. Then it moved on to protein molecules capable of protecting these structures from the chemical onslaught of their surroundings. Further along the way some proteins bunched together and functional (i.e. chemically “useful”) organelles appeared. These could replicate and bunch together for even better chemical protection. Once these organelles started to create a useful chemical balance among them, the guiding hand of natural selection decided it was time to put a wall around them, i.e. most structures that had no wall around them perished. DNA/RNA strings were upgraded with a cellular membrane. After thorough testing by the environment and finding a number of good matches, the genetic algorithm decided it was time to bunch together a number of organelles and put a strong wall around them. This is what we call prokaryotic cells nowadays (i.e. cells with no central data storage). Only once prokaryotic cells had a protective wall around them, and all was well and working, did nature move to the next step of adding a central data storage unit (the cell nucleus), creating the eukaryotic cells. Then, once all unicellular organisms were performing perfectly, tiny colonies appeared, and once connecting structures and useful interconnects had appeared, multi-cellular organisms started to gain popularity.
Up until the arrival of humans, and luckily beyond it, this process of design (crossover operations/replications), testing (survival fitness evaluation) and redesign has continued on a huge scale. (“The Selfish Gene” by Richard Dawkins has an interesting chapter on this process in evolution.) If we consider every cell to be one unit in our algorithmic scheme, the size of the population for this simulation that runs over billions of years is of the order of 10^28-10^30. If the lifecycle of a cell is taken to be 24 hours on average, then 4 billion years allow for roughly 1.5×10^12 simulation rounds, each with 10^29 participants. This is a huge number of trials, even within the vast search space that the chemical/physical world poses. The number 1.5×10^41 is unimaginable to mere mortals, and even to computer scientists and advocates of artificial evolutionary algorithms! So what is the answer to our question? Can a clever human programmer design computer code that is more efficient at surviving than us humans? Maybe! If that programmer were allowed to work on a computer holding around 10^29 records of around 10^10 bits each (roughly the order of magnitude of the number of base pairs in the human genome) and to complete more than 10^12 simulation/design iterations, then maybe!
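The numbers in the paragraph above follow from simple multiplication. The snippet below redoes the calculation using the assumptions stated in the text (a 24-hour cell lifecycle, 4 billion years, a population of about 10^29 cells, about 10^10 bits per record); the variable names are mine.

```python
years = 4 * 10**9            # time span assumed in the text
rounds_per_year = 365        # one generation per 24-hour cell lifecycle
population = 10**29          # cells alive per round (middle of 10^28-10^30)
bits_per_record = 10**10     # storage per simulated genome, as in the text

rounds = years * rounds_per_year
trials = rounds * population
storage_bits = population * bits_per_record

print(f"{rounds:.2e}")        # 1.46e+12 simulation rounds
print(f"{trials:.2e}")        # 1.46e+41 individual trials
print(storage_bits == 10**39) # True: bits needed just to hold one generation
```

The results round to the 1.5×10^12 rounds and 1.5×10^41 trials quoted above.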
Here is something to think about: Our programmer would need to simulate the entire surface of earth on the sub-atomic scale in order to provide those organisms with a chance to be “un-naturally selected”! How many particles are there to simulate? Can all quantum processes be simulated by non-quantum computers in linear time? I do not think so.
It would take more particles to create a computer capable of running the proposed simulation than the number of particles to simulate, and that means we would have to turn the entire surface of the earth into a computer, and that is not really an option with the price of hardware these days. See, the good thing about real physical particles is that they do extremely well when asked to simulate themselves! That is why the physical world is so good at information processing, and ultimately at bringing forth intelligent life. In fact, that may be the very reason why computers can even exist in our universe!
(For a more intelligent review of information theory as applied to genetic coding, I refer you to the excellent reference book by Hubert Yockey, titled “Information Theory, Evolution, and the Origin of Life”.)
Conclusion
At this point I’ll admit that I digressed from our main discussion, but it does pay to know the extent of the problem at hand. In summation, I believe that the black-box approach shows the most promise in the long run, and that human-like AI can be modeled by structures on the same order of complexity as the human brain (and on another level, the human society).
With this, I conclude the first article in this series. In the second installment, titled “Environment Modeling for Interactive Storytelling” I will discuss another aspect of interactive drama, namely, modeling the environment in which the drama unfolds.