Some Thoughts on Behavior in VR Systems

by Bernie Roehl

(Second draft: August, 1995)

[This document is kept online; the URL for it is http://ece.uwaterloo.ca/~broehl/behav.html ]

[There is also a companion piece, describing distributed VR; the URL is http://ece.uwaterloo.ca/~broehl/distrib.html ]

This document is an attempt to gather and organize some ideas about the notion of "behavior" in VR systems, particularly distributed (i.e., networked) VR systems. Many of these ideas came out of discussions on the net; many thanks to everyone who contributed.

This is very much a "living" document; I'm eager to hear what people think of it, and I'm quite willing to make changes based on that input. Feel free to get in touch with me; my email address is broehl@ece.uwaterloo.ca (if you don't hear back right away, please bear with me; I get a lot of email, and often fall behind).

What is Behavior?

We often speak of entities in a virtual environment as having "behavior"; what exactly does that mean?

If an entity has a collection of attributes, then its behavior is the way the values of those attributes change over time. For example, an entity may have an attribute called "location"; as it moves around, its location changes. The way in which its location changes over time is its behavior.

A Taxonomy of Entities

In order to discuss the different types of behavior that entities may engage in, it's useful to divide entities into a number of categories. There are several ways to do this; for example, we might be tempted to divide entities into "dynamic" and "static" categories -- things that move, and things that don't. As it turns out, that's not the most interesting way of categorizing entities. Here's an alternative...

Two Basic Types of Entities

Instead of simply discussing static versus dynamic entities, it's useful to divide entities into those whose behavior is "deterministic" and those whose behavior is "non-deterministic".

In this context, describing an entity's behavior as "deterministic" simply means that its state is a function only of time; that is, we can determine the complete state of the entity at any given simulation tick. We can say, for example, "show me the state of this entity at time t=1700 seconds into the simulation". We can run time forwards or backwards, jump to any point in time, and always know the state of the entity at that precise moment.
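Here is a minimal Python sketch of what "a function only of time" means in practice. The `RotatingBeacon` class, its `state_at` method and its `period` parameter are hypothetical illustrations, not part of any real system. The state is computed from `t` alone, so any time can be queried in any order.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RotatingBeacon:
    """Hypothetical deterministic entity whose state depends only on time."""

    period: float = 60.0  # seconds per revolution (an assumed parameter)

    def state_at(self, t: float) -> dict:
        # State is computed directly from t; nothing is stored between calls,
        # so times can be queried in any order: forwards, backwards, or jumping.
        heading = (2 * math.pi * t / self.period) % (2 * math.pi)
        return {"heading_rad": heading}


beacon = RotatingBeacon()
print(beacon.state_at(1700.0))  # jump straight to t = 1700 s
print(beacon.state_at(10.0))    # then "rewind" to t = 10 s
```

Nothing carries over from one call to the next, so running the simulation backwards is no harder than running it forwards.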

Entities with non-deterministic behavior, on the other hand, are by nature unpredictable. Certainly human beings fall into this category; we do things for our own (internal) reasons, and it's impossible to predict what a human being will do at some arbitrary time in the future. It's also not practical to "rewind" human behavior to some point in the past.

However, it isn't only human beings who exhibit non-deterministic behavior; any entity that has a glimmering of simulated intelligence will be similarly unpredictable. If we create a virtual squirrel, and we attempt to realistically model the behavior of real squirrels, then that virtual squirrel's behavior will effectively be non-deterministic.

What's more, any entity that responds to interaction (direct or indirect) with a non-deterministic entity is itself non-deterministic (since its state is unpredictable, being the result of unpredictable actions).
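The "direct or indirect" part of this rule amounts to a transitive closure. The sketch below assumes that interactions are recorded as hypothetical `(a, b)` pairs meaning "a affected b". It finds every entity that inherits non-determinism from an inherently non-deterministic source.

```python
from collections import deque


def non_deterministic_closure(sources, interactions):
    """Return every entity whose state depends on a non-deterministic source.

    sources: entities that are inherently non-deterministic (e.g. avatars).
    interactions: iterable of (a, b) pairs meaning "a affected b".
    """
    affects = {}
    for a, b in interactions:
        affects.setdefault(a, []).append(b)

    tainted = set(sources)
    queue = deque(sources)
    while queue:  # breadth-first walk follows indirect effects too
        entity = queue.popleft()
        for other in affects.get(entity, []):
            if other not in tainted:
                tainted.add(other)
                queue.append(other)
    return tainted


# A human pushes a chair, and the chair knocks over a lamp; the mountain is untouched.
result = non_deterministic_closure(["human"], [("human", "chair"), ("chair", "lamp")])
print(sorted(result))  # ['chair', 'human', 'lamp']
```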

Each basic type (deterministic or non-deterministic) can be further subdivided. Entities with deterministic behavior can be "static" or "animated", and entities with non-deterministic behavior can be "Newtonian" or "intelligent".

Static entities

A static entity is obviously deterministic; we know its state never changes, and therefore its state at any given time is known. Mountains, buildings and any kind of permanent structure fall into this category. That's not to say that those entities cannot change; certainly erosion could have an effect on a mountain, and a building may change as a result of human actions. However, a simulation which allowed those things to happen would not treat those entities as "static".

Animated entities

Unlike a static entity, an animated entity does change state over time. However, the changes are easily predictable, and are a function only of time and possibly a set of pre-defined behavior parameters. Examples of animated objects are the hands of a clock, the moon in its orbit around the earth, an animated texture map for a waterfall, or the ambient light level of a world as it goes through a virtual day-night cycle.
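Two of the examples just mentioned, a clock hand and a day-night cycle, can each be written as a plain function of time plus a few fixed parameters. The function names, the cosine-shaped light curve and the parameter values below are assumptions chosen for illustration.

```python
import math


def minute_hand_degrees(t_seconds):
    # One full revolution every 3600 s, measured clockwise from 12 o'clock.
    return (t_seconds / 3600.0 * 360.0) % 360.0


def ambient_light(t_seconds, day_length=86400.0, min_level=0.1, max_level=1.0):
    """Hypothetical day-night cycle: darkest at midnight, brightest at noon.

    day_length, min_level and max_level are the pre-defined behavior
    parameters; the result depends on nothing but t and those parameters.
    """
    phase = (t_seconds % day_length) / day_length          # 0.0 = midnight
    brightness = (1 - math.cos(2 * math.pi * phase)) / 2   # 0 at midnight, 1 at noon
    return min_level + (max_level - min_level) * brightness


print(minute_hand_degrees(900.0))  # 90.0
print(ambient_light(0.0))          # 0.1
```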

Newtonian entities

Unlike static or animated entities, Newtonian entities respond to changes in their environment. However, they do so in a very straightforward, generic way. You can pick them up, put them down, drop them, throw them or bump them against each other and they'll react according to the laws of whatever virtual physics their creators implemented. They have no "volition", no goals, no appearance of intelligence. They simply respond to stimuli. Chairs, flashlights and chess pieces are all Newtonian entities.
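Here is a rough sketch of a Newtonian entity, assuming a one-dimensional world with gravity and a floor. `NewtonianEntity`, `apply_impulse` and the constant `GRAVITY` are hypothetical, and each frame is advanced with semi-implicit Euler integration. The entity has no goals of its own. Its future still depends on when somebody chooses to throw or bump it, which is why it counts as non-deterministic.

```python
from dataclasses import dataclass

GRAVITY = -9.8  # m/s^2; the assumed "virtual physics" of this world


@dataclass
class NewtonianEntity:
    height: float          # metres above the floor
    velocity: float = 0.0  # vertical velocity in m/s

    def apply_impulse(self, delta_v):
        # A stimulus from outside (a throw, a bump) changes velocity directly.
        self.velocity += delta_v

    def step(self, dt):
        # Semi-implicit Euler: update velocity first, then position.
        self.velocity += GRAVITY * dt
        self.height += self.velocity * dt
        if self.height <= 0.0:
            # Resting on the floor; this sketch ignores bouncing.
            self.height = 0.0
            self.velocity = 0.0


chess_piece = NewtonianEntity(height=1.0)  # dropped from one metre
for _ in range(50):                        # one second at 50 frames/s
    chess_piece.step(0.02)
print(chess_piece.height)  # 0.0: it has landed
```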

Intelligent entities

Intelligent entities appear to have free will. They have specific goals, complex behavior, and are inherently unpredictable. Obviously, human beings fall into this category; so do birds in flight, spiders crawling on a wall, or our ever-popular virtual squirrel.

Parallel Worlds

So far we've created an apparently arbitrary taxonomy of virtual entities; why divide things into these particular categories? What does it buy us?

Well, for one thing, the distinction between deterministic and non-deterministic behavior has important implications for re-use of resources, caching and network bandwidth. (There are other, deeper implications as well; they'll be discussed below.)

Consider a virtual church. Someone sits down and creates it in a CAD package, complete with altar, pews and a beautiful stained glass window. Once it's built, it would be nice to get as much use out of it as we can; in fact, it would be nice if many people could use it simultaneously for different purposes.

This is where the deterministic/non-deterministic distinction becomes important. By creating the church as a set of deterministic entities, we can allow it to be used concurrently by any number of people. At the same instant of real-world time, the church might be used by one group of people for a virtual wedding, another group for a virtual funeral (perhaps for someone who's lost their net access!), and yet a third group for an exorcism. All of them share the church as a kind of substructure, but their parallel worlds are distinct from each other because the set of non-deterministic entities (e.g. people) is different in each one.

In one of those parallel worlds, it might be 10 am; in another, it might be 2 in the afternoon, and in yet a third it might be midnight. The ambient light level would be correct in all three, since the person who built the church world defined the ambient light as a deterministic behavior.
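The parallel-worlds arrangement can be sketched as one shared, deterministic structure plus a per-world clock and a per-world set of non-deterministic entities. Everything here is a hypothetical illustration, not an actual world format: `SHARED_CHURCH`, the file name `church.wrl` and the cosine light curve are all assumptions.

```python
import math
from dataclasses import dataclass, field

DAY = 86400.0  # seconds in a virtual day


def church_ambient_light(t):
    # Deterministic behavior, defined once by the church's author:
    # 0.0 at midnight, rising to 1.0 at noon.
    return (1 - math.cos(2 * math.pi * (t % DAY) / DAY)) / 2


# Shared, cacheable substructure: identical in every parallel world.
SHARED_CHURCH = {"model": "church.wrl", "ambient_light": church_ambient_light}


@dataclass
class ParallelWorld:
    name: str
    clock: float                                # this world's own time of day, in seconds
    people: list = field(default_factory=list)  # its non-deterministic entities

    def light_level(self):
        return SHARED_CHURCH["ambient_light"](self.clock)


wedding = ParallelWorld("wedding", clock=10 * 3600.0, people=["bride", "groom"])
funeral = ParallelWorld("funeral", clock=14 * 3600.0, people=["mourner"])
exorcism = ParallelWorld("exorcism", clock=0.0, people=["priest"])
for world in (wedding, funeral, exorcism):
    print(world.name, round(world.light_level(), 3))
```

Each world computes its own correct light level from the same shared function. Only the clock and the cast of people differ between worlds.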

Obviously, this gets the most use out of the church, since any number of groups can use it simultaneously. And once people have downloaded the church in any "universe", they don't have to download it again; it stays in their cache, which improves performance and reduces network bandwidth.

There's more to the deterministic/non-deterministic distinction than this, however; to discover its other advantages, we need to look at behavior more closely.

Levels of Behavior

As described earlier, every entity has a set of "attributes" (e.g. location, orientation, shape, etc.), and behaviors are things that alter those attributes over time. Just as it was informative to divide entities into different categories, it's useful to divide behaviors into different "levels".

For example, consider a virtual squirrel. It might be capable of several "high level" behaviors, such as foraging for food, evading a predator, or finding a mate. At any given time, it's carrying out exactly one of those activities; it wouldn't be eating while fleeing a predator, for example.

At the very highest level of "virtual intelligence" lies the process of selecting one of those behaviors. This selection may be based on any number of factors; for example, the decision of whether to look for food or find a mate might be based on the current "hunger" value of the squirrel. The decision to fight or flee when encountering a predator might be based on the squirrel's speed, strength, distance to the predator, exhaustion, and so on.

There are a number of ways of modeling this "behavior-selection" level. For example, one might implement each of the high-level behaviors as a single state in a state machine; the entity would be in the "feed" state until the presence of a predator triggered a transition to the "flee" state. This works, but it doesn't scale well. Every time we add a new state, we (potentially) have to add a transition to it from each of the existing states, so in the worst case the number of transitions grows with the square of the number of states.
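Here is a minimal sketch of the state-machine approach, using hypothetical state and event names. The helper `worst_case_transitions` illustrates the scaling concern. If every state may need to reach every other one, the table holds n * (n - 1) entries.

```python
class SquirrelFSM:
    """Behavior selection as a finite state machine (hypothetical names)."""

    def __init__(self):
        self.state = "feed"
        # (current state, event) -> next state
        self.transitions = {
            ("feed", "predator_seen"): "flee",
            ("flee", "predator_gone"): "feed",
            ("feed", "mate_seen"): "court",
            ("court", "predator_seen"): "flee",
            ("court", "mate_lost"): "feed",
        }

    def handle(self, event):
        # Events with no matching transition leave the state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state


def worst_case_transitions(n_states):
    # Every state may need an edge to every other state.
    return n_states * (n_states - 1)


fsm = SquirrelFSM()
print(fsm.handle("predator_seen"))  # flee
print(fsm.handle("predator_gone"))  # feed
print(worst_case_transitions(3), worst_case_transitions(4))  # 6 12
```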

Another approach (suggested to me by Eben Gay, of ERG Engineering) is to treat each behavior as a task in a multi-tasking system, and the behavior-selection mechanism as the scheduler. The scheduling of each task would be based on entity-specific parameters such as hunger, fear, and so on.
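Here is one way to sketch the scheduler idea. It assumes that Python generators stand in for tasks and that each task is tied to a single drive. On every frame, the task whose drive is strongest is resumed for one step, while the others stay suspended. `behavior`, `DriveScheduler` and the drive names are hypothetical. Adding a new behavior means adding one entry to the drive table, with no new transitions to wire up.

```python
import itertools


def behavior(name):
    """Hypothetical task: yields one mid-level action per simulation frame."""
    for step in itertools.count():
        yield f"{name} step {step}"


class DriveScheduler:
    def __init__(self, drive_for_task):
        # drive_for_task maps each task name to the drive that motivates it.
        self.drive_for_task = drive_for_task
        self.tasks = {name: behavior(name) for name in drive_for_task}

    def tick(self, drives):
        # The most urgent drive wins this frame; its task resumes where it
        # left off, and every other task stays suspended.
        chosen = max(self.tasks, key=lambda n: drives[self.drive_for_task[n]])
        return next(self.tasks[chosen])


squirrel = DriveScheduler({"forage": "hunger", "flee": "fear", "court": "mate"})
print(squirrel.tick({"hunger": 0.8, "fear": 0.1, "mate": 0.3}))  # forage step 0
print(squirrel.tick({"hunger": 0.8, "fear": 0.9, "mate": 0.3}))  # flee step 0
print(squirrel.tick({"hunger": 0.8, "fear": 0.2, "mate": 0.3}))  # forage step 1
```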

In fact, each intelligent entity might have a set of basic "drives". An acting teacher of mine had a good way of describing this. She said human beings have four basic impulses, which she called the four 'F's: Fight, Flee, Feed and... Mate.

In any case, regardless of how the high level behaviors are selected, each one will consist of a series of simpler activities. For example, foraging for food might consist of scanning the local environment for a food-like object, turning the squirrel's virtual body to face towards it, scampering forwards, and picking up the food. These simpler "mid-level" behaviors are the primitives that are used to implement the high-level behaviors.

The mid-level behaviors work by modifying an entity's state over time; that is, they update the entity's location, orientation, and so on at every simulation frame.
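The decomposition of foraging can be sketched with generators, where each `yield` marks the end of one simulation frame. The `Squirrel` fields, the assumed speed of 0.3 m/s and the names `turn_towards`, `scamper` and `forage` are all hypothetical. The scanning step is omitted, so the food's position is simply passed in.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Squirrel:
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0           # radians
    holding: Optional[str] = None


def turn_towards(sq, tx, ty):
    # Mid-level primitive: completed in a single frame, for brevity.
    sq.heading = math.atan2(ty - sq.y, tx - sq.x)
    yield


def scamper(sq, tx, ty, speed, dt):
    # Mid-level primitive: move a little every frame until we arrive.
    while math.hypot(tx - sq.x, ty - sq.y) > speed * dt:
        sq.x += speed * dt * math.cos(sq.heading)
        sq.y += speed * dt * math.sin(sq.heading)
        yield
    sq.x, sq.y = tx, ty
    yield


def forage(sq, food_pos, speed=0.3, dt=0.1):
    # High-level behavior: nothing but a series of mid-level primitives.
    yield from turn_towards(sq, *food_pos)
    yield from scamper(sq, *food_pos, speed, dt)
    sq.holding = "acorn"
    yield


sq = Squirrel()
frames = sum(1 for _ in forage(sq, (3.0, 4.0)))
print(frames, sq.holding)
```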

The Four Levels of Behavior

Just as we had a taxonomy of entities, we now have a hierarchy of behavior. It has four levels, roughly corresponding to the four types of entities we identified earlier.

The four levels of behavior are as follows:

Level 0: direct modification of an entity's attributes
e.g. "set location to <123.97, 43.2, 118.7>"

Level 1: change in an entity's attributes over time
e.g. "scamper forwards at 30 cm/sec"

Level 2: series of calls to level 1 behaviors to perform some task
e.g. "forage for food"

Level 3: top-level decision making
e.g. "decide whether to forage, flee or find a mate"

The level 3 behavior has the job of selecting a level 2 behavior; it sets priorities and does "executive level" decision-making. The level 2 behaviors decompose a task into simpler level 1 actions, which in turn update the actual entity state.
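Here is a compact sketch of how the four levels might stack, assuming a 2D world and straight-line motion as the only level 1 behavior. All function names are hypothetical. Level 3 picks a level 2 task, and that task returns a level 1 behavior, which is a function of time. On each frame, the value of that behavior is applied through a level 0 attribute update.

```python
import math


# Level 0: direct modification of an entity's attributes.
def set_attribute(entity, name, value):
    entity[name] = value


# Level 1: change in an attribute over time (a pure function of t).
def scamper(start, heading_rad, speed):
    def location_at(t):
        return (start[0] + speed * t * math.cos(heading_rad),
                start[1] + speed * t * math.sin(heading_rad))
    return location_at


# Level 2: tasks expressed as level 1 behaviors (each reduced to one step here).
def forage(entity, food):
    x, y = entity["location"]
    return scamper((x, y), math.atan2(food[1] - y, food[0] - x), 0.3)  # 30 cm/sec


def flee(entity, threat):
    x, y = entity["location"]
    return scamper((x, y), math.atan2(y - threat[1], x - threat[0]), 0.6)


# Level 3: executive decision about which level 2 behavior to run.
def decide(hunger, fear):
    return "flee" if fear > hunger else "forage"


squirrel = {"location": (0.0, 0.0)}
if decide(hunger=0.7, fear=0.2) == "forage":
    behavior = forage(squirrel, food=(3.0, 0.0))
else:
    behavior = flee(squirrel, threat=(-2.0, 0.0))
for t in (0.0, 5.0, 10.0):  # three simulation frames
    set_attribute(squirrel, "location", behavior(t))
```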

The higher-level behaviors (levels 2 and 3) can be implemented in any number of ways; they don't even necessarily have to involve software. For example, a human being operating a virtual creature (their "avatar", to use the popular term) is a source of high-level decision-making; the level 2 and level 3 behaviors are actually implemented in wetware. The user tilts a joystick to the right, and the "turn 45 degrees to the right over five seconds" level 1 behavior gets invoked; they tilt the joystick forward, and the "scamper" behavior is executed.

For that matter, you could even have your virtual squirrel be the avatar of a real squirrel, with sensors on its body and a tiny HMD!

The point is that the rest of the system doesn't know or care how the level 2 and 3 behaviors work; the only thing that matters is the level 1 behavior that ultimately updates an entity's state.
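The sketch below illustrates that the source of level 2 and 3 behavior is invisible to the rest of the system. A joystick handler and a scripted controller both emit the same hypothetical `Level1Request` records. The thresholds and field names are assumptions; the 45-degree turn and the 30 cm/sec scamper echo the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level1Request:
    """What the rest of the system sees: a time-based behavior to run."""
    name: str        # "turn" or "scamper"
    amount: float    # degrees for "turn", cm/sec for "scamper"
    duration: float  # seconds


def from_joystick(x_axis, y_axis) -> Optional[Level1Request]:
    # Level 2/3 implemented in wetware: stick deflection picks the behavior.
    if x_axis > 0.5:
        return Level1Request("turn", 45.0, 5.0)
    if y_axis > 0.5:
        return Level1Request("scamper", 30.0, 1.0)
    return None


def from_script(food_to_right) -> Level1Request:
    # Level 2/3 implemented in software, producing the same kind of request.
    if food_to_right:
        return Level1Request("turn", 45.0, 5.0)
    return Level1Request("scamper", 30.0, 1.0)


print(from_joystick(1.0, 0.0) == from_script(True))  # True
```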

Behaviors and Distributed VR

At this point, it's worth considering at what level of behavior the network interface resides.

A naive approach would put it between level 0 and level 1; that is, what gets sent over the network is an entity state update giving the current set of attributes of the entity. This is what some simple networked games do; they send the current location and orientation of an entity at every simulation frame.
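To give a sense of what the naive approach costs, here is a hypothetical wire format for a per-frame state update, built with Python's standard `struct` module. Each update holds a 32-bit entity id, a location (three 32-bit floats) and an orientation quaternion (four 32-bit floats), for 32 bytes in all. At an assumed 30 frames per second, that is 960 bytes per second for every moving entity, before any packet headers.

```python
import struct

# Hypothetical wire format: entity id, location (x, y, z), orientation quaternion.
# "<" selects little-endian byte order with no padding between fields.
STATE_UPDATE = struct.Struct("<I3f4f")


def encode_state(entity_id, location, orientation):
    return STATE_UPDATE.pack(entity_id, *location, *orientation)


def decode_state(packet):
    fields = STATE_UPDATE.unpack(packet)
    return fields[0], fields[1:4], fields[4:8]


FRAME_RATE = 30  # assumed simulation frames per second
print(STATE_UPDATE.size)               # 32
print(STATE_UPDATE.size * FRAME_RATE)  # 960
```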

A more sophisticated approach is to put the network interface between level 1 and level 2, in order to send behaviors that are a function of time. This is a generalization of what the "dead reckoning" approach in DIS does, and is described in more detail in the "distributed VR" overview document that serves as a companion to this piece. It's at http://ece.uwaterloo.ca/~broehl/distrib.html .
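Here is a minimal sketch of a level 1 behavior sent as a network message. It assumes constant-velocity motion; DIS dead reckoning defines several extrapolation algorithms, and this is only the simplest case. `LinearMotion` and its fields are hypothetical. One such message describes the entity's location at every time until the behavior changes, instead of requiring one update per frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearMotion:
    """Hypothetical level 1 behavior message: location as a function of time."""
    entity_id: int
    t0: float        # simulation time at which the behavior starts
    start: tuple     # (x, y, z) location at t0
    velocity: tuple  # (vx, vy, vz) in units per second

    def location_at(self, t):
        # Every receiving host evaluates the same function, so all copies agree
        # on the location at any time t until a new behavior is sent.
        dt = t - self.t0
        return tuple(p + v * dt for p, v in zip(self.start, self.velocity))


msg = LinearMotion(entity_id=7, t0=100.0, start=(0.0, 0.0, 0.0),
                   velocity=(0.5, 0.0, 0.0))
print(msg.location_at(110.0))  # (5.0, 0.0, 0.0)
```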

There are a number of advantages to placing the network interface between levels 1 and 2. Since level 1 behaviors are strictly a function of time, they're equivalent to the behaviors we identified for "animated, deterministic" entities earlier in this document. A single language could be used both for describing those simple animated behaviors, and expressing the level 1 behaviors of non-deterministic entities.

The fact that the level 1 behaviors are simple makes implementation straightforward and portability easy to achieve. It also helps keep the computational burden light, an important consideration on hosts that will already be heavily loaded just doing the rendering.

Keeping the level 1 behavior implementation language simple will also avoid "language wars", since any language can be used for levels 2 and 3 without affecting the universality of the level 1 behavior language. Tcl, Scheme, Lisp, Python... ultimately they would all cause level 1 behaviors to be invoked. The language (and indeed, platform) to use for implementing the high-level behavior of an entity is up to the creator of that entity.

Why Not Level 2?

Is it possible to move the network interface up another layer? The problem there is that level 2 behaviors are non-deterministic; they involve making decisions based on available information. Since it's not practical to guarantee that every host on the network is absolutely up to date at every simulation frame, there's no way to ensure the consistency of the decisions from one "clone" of an object to another.

For example, if one host received a behavior update for a wolf, and another didn't, the copy of my virtual squirrel that's running on the first machine will behave differently from the one running on the second machine. This divergence will quickly get worse, since other entities will base their decisions on the actions of the copy of the squirrel on their machines. The worlds continue to diverge, with no hope of ever reconciling.

That's the real issue in deterministic versus non-deterministic behaviors: non-deterministic decision-making must be done exactly once for each virtual entity, not replicated on every single host.
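The divergence problem and its fix can be sketched in a few lines. In the replicated version, each host runs the squirrel's decision logic against its own, possibly stale, view of the world. In the second version, only one owning host decides, and the other hosts receive the resulting level 1 behavior. `decide`, `Host` and `decide_once_and_broadcast` are hypothetical names, and message delivery is reduced to a direct method call.

```python
def decide(world_view):
    """Level 2/3 decision; returns the level 1 behavior to run (hypothetical)."""
    if "wolf" in world_view:
        return ("scamper", 180.0, 60.0)  # heading in degrees, speed in cm/sec
    return ("scamper", 0.0, 30.0)


host_a_view = {"acorn", "wolf"}  # received the wolf's behavior update
host_b_view = {"acorn"}          # that update hasn't arrived yet

# Replicated decision-making: every host runs the squirrel's "brain" itself.
replicated = [decide(host_a_view), decide(host_b_view)]
print(replicated[0] == replicated[1])  # False: the two copies have diverged


class Host:
    def __init__(self):
        self.behaviors = {}  # entity name -> level 1 behavior it is running

    def receive(self, entity, behavior):
        self.behaviors[entity] = behavior


def decide_once_and_broadcast(owner_view, hosts, entity="squirrel"):
    # Only the owning host decides; the others receive the level 1 result.
    behavior = decide(owner_view)
    for host in hosts:
        host.receive(entity, behavior)


a, b = Host(), Host()
decide_once_and_broadcast(host_a_view, [a, b])
print(a.behaviors == b.behaviors)  # True
```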

And as described earlier, level 2 and 3 behaviors may not be implemented in software at all; a human being with a joystick can't be "distributed". Since it's poor design for the hosts to have to know which entities are human-controlled and which aren't, our interface level should be chosen so as to hide the higher-level functions from the lower levels. That's why it resides between levels 1 and 2 and not any higher.

Browser APIs

If we agree that we don't need to standardize level 2 and level 3 behavior, and if the network interface resides between level 2 and level 1, then the only remaining question is how the interface between level 1 and level 0 works.

What's needed is an API for VRML browsers that lets the application which implements level 1 behavior (and which possibly communicates over the network) communicate level 0 changes to the browser.

One important advantage of this approach is that existing VRML browsers wouldn't need to support any specific programming language; they would simply provide an API (a relatively straightforward addition), and external applications would handle the rest. Users could mix and match browsers and behavior modules, or even write their own behavior modules if they chose. New level 1 languages could easily be implemented and experimented with.
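Here is a sketch of the split between browser and behavior module. It assumes a single hypothetical level 0 call, `set_attribute(entity_id, name, value)`, and does not describe any real VRML browser API. `RecordingBrowser` stands in for a browser and simply logs what it receives. The behavior module evaluates a level 1 behavior once per frame.

```python
from typing import Any, Callable, Iterable, Protocol


class BrowserAPI(Protocol):
    """Hypothetical level 0 interface a VRML browser might expose."""

    def set_attribute(self, entity_id: str, name: str, value: Any) -> None: ...


class RecordingBrowser:
    """Stand-in browser that records the level 0 changes it is given."""

    def __init__(self):
        self.calls = []

    def set_attribute(self, entity_id, name, value):
        self.calls.append((entity_id, name, value))


def run_behavior_module(browser: BrowserAPI, entity_id: str,
                        location_at: Callable[[float], Any],
                        frame_times: Iterable[float]) -> None:
    # The external module evaluates a level 1 behavior at each frame time and
    # passes only the resulting level 0 change across the API.
    for t in frame_times:
        browser.set_attribute(entity_id, "location", location_at(t))


browser = RecordingBrowser()
run_behavior_module(browser, "squirrel", lambda t: (0.5 * t, 0.0, 0.0), [0.0, 1.0, 2.0])
print(browser.calls[-1])  # ('squirrel', 'location', (1.0, 0.0, 0.0))
```

Any module that can call `set_attribute` can drive the browser, and the same is true in reverse. This is what allows browsers and behavior modules to be mixed and matched.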

Entity Types and Behavior Levels

There's an interesting relationship between the entity types described earlier and the different levels of behavior. Level 0 behaviors involve directly setting entity attributes, with no notion of time; this corresponds to "static" entities. Level 1 behaviors modify entity attributes as a function only of time; this corresponds to "animated" entities.

Level 2 behaviors are capable of responding to their environment, but have no sense of volition; this maps nicely to "Newtonian" entities. Finally, "intelligent" entities exhibit the level 3 characteristic of having free will, desires, priorities, and so on.

Implementation

The only language that has to be standardized is the one in which level 1 behaviors are implemented. It will have to be able to modify entity state information as a function of time; it will need to have primitives that interact with the platform-specific data structures in a platform-independent way (most likely through a browser API), and it will need to be simple to implement.
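As one guess at how small a level 1 language could be, a behavior might be pure data: a list of key times and the attribute values at those times, evaluated by linear interpolation. The dictionary layout and the `evaluate` function are hypothetical. The description contains no platform-specific references, so any host could evaluate it and hand the result to a browser API.

```python
from bisect import bisect_right


def evaluate(behavior, t):
    """Evaluate a hypothetical keyframe-based level 1 behavior at time t.

    behavior = {"attribute": name, "keys": [t0, t1, ...], "values": [v0, v1, ...]}
    with strictly increasing key times and tuple-of-float values. Times before
    the first key or after the last one clamp to the end values.
    """
    keys, values = behavior["keys"], behavior["values"]
    if t <= keys[0]:
        return values[0]
    if t >= keys[-1]:
        return values[-1]
    i = bisect_right(keys, t)  # now keys[i - 1] <= t < keys[i]
    f = (t - keys[i - 1]) / (keys[i] - keys[i - 1])
    return tuple(a + (b - a) * f for a, b in zip(values[i - 1], values[i]))


door = {
    "attribute": "location",
    "keys": [0.0, 10.0, 20.0],
    "values": [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 2.0, 0.0)],
}
print(evaluate(door, 5.0))   # (2.5, 0.0, 0.0)
print(evaluate(door, 15.0))  # (5.0, 1.0, 0.0)
```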

There -- that ought to be easy, right? Well, maybe...

In any case, I've run out of things to say. I'm definitely open to any ideas and suggestions people may have; my email address is at the top of this document. I look forward to hearing from you.