[There is also a companion piece, describing distributed VR; the URL is http://ece.uwaterloo.ca/~broehl/distrib.html ]
This document is an attempt to gather and organize some ideas about "behavior" in VR systems, particularly distributed (i.e., networked) VR systems. Many of these ideas have come from discussions on the net; many thanks to everyone who contributed.
This is very much a "living" document; I'm eager to hear what people think of it, and I'm quite willing to make changes based on that input. Feel free to get in touch with me; my email address is broehl@ece.uwaterloo.ca (if you don't hear back right away, please bear with me; I get a lot of email, and often fall behind).
If an entity has a collection of attributes, then behavior is the ever-changing state of all those attributes. For example, an entity may have an attribute called "location"; as it moves around, its location changes. The way in which its location changes over time is its behavior.
In this context, describing an entity's behavior as "deterministic" simply means that its state is a function only of time; that is, we can determine the complete state of the entity at any given simulation tick. We can say, for example, "show me the state of this entity at time t=1700 seconds into the simulation". We can run time forwards or backwards, jump to any point in time, and always know the state of the entity at that precise moment.
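To make "a function only of time" concrete, here is a minimal sketch in Python. The orbiting entity, its State fields, and the radius and period values are all made up for illustration; the point is only that the state at any instant can be computed directly, without replaying what came before.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    x: float
    y: float
    heading_deg: float


def orbit_state(t: float, radius: float = 10.0, period: float = 60.0) -> State:
    """State of a hypothetical entity circling the origin, from time alone."""
    angle = 2.0 * math.pi * (t / period)
    return State(
        x=radius * math.cos(angle),
        y=radius * math.sin(angle),
        heading_deg=(math.degrees(angle) + 90.0) % 360.0,
    )


# Any instant can be queried directly, and in any order.
late = orbit_state(1700.0)
early = orbit_state(15.0)
again = orbit_state(1700.0)  # "rewinding" gives exactly the same answer
```

Because orbit_state keeps no memory between calls, jumping from t=1700 back to t=15 and forward again costs nothing and always gives the same result.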
Entities with non-deterministic behavior, on the other hand, are by nature unpredictable. Certainly human beings fall into this category; we do things for our own (internal) reasons, and it's impossible to predict what a human being will do at some arbitrary time in the future. It's also not practical to "rewind" human behavior to some point in the past.
However, it isn't only human beings who exhibit non-deterministic behavior; any entity that has a glimmering of simulated intelligence will be similarly unpredictable. If we create a virtual squirrel, and we attempt to realistically model the behavior of real squirrels, then that virtual squirrel's behavior will effectively be non-deterministic.
What's more, any entity that responds to interaction (direct or indirect) with a non-deterministic entity is itself non-deterministic (since its state is unpredictable, being the result of unpredictable actions).
Each basic type (deterministic or non-deterministic) can be further subdivided. Entities with deterministic behavior can be "static" or "animated", and entities with non-deterministic behavior can be "Newtonian" or "intelligent".
Well, for one thing, the distinction between deterministic and non-deterministic behavior has important implications for re-use of resources, caching and network bandwidth. (There are other, deeper implications as well; they'll be discussed below.)
Consider a virtual church. Someone sits down and creates it in a CAD package, complete with altar, pews and a beautiful stained glass window. Once built, it would be nice to get as much use out of it as we can; in fact, it would be nice if many people could use it simultaneously for different purposes.
This is where the deterministic/non-deterministic distinction becomes important. By creating the church as a set of deterministic entities, we can allow it to be used concurrently by any number of people. At the same instant of real-world time, the church might be used by one group of people for a virtual wedding, another group for a virtual funeral (perhaps for someone who's lost their net access!), and yet a third group for an exorcism. All of them share the church as a kind of substructure, but their parallel worlds are distinct from each other because the set of non-deterministic entities (e.g. people) is different in each one.
In one of those parallel worlds, it might be 10 am; in another, it might be 2 in the afternoon, and in yet a third it might be midnight. The ambient light level would be correct in all three, since the person who built the church world defined the ambient light as a deterministic behavior.
Obviously, this gets the most use out of the church, as it allows any number of people to use it simultaneously. Once they've seen the church in any "universe", they never have to download it again; it remains cached, thereby improving performance and reducing network bandwidth.
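The church example might be sketched as follows. This is only an illustration: the SharedScene and World classes, the cosine light curve, and the visitor names are my own assumptions, not part of any existing system. The deterministic scene is one shared object, while each parallel world has its own clock and its own non-deterministic entities.

```python
import math
from dataclasses import dataclass, field


def ambient_light(hour: float) -> float:
    """Deterministic light level in [0, 1]: brightest at noon, dark at midnight."""
    return max(0.0, math.cos((hour - 12.0) / 12.0 * math.pi))


@dataclass(frozen=True)
class SharedScene:
    """Deterministic content; downloaded once and reused by every world."""
    name: str


@dataclass
class World:
    scene: SharedScene   # shared, cacheable substructure
    clock_hour: float    # each world keeps its own time of day
    visitors: list = field(default_factory=list)  # non-deterministic entities

    def light(self) -> float:
        return ambient_light(self.clock_hour)


church = SharedScene("church")
wedding = World(church, clock_hour=10.0, visitors=["bride", "groom"])
funeral = World(church, clock_hour=14.0, visitors=["mourner"])
exorcism = World(church, clock_hour=0.0, visitors=["priest"])
```

All three worlds refer to the same church object, so it only needs to be fetched once; what differs between them is the clock and the visitor list.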
There's more to the deterministic/non-deterministic distinction than this, however; to discover its other advantages, we need to look at behavior more closely.
For example, consider a virtual squirrel. It might be capable of several "high level" behaviors, such as foraging for food, evading a predator, or finding a mate. At any given time, it's carrying out exactly one of those activities; it wouldn't be eating while fleeing a predator, for example.
At the very highest level of "virtual intelligence" lies the process of selecting one of those behaviors. This selection may be based on any number of factors; for example, the decision of whether to look for food or find a mate might be based on the current "hunger" value of the squirrel. The decision to fight or flee when encountering a predator might be based on the squirrel's speed, strength, distance to the predator, exhaustion, and so on.
There are a number of ways of modeling this "behavior-selection" level. For example, one might implement each of the high-level behaviors as a single state in a state machine; the entity would be in the "feed" state until the presence of a predator triggered a transition to the "flee" state. This works, but is a little difficult to scale; every time we add a new state, we (potentially) have to add a new transition to that state from each of the existing states.
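A minimal sketch of the state-machine approach, assuming three behaviors and a handful of event names that I've invented for the example:

```python
from enum import Enum, auto


class Behavior(Enum):
    FEED = auto()
    FLEE = auto()
    MATE = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (Behavior.FEED, "predator_seen"): Behavior.FLEE,
    (Behavior.MATE, "predator_seen"): Behavior.FLEE,
    (Behavior.FLEE, "predator_gone"): Behavior.FEED,
    (Behavior.FEED, "fed"): Behavior.MATE,
    (Behavior.MATE, "hungry"): Behavior.FEED,
}


def step(state: Behavior, event: str) -> Behavior:
    """Return the next state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = Behavior.FEED
for event in ["predator_seen", "predator_gone", "fed"]:
    state = step(state, event)
```

The scaling problem shows up in the table: adding a fourth behavior means considering a possible new entry for each of the existing states.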
Another approach (suggested to me by Eben Gay, of ERG Engineering) is to treat each behavior as a task in a multi-tasking system, and the behavior-selection mechanism as the scheduler. The scheduling of each task would be based on entity-specific parameters such as hunger, fear, and so on.
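Here is one way the scheduler idea might look in code. It's a rough sketch under my own assumptions: the Task class, the drive names, and the weighting formulas are all hypothetical, and a real scheduler would likely be more elaborate than "pick the highest score".

```python
from dataclasses import dataclass
from typing import Callable, Dict

Drives = Dict[str, float]  # e.g. {"hunger": 0.7, "fear": 0.1}


@dataclass
class Task:
    name: str
    priority: Callable[[Drives], float]  # scores this behavior from the drives


def schedule(tasks: list, drives: Drives) -> Task:
    """Pick the single task with the highest priority for the current drives."""
    return max(tasks, key=lambda task: task.priority(drives))


tasks = [
    Task("forage", lambda d: d["hunger"]),
    Task("flee", lambda d: 2.0 * d["fear"]),  # fear weighted more heavily
    Task("find_mate", lambda d: 0.5 * (1.0 - d["hunger"])),
]

calm = schedule(tasks, {"hunger": 0.7, "fear": 0.1})
scared = schedule(tasks, {"hunger": 0.7, "fear": 0.6})
```

Unlike the state machine, adding a behavior here means appending one task with its own priority function; nothing else needs to change.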
In fact, each intelligent entity might have a set of basic "drives". An acting teacher of mine had a good way of describing this. She said human beings have four basic impulses, which she called the four 'F's: Fight, Flee, Feed and... Mate.
In any case, regardless of how the high level behaviors are selected, each one will consist of a series of simpler activities. For example, foraging for food might consist of scanning the local environment for a food-like object, turning the squirrel's virtual body to face towards it, scampering forwards, and picking up the food. These simpler "mid-level" behaviors are the primitives that are used to implement the high-level behaviors.
The mid-level behaviors work by modifying an entity's state over time; that is, they update the entity's location, orientation, and so on at every simulation frame.
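As a sketch of what such a per-frame update might look like, here is a hypothetical Turn behavior that rotates an entity by a fixed angle over a fixed duration. The class names and the frame interval are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    heading_deg: float = 0.0


@dataclass
class Turn:
    """Hypothetical mid-level behavior: rotate by total_deg over duration seconds."""
    total_deg: float
    duration: float
    elapsed: float = 0.0

    def update(self, entity: Entity, dt: float) -> bool:
        """Advance one frame; return True once the turn has finished."""
        step = min(dt, self.duration - self.elapsed)  # never overshoot the end
        entity.heading_deg = (
            entity.heading_deg + self.total_deg * step / self.duration
        ) % 360.0
        self.elapsed += step
        return self.elapsed >= self.duration


squirrel = Entity()
turn = Turn(total_deg=45.0, duration=5.0)
frames = 0
done = False
while not done:
    done = turn.update(squirrel, dt=0.5)  # one call per simulation frame
    frames += 1
```

Whatever invoked the turn doesn't need to track it frame by frame; it starts the behavior and the behavior updates the entity until it completes.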
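The four levels of behavior are as follows: level 0 is the entity's state itself, the current values of its attributes; level 1 is the simple behaviors that change those attributes as a function of time; level 2 is the sequencing of level 1 behaviors to carry out a task, in response to the environment; and level 3 is the top-level decision-making that selects among those tasks.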
The higher-level behaviors (levels 2 and 3) can be implemented in any number of ways; they don't even necessarily have to involve software. For example, a human being operating a virtual creature (their "avatar", to use the popular term) is a source of high-level decision-making; the level 2 and level 3 behaviors are actually implemented in wetware. The user tilts a joystick to the right, and the "turn 45 degrees to the right over five seconds" level 1 behavior gets invoked; they tilt the joystick forward, and the "scamper" behavior is executed.
For that matter, you could even have your virtual squirrel be the avatar of a real squirrel, with sensors on its body and a tiny HMD!
The point is that the rest of the system doesn't know or care how the level 2 and 3 behaviors work; the only thing that matters is the level 1 behavior that ultimately updates an entity's state.
A naive approach would put the network interface between level 0 and level 1; that is, what gets sent over the network is an entity state update giving the current set of attributes of the entity. This is what some simple network games (e.g. DOOM) do; they send the current location and orientation for an entity at every simulation frame.
A more sophisticated approach is to put the network interface between level 1 and level 2, in order to send behaviors that are a function of time. This is a generalization of what the "dead reckoning" approach in DIS does, and is described in more detail in the "distributed VR" overview document that serves as a companion to this piece. It's at http://ece.uwaterloo.ca/~broehl/distrib.html .
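The difference between the two placements can be sketched like this. The MotionUpdate message is a simplified, one-dimensional, constant-velocity stand-in that I've made up for illustration; it is not the actual DIS packet format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionUpdate:
    """Hypothetical network message: a behavior expressed as a function of time."""
    t0: float        # simulation time at which the message was issued
    x0: float        # position at t0
    velocity: float  # units per second, assumed constant until the next message

    def position_at(self, t: float) -> float:
        return self.x0 + self.velocity * (t - self.t0)


def naive_messages(duration: float, frame_rate: float) -> int:
    """Messages needed when every frame's state is sent explicitly."""
    return int(duration * frame_rate)


# The sender issues one message; each receiver extrapolates locally.
update = MotionUpdate(t0=0.0, x0=2.0, velocity=1.5)
positions = [update.position_at(frame / 10.0) for frame in range(31)]  # 3 s at 10 Hz
```

Over those three seconds the naive scheme would send naive_messages(3.0, 10.0) state updates, while the time-based scheme sends a single message and only needs another when the velocity changes.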
There are a number of advantages to placing the network interface between levels 1 and 2. Since level 1 behaviors are strictly a function of time, they're equivalent to the behaviors we identified for "animated, deterministic" entities earlier in this document. A single language could be used both for describing those simple animated behaviors, and expressing the level 1 behaviors of non-deterministic entities.
The fact that the level 1 behaviors are simple makes implementation straightforward, and portability easy to achieve. It also guarantees that the computational burden will be kept light, an important consideration on hosts that will already be heavily burdened just doing the rendering.
Keeping the level 1 behavior implementation language simple will also avoid "language wars", since any language can be used for levels 2 and 3 without affecting the universality of the level 1 behavior language. Tcl, Scheme, Lisp, Python... ultimately they would all cause level 1 behaviors to be invoked. The language (and indeed, platform) to use for implementing the high-level behavior of an entity is up to the creator of that entity.
For example, if one host received a behavior update for a wolf, and another didn't, the copy of my virtual squirrel that's running on the first machine will behave differently than the one running on the second machine. This divergence will quickly get worse, since other entities will base their decisions on the actions of the copy of the squirrel on their machines. The worlds continue to diverge, with no hope of ever reconciling.
That's the real issue in deterministic versus non-deterministic behaviors: non-deterministic decision-making must be done exactly once for each virtual entity, not replicated on every single host.
And as described earlier, level 2 and 3 behaviors may not be implemented in software at all; a human being with a joystick can't be "distributed". Since it's poor design for the hosts to have to know which entities are human-controlled and which aren't, our interface level should be chosen so as to hide the higher-level functions from the lower-level. That's why it resides between levels 1 and 2 and not any higher.
What's needed is an API for VRML browsers that lets the application which implements level 1 behavior (and which possibly communicates over the network) communicate level 0 changes to the browser.
One important advantage of this approach is that existing VRML browsers wouldn't need to support any specific programming language; they would simply provide an API (a relatively straightforward addition), and external applications would handle the rest. Users could mix and match browsers and behavior modules, or even write their own behavior modules if they chose. New level 1 languages could easily be implemented and experimented with.
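To give a feel for how small such an API could be, here is a sketch. Everything in it is hypothetical: BrowserAPI and set_translation are names I've invented, not calls from any real VRML browser, and RecordingBrowser is just a stand-in so the example can run on its own.

```python
from typing import Protocol, Tuple

Vec3 = Tuple[float, float, float]


class BrowserAPI(Protocol):
    """Hypothetical interface a browser might expose; not an actual VRML API."""

    def set_translation(self, node: str, value: Vec3) -> None: ...


class RecordingBrowser:
    """Stand-in browser that just remembers the last value set on each node."""

    def __init__(self) -> None:
        self.translations = {}

    def set_translation(self, node: str, value: Vec3) -> None:
        self.translations[node] = value


def run_behavior(browser: BrowserAPI, node: str, start: Vec3, velocity: Vec3,
                 duration: float, dt: float) -> None:
    """External behavior module: evaluate a level 1 behavior, push level 0 changes."""
    steps = round(duration / dt)
    for i in range(steps + 1):
        t = i * dt
        browser.set_translation(
            node, tuple(s + v * t for s, v in zip(start, velocity))
        )


browser = RecordingBrowser()
run_behavior(browser, "squirrel", (0.0, 0.0, 0.0), (1.0, 0.0, 0.5),
             duration=2.0, dt=0.25)
```

The behavior module never touches rendering, and the browser never sees the behavior language; the only thing crossing the boundary is the stream of attribute changes.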
Level 2 behaviors are capable of responding to their environment, but have no sense of volition; this maps nicely to "Newtonian" entities. Finally, "intelligent" entities exhibit the level 3 characteristic of having free will, desires, priorities, and so on.
There -- that ought to be easy, right? Well, maybe...
In any case, I've run out of things to say. I'm definitely open to any ideas and suggestions people may have; my email address is at the top of this document. I look forward to hearing from you.