Distributed Virtual Reality -- An Overview

by Bernie Roehl

(First draft: June, 1995)

[This document is kept online; the URL for it is http://ece.uwaterloo.ca/~broehl/distrib.html ]

[There is also a companion piece, describing the behavior of virtual entities, at http://ece.uwaterloo.ca/~broehl/behav.html ]

One of the hottest topics in Virtual Reality research is the idea of "distributed" VR. A lot of ideas have been tossed around in various newsgroups and mailing lists; this document is an attempt to bring together some of the more interesting concepts and organize them in a way that's easy to follow.

This is very much a "living" document; I'm eager to hear what people think of it, and I'm quite willing to make changes based on that input. Feel free to get in touch with me; my email address is broehl@ece.uwaterloo.ca (if you don't hear back right away, please bear with me; I get a lot of email, and often fall behind).

What is "Distributed VR"?

The idea behind distributed VR is very simple; a simulated world runs not on one computer system, but on several. The computers are connected over a network (possibly the global Internet) and people using those computers are able to interact in real time, sharing the same virtual world. In theory, people can be sitting at home in London, Paris, New York and Edmonton, all interacting in a meaningful way in VR.

It's clear that there are a number of obstacles to be overcome in achieving this goal. The fact that we want people to be able to access it from their homes means that we have to be able to run over relatively limited-bandwidth links, such as 28.8k modems. The fact that we want to run over the Internet means that we have to tolerate a certain amount of latency in the delivery of update information. Finally, the fact that people are running on different computer systems with different hardware and different software means that we must design the system for portability.

Some Jargon

Each of the computers participating in the simulation is called a "host". On each host there are a number of "entities" (things in the virtual environment) that communicate their changing state by sending "update messages". The specific entity that corresponds to a human participant's virtual body is called their "avatar".

The Naive Approach

The simplest approach, and the one that some multi-player computer game systems use, is to simply have each host broadcast the location of each entity that it maintains. These broadcasts are received by every host in the simulation, and are used to update their local copy of the "world database".
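
To make this concrete, here's a minimal sketch of the naive approach in Python; the port number, packet layout and entity structure are all invented for illustration:

    import socket
    import struct
    import time
    from collections import namedtuple

    # Hypothetical entity record and port number, for illustration only.
    Entity = namedtuple("Entity", "id x y z")
    entities = [Entity(1, 0.0, 0.0, 0.0), Entity(2, 5.0, 0.0, -3.0)]
    PORT = 5005

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    def broadcast_state(e):
        # Pack the entity id and position into one datagram and
        # broadcast it to every machine on the subnet.
        packet = struct.pack("!I3f", e.id, e.x, e.y, e.z)
        sock.sendto(packet, ("255.255.255.255", PORT))

    while True:                  # the simulation loop
        for e in entities:       # every entity, every frame...
            broadcast_state(e)   # ...which is exactly the problem
        time.sleep(1.0 / 30.0)   # roughly 30 updates per second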

The first networked version of the computer game DOOM worked in a "broadcast" mode; each participant constantly broadcast the current state of their "avatar". Most multi-player arcade games work the same way.

This approach works acceptably on small, dedicated networks; however, there are a number of problems with it. The most important problem with broadcasting is that every machine on the subnet must receive and process every update packet; this includes machines which aren't participating in the simulation! That's not a problem on a dedicated LAN, but networks that are being used for other things can't have huge amounts of broadcast traffic on them; this is why so many companies and universities adopted a "no network DOOM" policy. (DOOM, incidentally, switched to point-to-point messages for subsequent releases in order to be more friendly to non-dedicated networks).

The other problem is that simply broadcasting your current location and orientation requires a lot of bandwidth, since those (potentially) change every time through your simulation loop. When you're dealing with a large number of entities and limited bandwidth, the system just stops working.

In short, the naive approach doesn't scale up.

How Have Other People Solved These Problems?

In designing a standard for representing scene geometry, the VRML group chose to use a subset of an existing format as their jumping-off point, rather than create yet another new format. This approach has worked well; maybe we can do the same thing in selecting standards for distributed VR?

The largest, best-known and most successful standard for distributed VR has been DIS -- the Distributed Interactive Simulation protocol. As its name suggests, it addresses the very issues we're concerned about. Perhaps we could simply adopt it, and declare our task complete?

Well, as it turns out, it's not quite that simple. DIS, like its predecessor SIMNET, is designed for a very specific application domain: military simulations. As we'll see, it's extremely well-optimized for that application, but is not an ideal choice for our needs. However, we can benefit from a lot of the work done on DIS; they've already encountered and dealt with a lot of the same challenges we'll be facing in designing a distributed VR standard.

(More information about DIS is available from the Naval Postgraduate School: ftp://taurus.cs.nps.navy.mil/pub/mosaic/nps_mosaic.html )

The Basic Ideas of DIS -- PDUs and Dead Reckoning

As mentioned earlier, the three basic challenges are compatibility, limited bandwidth, and update latency. DIS addresses the first of these problems, compatibility, by defining a standard message format for interchanging information between simulation hosts. This standard format is called a PDU, for Protocol Data Unit. There are many different types of PDU, most of which (like requests for resupply of munitions) are not applicable to our needs. However, one of the formats, the Entity State PDU or ESPDU, contains information very similar to what we need to send between hosts.

The second and third problems, limited bandwidth and update latency, are dealt with in DIS using a technique called "dead reckoning". The idea behind dead reckoning is simple: instead of just sending an entity's location, a host sends a message (an ESPDU) that contains the entity's location, a timestamp, and a velocity vector. Using that information, each host in the network can then extrapolate the entity's location without additional updates. This can be done for entity orientation as well as location.

Each entity runs its own full-up simulation, and also runs the simple dead reckoning model on itself. It keeps track of its actual location and the location predicted by the dead reckoning model; when the two diverge by more than a certain amount, it sends out another update to bring everyone's model of the entity back in line with what the entity is actually doing.
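
Here's a minimal sketch of the dead reckoning calculation in Python; the class name and the divergence threshold are assumptions, not part of the DIS standard:

    import math

    THRESHOLD = 0.5   # metres of divergence before a fresh update goes out

    class DeadReckoningModel:
        """The simple model every host runs for a remote entity."""
        def __init__(self, position, velocity, timestamp):
            self.p0 = position     # position at the time of the last update
            self.v = velocity      # velocity vector from the last update
            self.t0 = timestamp    # timestamp of the last update

        def extrapolate(self, now):
            # p(t) = p0 + v * (t - t0)
            dt = now - self.t0
            return tuple(p + v * dt for p, v in zip(self.p0, self.v))

    def needs_update(actual_position, model, now):
        # The owning host compares its full simulation against the
        # dead reckoning model everyone else is running; if they have
        # diverged too far, it's time to send out another ESPDU.
        predicted = model.extrapolate(now)
        return math.dist(actual_position, predicted) > THRESHOLD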

In addition, each entity sends out periodic update messages (say once every five seconds or so) that act as "keep-alives". These keep-alives actually serve three purposes. First, if someone enters the simulation after it's started, they'll get caught up with every entity's state within a few seconds. Second, if update messages get missed for any reason, the system recovers gracefully; within a few seconds, a new update message will arrive and bring the host up to date. Third, if no updates are received from an entity within a certain period of time, the other hosts on the network can assume the entity is no longer around.
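
The timeout side of this is easy to sketch; the five-second interval comes from the text above, but the three-misses rule and the function names are assumptions:

    import time

    KEEPALIVE_INTERVAL = 5.0            # seconds between periodic updates
    TIMEOUT = 3 * KEEPALIVE_INTERVAL    # miss three in a row and you're gone

    last_heard = {}    # entity id -> time we last received an update

    def on_update(entity_id):
        last_heard[entity_id] = time.monotonic()

    def reap_stale_entities():
        # Any entity we haven't heard from recently is assumed to
        # have left the simulation.
        now = time.monotonic()
        for entity_id, t in list(last_heard.items()):
            if now - t > TIMEOUT:
                del last_heard[entity_id]
                # (remove the entity from the world database here)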

Dead reckoning produces impressive results. Since updates are only sent when needed, the amount of traffic is greatly reduced. The system works particularly well in the problem domain it was designed for: military simulation. Since every entity is either a vehicle or a projectile, the dead reckoning approach works nicely.

Limitations of DIS

However, a more general distributed VR system doesn't work nearly as well with standard DIS. Let's say we want to simulate a park, for example; we want to have birds fluttering around overhead, squirrels foraging for nuts, and so on. Despite the fact that birds do occupy airspace, their behaviour is surprisingly unlike that of a jet fighter aircraft. And despite the fact that squirrels are land-based, they're not very much like tanks.

Perhaps more importantly, people don't move anything like vehicles; nor do they behave like projectiles (unless you fire them out of a cannon, which is an exceptional case).

A proposed extension to DIS by Pratt, Barham and a number of other people is described in "Insertion of an Articulated Human into a Networked Virtual Environment"; the Postscript version of that document is online (on taurus.cs.nps.navy.mil, in the directory pub/NPSNET_MOSAIC; the filename is Insertion.of.an.Articulated.Human.into.a.Networked.VE.ps).

The paper attempts to deal with the people problem. It considers people to be dismounted infantry, and sends information about their configuration; small packets of information giving the rotation angles of each limb are sent as part of the PDU.

This works well, but is (by design) highly anthropomorphic. The packet format actually stores information about the rotation of each hip joint; it has no provision for body configurations other than human. In terms of our park example, the data structures they propose make no provision for a bird's wings or a squirrel's big fluffy tail. Treating birds and squirrels as "dismounted infantry" isn't really the best approach.

(This is not meant as a criticism of the work done on dismounted infantry; it's certainly better than the standard DIS approach which would require us to treat each of the squirrel's legs as a tiny gun turret!)

Beyond Dead Reckoning

Dead reckoning by itself, even when supplemented by anthropomorphic joint information, isn't enough. Does that mean we should be looking for a completely different approach? Not necessarily.

After a little thought, it becomes clear that dead reckoning is just a special case of a more general idea: instead of sending location and orientation updates, we should be sending updates about an entity's "behaviour". In a sense, we're generalizing dead reckoning, and extending it to meet our needs.

The update messages an entity sends out will specify a "behaviour" and some behaviour parameters; the dead reckoning approach then becomes just one of a repertoire of behaviours, which might be arbitrarily complex. In the case of the dead reckoning behaviour, the parameters are the initial time, initial location, and initial velocity vector (direction and magnitude).
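
One way to picture this is as a table of named behaviours, each one a function that computes an entity's state from the update's parameters and the current time. Everything in this sketch (the names, the parameter layout, the circling behaviour) is invented for illustration; dead reckoning is just one entry in the table:

    import math

    def dead_reckoning(params, now):
        # The classic DIS behaviour: linear extrapolation.
        dt = now - params["t0"]
        return tuple(p + v * dt for p, v in zip(params["p0"], params["v"]))

    def circle(params, now):
        # e.g. a bird circling a point -- something a single dead
        # reckoning update can't express.
        angle = params["omega"] * (now - params["t0"])
        cx, cy, cz = params["centre"]
        r = params["radius"]
        return (cx + r * math.cos(angle), cy, cz + r * math.sin(angle))

    BEHAVIOURS = {"dead_reckoning": dead_reckoning, "circle": circle}

    def apply_update(entity, message, now):
        # An update message names a behaviour and supplies its parameters;
        # the receiving host runs that behaviour until the next update.
        behaviour = BEHAVIOURS[message["behaviour"]]
        entity.position = behaviour(message["params"], now)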

The set of behaviours available to an entity should be extensible, so that we don't have to anticipate every possible behaviour ahead of time. The behaviour scripts are complex enough that we don't want to try to squeeze them into our equivalent of a PDU; instead, we want them to be distributed in much the same way that geometry or texture-map information is.

Clearly, we need a standard way of describing behaviour; more on this later.

A Brief Digression: TCP/IP

As mentioned earlier, broadcasting update messages is a bad idea. In order to understand the alternatives, it's worth taking a quick look at the underlying network infrastructure that's available to us.

The Internet is based around a family of protocols called TCP/IP. The IP (Internet Protocol) part is the lowest level; it handles addressing and routing. IP packets can be sent over Ethernet, fiber optics, telephone lines, radio links, tin cans and string, or any other medium that can move bits.

Above IP are two other protocols: TCP and UDP. TCP, the Transmission Control Protocol, provides a connection-oriented, reliable byte stream form of communication. Applications can use TCP to open a logical connection to another host on the Internet, send data down that connection, and receive data back. TCP handles the complex chores of breaking the stream of data into smaller pieces, doing checksums on the data to make sure it arrives intact, sorting the packets as they arrive to make sure they're in the correct order, requesting retransmission when needed, handling flow control, and lots more. TCP uses IP to actually move the packets to other hosts.

TCP is the protocol on which most Internet applications run. The Web uses HTTP (the Hypertext Transfer Protocol), which runs over TCP; when you click on a link, a connection is opened to another host using TCP, the URL is sent down the wire, the data is received back and the connection is closed. Similar techniques are used for Finger, SMTP (electronic mail) and NNTP (USENET news). In all cases, the hard parts are handled transparently by TCP.

UDP, the User Datagram Protocol, is quite different from TCP. It provides a connectionless, unreliable way of sending "datagrams" from one host to another. The word "unreliable" in this context doesn't mean that UDP is somehow flaky or error-prone; it just means that there's no guarantee that a specific packet will arrive. However, if a packet does arrive, it will arrive intact. UDP is used by NFS (the Network File System), NTP (the Network Time Protocol) and several other protocols.
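
For comparison with TCP's connection setup, sending a UDP datagram takes just a couple of calls; the host name and port below are placeholders:

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # No connection setup: just fire a self-contained datagram at a host.
    sock.sendto(b"a complete, self-contained message", ("example.com", 5005))

    # The receiving side is equally simple:
    #     data, sender = sock.recvfrom(2048)
    # If the datagram was lost in transit, nothing happens -- there is
    # no retransmission unless the application arranges one itself.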

There are some important advantages in using UDP instead of TCP. The problem with TCP lies in its strengths. The fact that it does all kinds of flow control, error checking and sequencing means that it has a fair amount of overhead; this translates into lag, which is one of our enemies.

UDP packets are relatively "lightweight". Although they occasionally get lost, they never pile up behind retransmissions the way TCP traffic can; if the network gets congested, they're simply discarded. Also, unlike TCP, UDP packets can be broadcast on subnets (since there's no "logical connection" involved).

However, because UDP is "unreliable", it requires the use of "stateless" protocols; in other words, each message must be complete and self-contained, and make no assumptions about previous messages having been received. NFS is an example of a stateless protocol; so is DIS.

DIS uses UDP broadcast packets to send its PDUs to other hosts. Each ESPDU has the complete state of the entity, and they're re-broadcast every few seconds (or more often, if the entity determines that more frequent updates are needed).

Why Broadcasting Doesn't Scale

The broadcast approach works for DIS, because it has its own dedicated network; unlike DOOM, there's no need to be concerned about the broadcast activity playing havoc with other users of the network.

However, the broadcast approach still doesn't scale. Even with dead reckoning, the problem of limited bandwidth is still there; as the number of entities grows larger, the slower-speed links become saturated. Even if bandwidth weren't an issue, the processing of updates for a large number of entities begins to exceed the capacity of the slower hosts, especially since they must also do all the rendering, input device processing, and so on.

The designers of DIS are well aware of these problems, and they're looking at solutions. The central idea, common to DIS and many other approaches as well, is the use of "update filtering".

The basic idea is again very simple. The virtual world is divided up into a large number of "cells", much like the way the cellular telephone system works. These cells are used as the basis for filtering update messages.

Each host participating in the simulation determines an Area of Interest (AOI), consisting of a number of cells within its range of vision.
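
As a sketch, here's how a host might compute its AOI over a simple square grid; the cell size and vision range are assumptions, and a hex grid (as used in the DIS work described below) would approximate the circular area more closely:

    CELL_SIZE = 100.0        # metres per grid cell -- an assumption
    VISION_RANGE = 250.0     # how far the participant can see

    def cells_in_aoi(x, z):
        # Return the set of (column, row) cells whose centres fall
        # within the participant's range of vision.
        cells = set()
        r = int(VISION_RANGE // CELL_SIZE) + 1
        home_col, home_row = int(x // CELL_SIZE), int(z // CELL_SIZE)
        for col in range(home_col - r, home_col + r + 1):
            for row in range(home_row - r, home_row + r + 1):
                cx = (col + 0.5) * CELL_SIZE
                cz = (row + 0.5) * CELL_SIZE
                if (cx - x) ** 2 + (cz - z) ** 2 <= VISION_RANGE ** 2:
                    cells.add((col, row))
        return cells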

The DIS work on the project is described in the document "Exploiting Reality with Multicast Groups" by Macedonia, Zyda, Pratt, Brutzman and Barham. The Postscript version of the paper is online (the URL is ftp://taurus.cs.nps.navy.mil/pub/NPSNET_MOSAIC/macedonia_exploit.ps.Z ).

The paper uses an example where the cells are hexagonal, somewhat like a strategy board game or certain types of military simulation computer games. This hex grid is well-suited to military simulations, and is a closer model for circular Areas of Interest than a square grid would be.

As a participant moves around, cells will enter and leave their Area of Interest; at any given time, they're only receiving updates for cells they can see, resulting in a small, manageable number of updates. The combination of AOI filtering and dead reckoning produces significant bandwidth savings.

Now clearly, hexagonal grid cells are better-suited for military simulations than for real-world VR. Our cells (perhaps better described as "zones") wouldn't necessarily be hexagonal grid cells; they would probably be defined by other criteria. For example, a virtual mansion might be divided up into a large number of rooms, each of which would be its own zone. Since no regular grid would be used, there would need to be some way to specify zone-to-zone visibility.
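
Zone-to-zone visibility could be as simple as a lookup table; a sketch for the mansion example, with invented room names:

    # Hypothetical visibility table for the virtual mansion: from each
    # zone, which zones can be seen (through doorways, windows, etc.).
    VISIBLE_FROM = {
        "ballroom": {"ballroom", "foyer", "grand_staircase"},
        "foyer":    {"foyer", "ballroom", "library"},
        "library":  {"library", "foyer"},
    }

    def zones_of_interest(current_zone):
        # The Area of Interest is just the set of zones visible from here.
        return VISIBLE_FROM[current_zone]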

Other than that, the basic idea is sound; just as we extended the idea of dead reckoning to the more general concept of behaviour, we're extending the idea of update filtering from hexagonal grid cells to arbitrarily-shaped volumes of space.

Multicasting

Given that we want to filter messages, what mechanism should we use for doing so? The DIS system uses "multicasting", and that approach seems promising for more general distributed VR applications as well.

Multicasting is an attempt to have the best of two worlds: broadcasting and point-to-point links. The idea is that any given host, in addition to having its own Internet address, can belong to a number of "multicast groups". Each multicast group has its own special Internet address; IP addresses in a certain range have been set aside for this purpose. When any host sends a message to a multicast address, that message is sent to all the hosts that belong to that multicast group. In effect, it's like broadcasting to a subnet that spans continents.

Sending a multicast message is better than having to send point-to-point messages to every host in the simulation, so it's easier on the sender. Hosts that aren't in a particular multicast group will ignore the packet at a very low level, with minimal overhead; this makes it easier on every host on the network.

There are some problems with multicasting; the most important is that it's not universally implemented. Some systems have it, some don't. An effort is underway to link small pockets of multicast-capable machines to each other over the Internet; the result is a "multicast backbone" or MBONE. The MBONE system uses "tunneling"; it wraps the IP packets destined for a multicast address inside of another IP packet, which travels through the regular Internet from one MBONE subnet to another.

Multicasting and Zones

How does multicasting fit in with Area of Interest management? In the DIS system, each cell has its own multicast group address; in other words, there's a one-to-one correspondence between cells and multicast addresses.

As participants move around, they enter and leave cells; this corresponds to entering and leaving multicast groups. Since participants only receive messages for multicast groups they're in, the multicast system itself does our message filtering for us.
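
At the socket level, entering and leaving a cell means joining and leaving the corresponding multicast group. Here's a sketch; the cell-to-address mapping and the port number are invented for illustration:

    import socket
    import struct

    PORT = 5005
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    def group_for_cell(col, row):
        # Invented mapping from a cell to a multicast group address.
        return "239.1.%d.%d" % (col % 256, row % 256)

    def join_cell(col, row):
        mreq = struct.pack("4s4s",
                           socket.inet_aton(group_for_cell(col, row)),
                           socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    def leave_cell(col, row):
        mreq = struct.pack("4s4s",
                           socket.inet_aton(group_for_cell(col, row)),
                           socket.inet_aton("0.0.0.0"))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)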

The implementation of multicasting for DIS also does away with the "keep-alive" messages, which (according to their figures) account for a high percentage of the total traffic. Instead, a "group leader" is chosen for every group (i.e., cell); the group leader is the simulation host which has been in the cell the longest. When a host first enters a cell, it sends a message to the multicast address for that cell asking the group leader to identify itself; the response is a message informing the newly-arrived host where they might obtain the complete current set of Entity State PDUs for this zone. The newcomer then requests and receives that set.

Another, simpler approach might be to send an "identify yourself" message to the multicast address; all the hosts would respond by sending their most recent update message, and perhaps (in the absence of any kind of "registry") a message containing the URI of a file containing more detailed information about them.

Multicasting is not the only approach to filtering, but it's a good one; it doesn't involve any higher-level functionality to do the filtering, and therefore imposes minimal overhead.

Open Questions About Zones

There are a number of questions still to be answered relating to zones: how should zones be defined, and by whom? How should zone-to-zone visibility be specified? How should zones be mapped onto multicast group addresses? And what should happen as an entity crosses the boundary between two zones?

Clearly, some additional work is needed in these areas.

Why Not Just Use DIS?

It certainly sounds like DIS has addressed most of the issues we need to deal with. Why not just use DIS?

Well, as mentioned several times above, DIS is designed for a very specific application. Most of the PDUs are inappropriate for general VR use (e.g. even the most aggressive squirrels are unlikely to issue "munition resupply request" PDUs). The Entity State PDU is too specific, and doesn't generalize well to our concept of sending "behaviour". It also includes a great deal of superfluous information, such as which army the squirrel belongs to. This makes the individual PDUs very big, which greatly (and needlessly) increases the bandwidth requirements. A leaner, simpler format is needed.

At a minimum, a message must contain the entity id, a timestamp, and a behaviour description. In fact, that may be all it needs to contain.
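
Here's a sketch of what packing and unpacking such a message might look like; the field sizes (a 48-bit entity id, as suggested below, a double-precision timestamp, and a 16-bit length) are assumptions:

    import struct

    def pack_update(entity_id, timestamp, behaviour_blob):
        # Entity id (48 bits, sent as 6 bytes), timestamp, then the
        # behaviour description itself.
        header = struct.pack("!6sdH",
                             entity_id.to_bytes(6, "big"),
                             timestamp,
                             len(behaviour_blob))
        return header + behaviour_blob

    def unpack_update(packet):
        eid_bytes, timestamp, length = struct.unpack("!6sdH", packet[:16])
        entity_id = int.from_bytes(eid_bytes, "big")
        behaviour_blob = packet[16:16 + length]
        return entity_id, timestamp, behaviour_blob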

What We Can Get From DIS

DIS brings us several powerful ideas: a standard, stateless message format for exchanging entity state; dead reckoning, which we've generalized into the broader notion of behaviour; periodic keep-alive updates, which let latecomers catch up and let the system recover gracefully from lost packets; and multicast-based Area of Interest filtering, which we've generalized from grid cells to zones.

Accessing Entity Information

One area that DIS doesn't address, which we must, is that of initially getting information about the entities in the virtual environment. We're not just dealing with tanks, planes and projectiles here; we want to be able to have objects with arbitrary geometry (and other attributes) in our simulation.

DIS assumes everybody has the complete database before the simulation starts; we need something more dynamic. We do need a way to uniquely identify every entity in the simulation, and provide a way of accessing their geometric description (most likely specified using VRML) as well as other information about them.

To Register or Not to Register?

Essentially, we need a way of generating short, unique identifiers for every entity in the simulation. One approach (suggested by Eric Van Hensbergen) would be to use the IP address of the entity's host, plus a short (16-bit) number to identify the entity on that host.

This avoids many of the problems of a central registry; there's not much overhead, since the messages would contain the originating IP address in any case. A mechanism would have to be established on each host for generating the per-entity sequence numbers, but that's a far easier problem to solve.
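
A sketch of this scheme; the function names are mine, but the layout (32-bit IP address in the high bits, 16-bit per-host sequence number in the low bits) is as described above:

    import itertools
    import socket
    import struct

    _counter = itertools.count()   # per-host sequence of entity numbers

    def make_entity_id(host_ip):
        # Combine the host's 32-bit IP address with a 16-bit sequence
        # number to get a 48-bit id, unique across the simulation.
        ip_bits, = struct.unpack("!I", socket.inet_aton(host_ip))
        seq = next(_counter) & 0xFFFF
        return (ip_bits << 16) | seq

    def host_of(entity_id):
        # Recover the originating host's address from an entity id.
        return socket.inet_ntoa(struct.pack("!I", entity_id >> 16))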

There would need to be some mechanism to map an entity id to a URI giving more details about the entity. Perhaps it could be as simple as sending a request to the entity's host asking for more information about the entity, and receiving a response message back containing a URI.

About Time

It's important that all the systems remain synchronized; in other words, they must all agree as to the current simulation time. Existing protocols should be used wherever possible, and NTP (the Network Time Protocol) should be sufficient for this purpose.
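
For example, a host could measure its clock offset from an NTP server at startup; this sketch uses the third-party Python package "ntplib", and the server name below is just a placeholder:

    import ntplib

    def clock_offset(server="pool.ntp.org"):
        # Ask an NTP server how far off our local clock is.
        response = ntplib.NTPClient().request(server, version=3)
        return response.offset   # seconds to add to the local clock

    # simulation_time = time.time() + clock_offset()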

Closing thoughts...

This overview is intended to provoke discussion, comment, and new ideas. Many thanks to those of you who have read all the way through this rather lengthy document. I'm open to comments and suggestions for ways of making it better, so feel free to contact me; my email address is given at the beginning of this document.