Skip to the content of the web site.

Data structures

This covers some of the more interesting data structures that should be used in this course. This page describes those data structures at a higher level, without getting into significant implementation details. Of course, this course will use other data structures, but many of these are already covered in your course on algorithms and data structures. This page focuses on some of the more novel data structures that are useful in this course.

Cyclic array as a fixed-sized buffer/queue

When collecting data in a real-time system, it is often not possible to store all data that is collected from a sensor, and if it is being stored, it may not be in a format that is easily accessible (such as on a solid-state drive). Some algorithms in this course require only that the last n data points be stored. This can be done in O(1) time, with O(1) access to any of the entries by having an array data of capacity n that is initialized to a given value (often zero), and an index front that stores the most recent datum collected. The kth most recent datum can be retrieved by accessing array[front - k] if k >= front and array[(front + n) - k] otherwise. To insert a new datum, increment front and if the resulting value equals n, set front to 0 and place the new datum at that location. The value there can be returned, just in case the value being replaced is required by the algorithm in question.

There doesn't appear to be a standard name for such a data structure.

Alternating between arrays and queues

An appropriate implementation of a queue allows for O(1) insertions and removals, but accessing the kth entry may require an O(n) algorithm. On the other hand, accessing the kth entry of an array may be done in O(1) time, but appending new values may require the array to be significantly resized, thereby requiring O(n) time. There are some cases where it is not know how many new data will be generated, and therefore it is dangerous to create an array of a fixed capacity: that capacity may either be too much, or not enough. The first potentially wastes memory, and the second wastes time.

Thus, it makes sense to temporarily store new data in a queue, and only once all new data has been generated is the array appropriately allocated to hold the data and the data is then copied from the queue into the array. The queue can subsequently be deallocated and the array can be used to quicly access the entries.

Related to this, we may have n items in an array, but we may be appending an unknown number of new items to this already existing array. In a similar spirit, all new data should be placed into a queue and only when all new data has been collected is the array resized appropriately and the data from the old array together with the data in the queue copied over.

Look-up tables

Suppose you have n specific pairs of indices (i,j) or triplets of indices (i,j,k) (where each index is bounded) where you would like to associate at least some of the pairs or triplets with a unique index idx which ranges from 0 to n - 1. If not every pair or triplet maps to such an integer, then there may not be an easy formula that maps a pair or triplet to the corresponding integer and vice versa.

For this, the appropriate data structure is a look-up table. If most pairs or triplets are being mapped to an integer value, it makes sense to simply create a two- or three-dimensional array of integers where to_unique[i][j] or to_unique[i][j][k] stores the corresponding unique single index and storing n if that pair or triplet does not correspond to one of the n indices. (Recall that in C++, array indices range from 0 to n - 1, so if n is known, it makes sense to use n to identify a non-indicial entry.

Going back, you would have a two-dimensional array n × 2 or n × 3 array so that given an index idx (ranging from 0 to n - 1), the corresponding pair or triplet of indices would be (from_unique[idx][0], from_unique[idx][1]) or (from_unique[idx][0], from_unique[idx][1], from_unique[idx][2]).

Note that a C++ standard template library std::map class could be used in the first case if the number of pairs or triplets is significantly smaller than the range of those indices. Pairs or triplets could be stored using an appropriate std::tuple data structure.

Note that in programming languages where arrays are indexed from 1 to n, then 0 can be used to flag a pair or triplet that does not map to one of the unique n indices.