Introduction to Programming and C++

YOU ARE NOT PERMITTED TO CUT-AND-PASTE CODE FROM THIS PAGE TO PROJECT 1

A linked list is the prototypical example of node-based data storage. Each node is associated with both data and a reference/pointer to the next node.

The class for a node storing the int data type could therefore be represented as:

class Node {
	 private:
	 	 int   element;
	 	 Node *next_node;
	 public:
	 	 // ...
};

To create an instance of a node and to access the member variables, we need the appropriate member functions:

class Node {
	 private:
	 	 int   element;
	 	 Node *next_node;
	 public:
	 	 // Constructor
	 	 Node( int = 0, Node * = 0 );

	 	 // Accessors
	 	 int   retrieve() const;
	 	 Node *next() const;
};

The default values the element is the zero integer while the default value of the next node pointer is the address 0. The int retrieve() const and Node *next() const return the member variables element and next_node, respectively.

However, this class is simply the node class: it does not define a linked list. A linked list is a data structure which stores a sequence of data in an ordered list of nodes. Consequently, to create a linked list class, we must store the address of the first node. We will call this member variable list_head:

class List {
	 private:
	 	 Node *list_head;

	 public:
	 	 // ...
};

We will say that the list is empty if the member variable list_head is assigned the zero address. This must be done in the constructor:

class List {
	 private:
	 	 Node *list_head;

	 public:
	 	 List();
};

List::List():list_head(0) {
	 // empty constructor
}

In our example, the body of the constructor is empty; however, if it were not empty, the body would be run after the member variables listed after the constructor are initialized. In this case, there is only one initialization: list_head is assigned the address 0.

Before we go into further details, let us look at what a linked list would look like. Suppose we have a linked list with the four entries 42, 95, 70, and 81, in this order. Each time memory is allocated for a particular node or for the linked list itself, it would be allocated an appropriate amount of memory somewhere within main memory. This is demonstrated in Figure 1 which shows the location and values stored in the various instances. The instance of the list class is at address 0xFFFF37A0.

Figure 1. A linked list in memory.

Each node actually occupies eight bytes: the int occupies four bytes while (on a 32-bit computer), the address also occupies four bytes. The memory required for the instance of the List class is only four bytes.

You will note that the instance of the list class (on the right) stores an address, while the node at that address stores the integer 42 and the address of the next node. You can follow the nodes in this manner until you get to a node where the next_node member variable is assigned the address 0x00000000, or 0. This is shown graphically in Figure 2.

Figure 2. A linked list with the pointers shown graphically.

In reality, however, the addresses are all subject whatever memory was allocated by the operating system. The OS could have, just as easily, placed the first node at address 0x00572538 or address 0x01B9FCA4. Consequently, the actual addresses are arbitrary, and it would be easier to understand the linked list by abstracting these addresses out. This is shown in Figure 3.

Figure 3. A linked list without the explicit addresses.

Consequently, the most natural means of showing such a linked list is the more conventional image shown in Figure 4.

Figure 4. A conventional representation of a linked list.

Accessor Operations on Linked Lists

Some questions which we may have on linked lists include:

  • Is the list empty?
  • How many elements are in the linked list (what is the size)?
  • What is the first element (the front) of the linked list?
  • What is the address of the first node (the head) of the list?
  • Is a particular integer in this linked list?

Each of these are queries and therefore should, under normal conditions, not modify the linked list. We could answer all of these questions with an appropriate member function:

class List {
	 private:
	 	 Node *list_head;

	 public:
	 	 List();

	 	 bool empty() const;
	 	 int size() const;
	 	 int front() const;
	 	 Node *head() const;
	 	 bool member( int ) const;
};

We will implement each of these member functions, though not necessarily in the given order.

Node *head() const

We will begin by defining the accessor of the member variable list_head. We will use this member function many more times in this class.

Node *List::head() const {
	 return list_head;
}

bool empty() const

Next, to determine if the list is empty, we said before that the linked list is empty if the member variable is assigned 0.

would check t
bool List::empty() const {
	 return (head() == 0);
}

You will notice that instead of checking list_head == 0, I called head() == 0. There is a good justification for this: I do not intend to modify the member variable list_head, however, if I used list_head == 0, I could make the mistake and implement:

bool List::empty() const {
	 return (list_head = 0);
}

Did you notice the mistake? Assignment was used instead of checking for equality. Consequently, this member function would:

  • Assign the value 0 to list_head, and
  • Return 0 (which is interpreted as false.

In this example, we're lucky: the member function is declared to be const, and consequently, if we tried to compile this, the compiler would report an error. Some programmers insist that whenever a variable is being compared with a literal (an actual number), the literal must be placed on the left. For example:

bool List::empty() const {
	 return (0 == list_head);
}

Then, if the programmer made a mistake, such as

bool List::empty() const {
	 return (0 = list_head);
}

then this would generate a compile-time error even if the member function was not declared to be const.

int front() const

To return the first element, simply call int retrieve() const on the list:

int List::front() const {
	 return head()->retrieve();
}

The member function Node *head() const returns the address of the first node and ->retrieve() returns the element stored at that location.

Unfortunately, this is not enough: what is the obvious error?

If the list was empty, the member function Node *head() const would return the address 0, and to call retrieve on this address would access memory not available to the program: this would result in the operating system shutting down the program.

Instead, we must signal that an exceptional case has occurred if the list is empty: there is no first node.

int List::front() const {
	 if ( empty() ) {
	 	 throw underflow();
	 }

	 return head()->retrieve();
}

Here, we check if the linked list is empty, and if it is, we thrown an underflow() exception. This exception would be defined elsewhere, however, this now allows the user to do.

	 List lst;   // define lst to be of type List

	 try {
	 	 cout << lst.front() << endl;
	 } catch ( underflow e ) {
	 	 cout << "the list is empty..." << endl;
	 }

	 cout << "done..." << endl;

If the function int front() const throws an exception, then the thrown overflow will be caught by the catch block and the string the list is empty... will be printed to the screen. In either case, the statement printing done... will be the next line executed.

int size() const

Because we are not keeping track of how many nodes are in this linked list, the only way to determine the count is to walk through the linked list and count the number of nodes.

To walk through a linked list, let us initialize a variable of type Node * and keep updating it until we have reached the end of the linked list.

int List::size() const {
	 // Initialize the count to 0, and set ptr to be
	 // the address of the first node in the linked list

	 int count = 0;
	 Node *ptr = head();

	 // If ptr != 0 then ptr is storing the address of a node:
	 //     Add it to the count and set ptr to be the
	 //     address of the next node in the linked list.
	 // Otherwise, we are at the end of the linked list.

	 while ( ptr != 0 ) {
	 	 ++count;
	 	 ptr = ptr->next();
	 }

	 return count;
}

Figure 5 shows the steps in calculating the size of a list of four elements.

Figure 5. Calculating the size of the linked list.

Another means of doing this is through the for-loop mechanism. Recall that the following for-loop walks through an array:

	 for ( int i = 0; i < n; ++i ) {
	 	 // do something with array[i]
	 }

Notice the sequence of instructions:

  • Initialize the index to the first location (0),
  • Continue while the index does not exceed range of the array, and
  • Increment the index to go to the next location.

This is identical to what was done in the function int size() const:

  • Initialize the pointer to the first node (head()),
  • Continue while the pointer is pointing to a valid node, and
  • Update the pointer to point to the next node.

Thus, it is common to rewrite int size() const as:

int List::size() const {
	 // Initialize the count to 0

	 int count = 0;

	 // Walk through the linked list, and for each entry
	 // update the count.

	 for ( Node *ptr = head(); ptr != 0; ptr = ptr->next() ) {
	 	 ++count;
	 }

	 return count;
}

bool member( int ) const

Like int size() const, the function bool member( int ) const walks through the linked list and checks whether any of the nodes is storing the value n. If we get through the entire linked list without finding n, simply return false.

bool List::member( int n ) const {
	 // Walk through the linked list, and check if any of
	 // the nodes are storing the value n.
	 //

	 for ( Node *ptr = head(); ptr != 0; ptr = ptr->next() ) {
	 	 // If n is found, return true

	 	 if ( ptr->retrieve() == n ) {
	 	 	 return true;
	 	 }
	 }

	 // We have walked through the entire list and n was
	 // not found; consequently, n is not in the list.

	 return false;
}

Mutating Operations on Linked Lists

As well as accessing information about a linked list, it is of course desirable to modify a linked list. For example, one may wish to:

  • Add a new element onto the front of the linked list,
  • Remove the first element of the linked list, and
  • Remove an arbitrary element from the linked list.

This can be done through the following member functions, none of which are declared to be const:

class List {
	 private:
	 	 Node *list_head;

	 public:
	 	 List();

	 	 bool empty() const;
	 	 int size() const;
	 	 int front() const;
	 	 Node *head() const;
	 	 bool member( int ) const;

	 	 void push_front( int );
	 	 int pop_front();
	 	 bool remove( int );
};

We will implement the first two member functions and discuss the third.

void push_front( int )

This is, in this case, the easiest:

  • We must create a new node storing the value and pointing to (storing the address of) the the current head of the linked list, and
  • We must update the list_head variable to equal the address of this new node.

We must therefore allocate new memory for a Node and pass the value n and the address returned by head(). This is shown in the next code example together with a Figure 6 which demonstrates this graphically.

void List::push_front( int n ) {
	 Node *tmp = new Node( n, head() );
	 list_head = tmp;
}

Figure 6. Adding a new node at the front of the linked list.

You may note that the temporary variable tmp is not necessary. We can simply assign the return value of new to the member variable list_head:

void List::push_front( int n ) {
	 list_head = new Node( n, head() );
}

When this line is evaluated, the right-hand side of the assignment is evaluated first, and therefore the constructor of the new node is called with the argument n and the current value of list_head.

	 list_head = new Node( n, head() );

Only once the constructor finishes, the address of the newly-allocated memory is returned and assigned to list_head.

	 list_head = new Node( n, head() );

int pop_front()

As with int front() const, it is necessary to throw an exception if the list is empty. Also, it is possibly useful to return the value stored in the node being removed. Consequently, the return type will be int.

In removing the front node, it is necessary to assign list_head to be the address of the node following the first node in the linked list, however, in order to return the first element, we must first store it.

int List::pop_front() {
	 if ( empty() ) {
	 	 throw underflow();
	 }

	 int value = front();

	 // incomplete implementation
	 list_head = head()->next();

	 return value;
}

Recall that at this point, there is now no further reference to the original first node in the linked list. Memory was allocated for that node, but we never passed that address to delete to free the memory. We have what is called a memory leak: memory was assigned to any process which calls int pop_front(), but without any reference to it (that is, without some way of accessing the address), it can never again be used. This is shown explicitly in Figure 7 which shows how the first node no longer has any reference to it.

Figure 7. Popping the first node by reassigning the variable list_head.

The incorrect solution to this problem is to delete the node first:

int List::pop_front() {
	 if ( empty() ) {
	 	 throw underflow();
	 }

	 int value = front();

	 // invalid solution
	 delete head();
	 list_head = head()->next();

	 return value;
}

This looks harmless enough, but consider what has happened:

  • The memory allocated for the first node is no longer allocated to this process once delete is called;
  • In the very next statement, we try to access that memory.

Most of the time, you could get away with this; however, one in a million times, there is a possibility that this could fail. Consider the following scenario:

  • We execute the line delete head();.
  • The operating system interrupts the currently running process and gives the processor to another process.
  • That process requests memory and the operating system allocates that memory to the new process.
  • After a while, the processor is returned to the process which had just run delete head(); and it now accesses head()->next(). This accesses the member variable next_node in the node which was now allocated to the other process.
  • The operating system realizes this in its check and terminates the process running int pop_front().

On Unix, this signals a segmentation fault which indicates that the process attempted to access memory to which it did not have ownership. You may view this graphically by looking at Figure 8 which fades any components which are deleted or assigned from a deleted member variable.

Figure 8. Deleting the first node before accessing head()->next().

While one in one million may seem small, suppose there was more code between the call to delete and subsequent accesses to the old node: quickly the probability drops to something closer to one in one hundred thousand or even higher.

The correct solution is to:

  • Store the current list head in a temporary pointer,
  • Update the list head, and
  • Then delete the address stored in the temporary pointer.
int List::pop_front() {
	 if ( empty() ) {
	 	 throw underflow();
	 }

	 int value = front();

	 Node *tmp = head();
	 list_head = head()->next();
	 delete tmp;

	 return value;
}

Figure 9 demonstrates the correct steps for popping the first node off of a linked list.

Figure 9. The correct sequence for popping the first node.

bool remove( int )

To remove a value from a linked list, it is necessary to step through the linked list, find the node, and then delete it. The functionality will be similar to the bool member( int ) const function; however, more work is required:

  • We have a number of special cases (What are they?),
  • We may have to update the value of next_node of the node prior to the one being removed, and
  • We must appropriately de-allocate memory.

If the list does not contain the argument, one can simply return false.

int List::remove( int n ) {
	 if ( !member( n ) ) {
	 	 return false;
	 }

	 // We are now guaranteed that n appears in the linked list
}

Now we have two cases—one special, the other general:

  • The node containing n is at the front of the linked list, and
  • The node containing n is not at the front.

It is always useful to deal with all special cases first.

int List::remove( int n ) {
	 if ( !member( n ) ) {
	 	 return false;
	 }

	 // We are now guaranteed that n appears in the linked list

	 // Special case 1
	 if ( ... ) {
	 	 // ...
	 	 return true;
	 }

	 // Special case 2
	 if ( ... ) {
	 	 // ...
	 	 return true;
	 }

	 // Having dealt with all special cases, we
	 // now deal with the general case where the
	 // following conditions hold...
}

One observation you may make is that, in the general case, the bool remove( int ) function must ultimately update the member variable next_node of the node prior to the node being removed. Unfortunately, the member variables of Node are private and consequently, the List::remove function cannot modify them. To allow such a modification to occur, it is necessary for the List class to be declared a friend of the Node class in the latter:

// Must declare List to be a class
class List;

class Node {
	 private:
	 	 int   element;
	 	 Node *next_node;
	 public:
	 	 // Constructor
	 	 Node( int = 0, Node * = 0 );

	 	 // Accessors
	 	 int   retrieve() const;
	 	 Node *next() const;

	 // Declare that List may have access to the private member variables of Node
	 friend class List;
};

The Destructor

Suppose that an instance of the List class is deleted while it is still not empty, for example:

int main() {
	 List *lst = new List();

	 lst->push_front(3);
	 lst->push_front(4);
	 lst->push_front(5);
	 lst->push_front(6);

	 delete lst;

	 // what happened to the memory allocated
	 // for the nodes storing 3, 4, 5, and 6?
}

By default, all delete does is de-allocate the memory associated with, in this case, the List class. In this example, that equals four bytes—the memory occupied by the list_head member variable. This is shown in Figure 10 which demonstrates the memory for the instance of the class List being deallocated.

Figure 10. Deleting lst without deleting the nodes.

In this case, however, it is necessary to delete the rest of the nodes first, but this cannot be left to the memory of the programmer. Indeed, in some cases, the programmer may not even be able to de-allocate the necessary memory. Instead, each time the delete operator is called on an instance of a class, the destructor is called before the memory is deallocated.

The signature of the destructor is always ~ClassName(). You can think of this as not the constructor. (The tilde ~ is the bit-wise complement operator, or not.)

class List {
	 private:
	 	 Node *list_head;

	 public:
	 	 List();
	 	 ~List();

	 	 bool empty() const;
	 	 int size() const;
	 	 int front() const;
	 	 Node *head() const;
	 	 bool member( int ) const;

	 	 void push_front( int );
	 	 int pop_front();
	 	 bool remove( int );
};

In this case, the destructor must go through the list and delete all the nodes:

List::~List() {
	 Node *ptr = head();

	 while ( ptr != 0 ) {
	 	 Node *tmp = ptr;
	 	 ptr = ptr->next();
	 	 delete tmp;
	 }
}

You will note that we walk through the linked list, and while the current pointer is pointing to a node, we:

  • Store the address of the current node,
  • Update the current node to point to the next node, and
  • Delete the stored node.

One iteration is demonstrated in Figure 11. You may note that after the first iteration, list_head is no longer pointing to a valid memory location; however once the destructor exits, the memory for list_head will also be deallocated and consequently, it should no longer be accessed, either.

Figure 12. Deleting lst by iteratively deleting the nodes.

If one wanted to be certain that list_head would no longer be accessed, one could add the line:

List::~List() {
	 Node *ptr = head();
	 list_head = 0;          // ensure list_head is no longer accessed

	 // ...

While this is the most efficient means of performing this operation, there is an easier way:

List::~List() {
	 while ( !empty() ) {
	 	 pop_front();
	 }
}

That is, while this list is not empty, call int pop_front(). Assuming bool empty() const and int pop_front() are implemented correctly, this should delete all of the undeleted nodes in the linked list.

This is demonstrated in Figure 12. Shaded items indicate those which have been deleted.

Figure 12. Deleting lst using int pop_front().

This implementation of ~List() requires, of course, that int pop_front() works correctly. An error most students will make is that they will implement all functions before they test any functions and consequently, no functions can be trusted to be correct when an error occurs. When you implement a member function: TEST IT THOROUGHLY.

Copy Constructor and the Assignment Operator

To be expanded, but consider the following:

// A function where the argument is pass-by-value
void f( List arg ) {
	 // do something
	 cout << arg.head() << endl;
}

int main() {
	 List lst1;
	 lst1.push_front(3);
	 lst1.push_front(4);
	 lst1.push_front(5);
	 lst1.push_front(6);

	 List lst2 = lst1;   // assignment
	 cout << lst1.head() << endl;
	 cout << lst2.head() << endl;
	 f( lst1 );          // pass-by-value

	 return 0;
}

How do we assignment with linked lists? Or, if we pass a list by value as an argument to a function, how do we make a copy of the linked list? If we simply copied the value of the member variable list_head, we would be in trouble, as now multiple lists point to the same node. In the above example, all three cout statements would print the same address.

The problem is: what happens to the other two instances if one of these is mutated?

	 cout << lst1.head() << " == " << lst2.head() << endl;
	 List lst2 = lst1;   // assignment
	 lst1.pop_front();
	 cout << lst1.head() << " != " << lst2.head() << endl;

This the call to int pop_front() deleted the first node of lst1; however, lst2 still tracks the address of that node in the member variable list_head. Any access to list head (via, for example, lst2.front() would access deleted memory.

To solve this, we must override two functions:

  • The copy constructor List::List( const List & ), and
  • The assignment operator List &List::operator=( const List & ).

Fortunately, as before, we can implement one in terms of the other:

List::List( const List &list ):list_head(0) {
	 // We have initialized this new list (setting list_head to 0)
	 // Now we just assign the list being copied to this list.
	 // This calls operator=

	 *this = list;
}

Here, we initialize the current linked list and then simply assign the list we are to copy to *this. (Recall that this is a special member variable which stores the address of this object.

Now, all we must do is implement the assignment operator:

List &List::operator=( const List &rhs ) {
	 // We must avoid situations where a linked
	 // list is assigned to itself.
	 if ( &rhs == this ) {
	 	 return *this;
	 }

	 // We want to make a copy of the argument rhs so we
	 // must empty the current list

	 while ( !empty() ) {
	 	 pop_front();
	 }

	 // Now, we must copy all the elements in the
	 // second list 'rhs' over to this list

	 // Deal with the special case first:  the rhs is empty
	 if ( rhs.empty() ) {
	 	 return *this;
	 }

	 // Deal with the general cases where rhs.size() >= 1

	 // Push the first entry of rhs onto this linked list
	 push_front( rhs.front() );

	 // Walking through the remaining entries of rhs, set 
	 // the next node of 'left' to a new node containing the
	 // object stored in the node pointed to by 'right'.
	 // Then move both left and right forward.
	 for (
	 	 Node *left = head(), *right = rhs.head()->next();
	 	 right != 0;
	 	 left = left->next(), right = right->next()
	 ) {
	 	 left->next_node = new Node( right->retrieve(), 0 );
	 }

	 return *this;
}