
Lesson 1.3: Literals



If you want a primer on regular expressions, please read this document.

Up to this point, we have shown a statement that appears to print text to the console window. We will now describe the text we print to the screen and how that code is different from the rest of the program. The code "Hello world!" in the previous program is described as a string of characters or, in short, a string.

There are two means by which data can be accessed by a program. The most common is for data to be read from another device, but there are occasions when data can be hard coded into the program itself. This is very often the approach we will use in this course, at least until we learn how to read data from the keyboard or to read and write data to files.

When data is explicitly stored in a program, it is referred to as literal data. In this course, we will work with five categories of literal data:

  1. integers,
  2. characters,
  3. strings,
  4. floating-point numbers, and
  5. Boolean values.

We will discuss each of these together with an introduction to regular expressions for describing the format of literal data.

Literal integers or integer literals

A literal integer is any contiguous sequence of decimal digits that is either 0 or a sequence of one or more digits not starting with a 0. Thus, 0, 5, 57 and 1970 are all literal integers, but 057 is not (well, it is, but it is interpreted as if it were in base 8).

A literal integer may be prefixed by either a + or a -.

You have already seen a literal integer in our initial program: the last statement was return 0;.

Using English to describe a literal integer is potentially ambiguous, and so regular expressions are used to describe them, instead. Here we see the regular expressions for zero and non-zero literal integers:

                       [+-]?0     [+-]?[1-9][0-9]*

In order to indicate that both of these are acceptable, we can join them with a pipe | symbol. In programming, the | is often used to represent or, so

                       [+-]?0|[+-]?[1-9][0-9]*

matches either [+-]?0 or the pattern [+-]?[1-9][0-9]*.

This is not C++ code; this is a means of describing various representations of data.

When you see brackets with two or more characters, it represents any one of those characters. For example, [+-] means either a + or a -. The [1-9] represents any digit from 1 to 9, while [0-9] also includes the 0 digit.

If a character or selection of characters is followed by a ?, the preceding expression is optional, while if it is followed by a *, it represents zero or more contiguous occurrences of the preceding expression.

To give some more concrete examples, x* represents no xs, x, xx, xxx, etc. while a?bc? represents abc, bc, b, or ab.

Thus, xo* represents x, xo, xoo, etc. while [Hh]iren represents either Hiren or hiren.

In this case, [+-]?0 represents 0, +0 and -0, while [+-]?[1-9][0-9]* represents all of

1 -1 +1 2 -2 +2 3 -3 +3 4 -4 +4 5 -5 +5 6 -6 +6 7 -7 +7 8 -8 +8 9 -9 +9 
10 -10 +10 20 -20 +20 30 -30 +30 40 -40 +40 50 -50 +50
           60 -60 +60 70 -70 +70 80 -80 +80 90 -90 +90 
11 -11 +11 21 -21 +21 31 -31 +31 41 -41 +41 51 -51 +51
           61 -61 +61 71 -71 +71 81 -81 +81 91 -91 +91 
12 -12 +12 22 -22 +22 32 -32 +32 42 -42 +42 52 -52 +52
           62 -62 +62 72 -72 +72 82 -82 +82 92 -92 +92 
13 -13 +13 23 -23 +23 33 -33 +33 43 -43 +43 53 -53 +53
           63 -63 +63 73 -73 +73 83 -83 +83 93 -93 +93 
14 -14 +14 24 -24 +24 34 -34 +34 44 -44 +44 54 -54 +54
           64 -64 +64 74 -74 +74 84 -84 +84 94 -94 +94 
15 -15 +15 25 -25 +25 35 -35 +35 45 -45 +45 55 -55 +55
           65 -65 +65 75 -75 +75 85 -85 +85 95 -95 +95 
16 -16 +16 26 -26 +26 36 -36 +36 46 -46 +46 56 -56 +56
           66 -66 +66 76 -76 +76 86 -86 +86 96 -96 +96 
17 -17 +17 27 -27 +27 37 -37 +37 47 -47 +47 57 -57 +57
           67 -67 +67 77 -77 +77 87 -87 +87 97 -97 +97 
18 -18 +18 28 -28 +28 38 -38 +38 48 -48 +48 58 -58 +58
           68 -68 +68 78 -78 +78 88 -88 +88 98 -98 +98 
19 -19 +19 29 -29 +29 39 -39 +39 49 -49 +49 59 -59 +59
           69 -69 +69 79 -79 +79 89 -89 +89 99 -99 +99 
100 -100 +100 200 -200 +200 300 -300 +300 400 -400 +400 500 -500 +500
              600 -600 +600 700 -700 +700 800 -800 +800 900 -900 +900 
101 -101 +101 201 -201 +201 301 -301 +301 401 -401 +401 501 -501 +501
              601 -601 +601 701 -701 +701 801 -801 +801 901 -901 +901 

and so on.

Definition: literal integer
A literal integer is interpreted by the compiler as that value. There are some restrictions on how large a literal integer may be, but for the most part, any integer with a magnitude less than one billion will be interpreted as that value.

Literal characters or character literals

If you want to record in a program a single character that appears on the keyboard, in general, you write that character between two single quotes:

' '  '!'  '"'  '#'  '$'  '%'  '&'       '('  ')'  '*'  '+'  ','  '-'  '.'  '/'
'0'  '1'  '2'  '3'  '4'  '5'  '6'  '7'  '8'  '9'  ':'  ';'  '<'  '='  '>'  '?'
'@'  'A'  'B'  'C'  'D'  'E'  'F'  'G'  'H'  'I'  'J'  'K'  'L'  'M'  'N'  'O'
'P'  'Q'  'R'  'S'  'T'  'U'  'V'  'W'  'X'  'Y'  'Z'  '['       ']'  '^'  '_'
'`'  'a'  'b'  'c'  'd'  'e'  'f'  'g'  'h'  'i'  'j'  'k'  'l'  'm'  'n'  'o'
'p'  'q'  'r'  's'  't'  'u'  'v'  'w'  'x'  'y'  'z'  '{'  '|'  '}'  '~'

There are two gaps in this table: the first is where the character ' would appear and the second is where the character \ would appear. There are also other ASCII characters that are non-printing characters. To represent all of these, we use an escape character, the backslash \, which says that the next character should be interpreted in a particular way.

For example, '\'' has the escape character \ followed by a single quote, which is itself surrounded by single quotes. This represents the character '. The character '\\' represents the backslash, and '\t' represents the Tab character.

Other special characters are '\n' and '\r' which represent a mechanical operation of a teletype machine (an automated typewriter). These are the line feed (i.e., a new line) and carriage return operations.

With teletype machines, at the end of a line, you had to both return the carriage to the start of the line, and move the paper one line forward. When printing text to the console, some operating systems realized that two characters were not necessary to go to the next line, so Unix adopted '\n' while the classic Mac OS chose '\r'. MS DOS and Windows (somewhat inexplicably) chose to keep both.

This introduces a problem that all developers and engineers must deal with: compatibility and portability. To go to the next line, a program may have to print one of three possible character sequences, depending on the operating system:

	// In Unix or macOS
	std::cout << "Hello world!";
	std::cout << '\n';
	// In classic Mac OS
	std::cout << "Hello world!";
	std::cout << '\r';
	// In MS DOS or Windows
	std::cout << "Hello world!";
	std::cout << '\r';
	std::cout << '\n';

To avoid this, C++ provides std::endl, which always ends the current line regardless of the operating system:

	// On any of these operating systems
	std::cout << "Hello world!";
	std::cout << std::endl;

Literal strings or string literals

A string of characters, or just a string, is a sequence of characters that are interpreted as such. A literal string in C++ is surrounded by double quotes:

	std::cout << "Hello world!";

Again, as before, special characters need to be escaped, so for example, we have

	std::cout << "She said \"Hello world!\"";
	std::cout << std::endl;
	std::cout << "Look in C:\\Users\\dwharder";
	std::cout << std::endl;
	std::cout << "Times:\t0.1 s\t23.4 s\t56.789 s\t0 s";
	std::cout << std::endl;

Incidentally, inside a string, you need not escape a single quote, so the following is acceptable:

	std::cout << "That individual said 'Hi!'.";
	std::cout << std::endl;

Definition: string
A sequence or string of characters that appears between two double-quote characters ("). The double quotes are not part of the string, but rather, they are used by a programmer to show where the string begins and where the string ends.

	std::cout << "Windows uses a backslash (\"\\\") while Linux uses a slash (\"/\").";

Definition: escape character
A character in a literal string meant to indicate that the next character should not be interpreted in the usual way. Because a double quote is used to denote the beginning and end of a literal string, if you want a literal string to contain a double quote, you must use \".

Literal floating-point numbers or floating-point literals

Integers are used for counting, but in general, measurements allow for a continuum of values, which we call real numbers. A number like π has an infinite number of digits, and therefore cannot be represented in a finite number of characters. If, however, we restrict ourselves to a finite number of digits in a real number, we cannot represent all possible real numbers. Later, you will see how real numbers are approximated in computers, but we call that representation a floating-point representation.

Thus, 3.14 is a literal floating-point number in a program.

Floating-point numbers are sequences of one or more digits with a single decimal point at some point in that sequence of digits, possibly prefixed by a + or a -.

Going back to our regular expressions, we might consider representing this as

                       [+-]?[0-9]*\.[0-9]*

This means possibly a + or - with zero or more digits followed by a period (representing the decimal point) followed by zero or more additional digits. In regular expressions, the period . represents any character, so if we actually want to match exactly a period, we must escape it: \..

Problem: this representation suggests the following three are also floating-point numbers: ., +. and -., and so we should require at least one digit either before or after the decimal point:

           [+-]?[0-9]+\.[0-9]*         [+-]?[0-9]*\.[0-9]+

where [0-9]+ indicates one or more digits and [0-9]* means zero or more digits. Again, we can join these with the pipe:

          ([+-]?[0-9]+\.[0-9]*|[+-]?[0-9]*\.[0-9]+)

Note that the pipe and brackets can describe the same thing when each alternative is a single character: a|e|i|o|u and [aeiou] are identical.

Literal Boolean values or Boolean literals

The literals true and false represent the two truth values, true and false. The first is actually represented internally by 1 while the second is represented by 0.

	std::cout << true;        // Prints a '1'
	std::cout << std::endl;
	std::cout << false;       // Prints a '0'
	std::cout << std::endl;

As a regular expression, this is (true|false).

Review

In your own words, describe each of these concepts:

Questions and practice:

1. In your code, replace the two lines starting with std::cout with the following nine lines:

	std::cout << "O Elbereth ";
	std::cout << "Gilthoniel,";
	std::cout << std::endl;
	std::cout << "o menel palandiriel,";
	std::cout << std::endl;
	std::cout << "le nallon si di'nguruthos!";
	std::cout << std::endl;
	std::cout << "A trio nin, Fanuilos!";
	std::cout << std::endl;

What is the purpose of the space after "Elbereth"?

What does it appear the line std::cout << std::endl; does to console output?

2. Suppose you were writing an app for your Android phone, and you want to ask the user for his or her name. Why would it be a bad idea to use the literal string "Name: " in your code?

3. What do the following regular expressions describe:

              x
              x*
              x+
              x?
              [abc]
              [abc]*
              [abc]+
              [abc]?
              (ab|cd)
              (ab|cd)*
              (ab|cd)+
              (ab|cd)?

4. What do you think the following regular expression describes?

	      [a-zA-Z_][a-zA-Z0-9_]*

Solutions are available. If you do not make a serious effort at answering these questions first, please consider not going to the University of Waterloo for your undergraduate studies: you will make neither a good student nor a good engineer.

