Variables & Types
In a computer, data is stored inside a memory location. We can think of computer memory as a large grid of small squares with unique addresses. These small squares, or cells, are arranged in a logical order. Each cell can store 1 byte, or 8 bits of data. A bit is either a 0 or a 1. With 8 bits, we have 256 different combinations () Those combinations can be used to represent the numbers 0 to 255. In statically typed languages, we must explicitly state variable's type during declaration. This tells the compiler what type of data the variable will store. Given that information, the compiler can determine how much memory to allocate. To use that data, we must reference that particular location. In languages like Assembly, this is in fact how we access that data. Now, these memory locations are fairly cryptic and difficult to read (they're in hexadecimal). Instead of referencing these memory locations, we can associate a name to that particular location, and reference that name instead. This is called a binding.
Variables are essentialy "cabinets" where we can place data. Those
cabinets, however, have specific labels called types. The data we place
in a cabinet must have a type that matches the label. However, we can
change the contents of the cabinets. More formally, variables are locations
in memory where we can store data. Instead of referring to that location
with its memory address, we give that location a name called an
identifier. In source code, the identifier is the name we give a
variable. Identifiers should be descriptive and meaningful, so as to make
later recall easier. Identifiers can only consist of (a) upper-case
alphabets, (b) lower-case alphabets (c) numbers, and (d) underscores.
Additionally, the first character in the identifier must be either an
alphabet or an underscore. Thus, identifiers cannot contain special
characters (&
, !
, #
, ;
, etc.) nor can it contain whitespace.
Upper-case alphabets and lower-case alphabets are distinct. Thus, the
identifier foo
is different from the identifier Foo
. A variable is
a programming construct that allows us to associate a value with a name.
More formally, a variable is a named piece of memory that a programmer uses
to store specific types of data. In other words, it is an abstraction for
the process described previously. Examples of variables in C++:
int x; // value is undefined
char grade = 'A';
double pi = 3.14;
The value 'A'
and 3.14
are called literals. A literal is a value
directly represented inside the program executable (as opposed to a value
that must be accessed through a variable stored somewhere in memory). For
example, the variable pi
will be stored somewhere in memory. The literal
3.14
, however, will be directly represented as binary in the executable.
For example, suppose we wrote the following:
A variable declaration is an instruction to the compiler to reserve space in memory for data. Because C++ is a statically typed language, variable declarations must explicitly include the variable's type. A variable initialization is an instruction to the compiler to store data in the reserved space.
int a = 1
int b = 2
When we actually compile the program, the code above looks something like:
value | code |
---|---|
001 | a = 1 (int) |
002 | b = 2 (int) |
0040 | 1 |
0041 | 2 |
The values are called literals because they are literally inside the
source code. They do not need to be represented in memory. Because the
values bound to a
and b
are literals, we can use a
and b
throughout
our program (this is true because the variables above are presumed to be
declared in the global scope; more on this later).
Note that literals with modifiers, like long
and unsigned
, should have
a suffix appended at the end, indicating that the value is a literal:
unsigned int ceoBonus {2'000'000u}; // evaluates to 2989 int myOctal {0777u}
unsigned int errorCode = 0xBAD; // evaluates to 511
float myFloat {1.2f}
Notice the use of single quotes as separators. This is particularly helpful
when we have large value literals. Notice also that we can represent hex
numbers in C++.1 Hex numbers are also used to indicate colors.
Above, we also see an example of using octal numbers as literals. Notice
that octal numbers are prefaced with a 0
. Because of this use of 0
, be
very careful not to preface int
values with a 0
if an int
is
intended.
In the last line, we see a float
literal. Notice the suffix f
. This
suffix must be included if we want to ensure the value is treated as a
float
literal. If we fail to include the suffix, we risk C++ treating the
value as a double
.
Data Types
C++ has three kinds of data types:2 (1) primitive types; (2)
derived types; and (3) user-defined types. The primitive types are types
provided by default in C++. They include: int
, float
, double
, char
,
boolean
, and void
. The derived types are types derived from the
primitive data types. They include: Function
, Array
, Pointer
, and
Reference
. The user-defined types are types defined by the user. They
include: Structure
, Union
, Enum
, Class
, and Typedef
.
A variable of type float
, as well as a variable of type double
, stores
floating point values (i.e., numbers with a decimal point). double
is
allocated 8 bytes of memory. Thus, it can store values ranging from
to How precise floating-point value is depends
on how many numbers follow the decimal point. A float
can store up to 7
digits.
A variable of type int
takes up 4 bytes of memory. It can store any value
from to If we store a floating point value
(e.g., ) in an int
variable, C++ will truncate the value to 2. A
double
can store up to 15 digits A variable of type char
stores a
character from the ASCII table. In reality, char
represents the
character's numeric ASCII value. char
is allocated 1 byte of memory. It
can store any Unicode value from to
A variable of type bool
stores one of two values — true
or
false
. A variable of type bool
is allocated 1 byte of memory. The
void
data type represents "no value." This is a specific instruction to
the compiler not to allocate any memory.
Datatype Modifiers
Data type modifiers are special symbols we can prepend to a data type to
modify the data type's instructions. The modifier long
will instruct the
compiler to allocate 4 more bytes. We can use long
with int
and
double
. The modifier short
instructs the compiler to only allocate 2
bytes. short
applies only to int
. We cannot use modifiers with the
float
type. The unsigned
modifier allows us to store only positive
values (i.e., values without the signed bit). The signed
modifier allows
us to store both positive and negative values. signed
is the default
declaration for int
and char
.
Type Casting
Type-casting is when we instruct the compiler to convert a value of one
type to another. There are two types of type-casting: implicit casting
and explicit casting
. In implicit-casting, the compiler automatically
converts one data type to another. In explicit-casting, we, the
programmers, explicitly instruct the compiler to convert.
The value of x
above is undefined
, which is itself a value. It will
display to the console some value, but it could be any value. All of the
types above are primitive types in C++
. There are several others:
Type | Size | Description |
---|---|---|
char | byte, bits | character |
char16_t | bits | |
char32_t | bits | |
wchar_t | largest available character set | |
short | bits | signed integer values |
short | bits | |
int | bits | |
long | bits | |
long long | bits | |
unsigned short | bits | unsigned integer values |
unsigned | bits | |
unsigned long | bits | |
unsigned long long | bits | |
float | decimal places | non-integer reals |
double | decimal places | |
long double | decimal places | |
bool | usually bits | true; false |
void | represents "typelessness" | |
auto | used to deduce types |
What data type we use depends on the program we're writing and whether we're particularly concerned about memory use. If we attempt to assign too large of a value into a variable type that does not accomodate for that value's size, we will get an overflow.
Constants
Constants are variables whose bound values are immutable — the value cannot be changed. We can initialize constants in C++ as such:
const double accelerationGravity { 9.807 }; // meters per second^2
A constant is a variable whose constants cannot be changed. We can think of it as a locked cabinet. Because they are locked cabinets, constants must be initialized and declared at the same time.
The const
qualifier does not actually apply to the data in memory. In
other words, it's not like the data in memory permanently stays in some
location, never moving, never changing. Instead, the qualifier applies to
the variable name. What this means is, in the variable above,
accelerationGravity
, we cannot change the value bound to it, 9.807
,
using accelerationGravity
. This is a crucial distinction to understand,
because there is more than one way to modify a value bound to a variable.
As we will see on the section with pointers, there are situations where we
can accidentally cause a const
to change.
Question: What if the const
value depends on the computation of a
non-constant value? Will subsequent changes to the non-constant value cause
changes to the const
value? Fortunately in C++, no. Once a variable is
initialized as a constant, that value will remain bound to the identifier:
#include <iostream>
int main() {
int a = 3;
const int b = 4 + a;
std::cout << b << std::endl;
a += 1;
std::cout << b << std::endl;
return 0;
}
7
7
Constants are what allow us to ensure that a value bound to a particular identifier is never changed through that identifier. If we, or any part of our code, attempt to modify that constant through the identifier, we will get a compiler error.
In general, we should err on the side of using constants. While mutability
is a useful tool, erring on mutability only increases the amount of values
we must keep track of. Some of those values will be things that should
never change, and should not be included in our personal list of tracked
values. Some C++ programmers follow the habit of declaring every variable
as a const
, then deciding after seeing compiler errors whether to remove
the const
declaration. This is great habit, as it forces us to more
carefully consider whether we want mutability.
Constant Expressions
Constant expressions are expressions that are evaluated at compile time
rather than runtime. The default behavior is to perform these computations
at runtime. The problem, however, is that some computations can be
enormous. This means that whenever a user loads an executable (i.e., runs
the program,) the computation may need to be performed again. This
redundancy can amount to large costs in runtime. To declare constant
expressions, we use the constexpr
keyword:
constexpr double pi {3.14};
We cannot use non-constant expressions (i.e., runtime values) inside constant expressions (compile-time values). This is because the compiler cannot perform computations on or with runtime values. Those computations must be done at runtime.
const double e {2.718};
constexpr double myNum { e * 2.2 }; // returns an error
The code above returns an error because e
is a non-constant expression.
To actually perform the computation, e
must also be a constant
expression:
constexpr double e {2.718};
constexpr double myNum { e * 2.2 }; // returns 5.9796
Constant expressions are enormously useful in ensuring that we do not waste valuable runtime. In particular, we almost always want to perform large, heavy computations at compile time. By doing so, we do not have to waste valuable runtime performing the large computation. Instead, the computer's loader will simply retrieve the result from the compiler's computation.
We can think of constant expressions as a cost-shifting mechanism. Rather than the user paying the price for the large computation, we, as the programmers, pay the price at computation. Erring on constant expressions, executables can load much, much faster.3
Constant expressions are very useful for checks For example, suppose our
program should not proceed any further — in compilation —
unless some condition is met. We can do so with static_assert(${b}$)
,
where is the condition we want met.
int main() {
constexpr int a = 2;
static_assert(a == 3);
return 0;
}
main.cpp:5:2: error: static_assert failed due to requirement 'a == 3'
static_assert(a == 3);
^ ~~~~~~
1 error generated.
Above, our program never actually compiles — and never gets to
runtime — because the condition a == 3
is not met. Separately,
constant expressions are effectively constants, so the expressions cannot
be mutated. static_assert()
can be a very valuable tool when we are
moving large amounts of computations to compile time, as it reduces the
costs we've shifted to ourselves as programmers.
The Size Of Operator
The sizeof()
operator returns the size, in bytes, of a type or variable.
The operator is often used to determine size of arrays, structures, and
objects. We address it here because it is particularly useful for
variables, and will revisit it when we discuss arrays.
C++
provides several ways to declare and initialize variables:
int foo; // declaration; likely contains garbage value
int baz = 17; // assignment initialization
int kwa (43); // constructor initialization; initializes to 43
int xyz {13}; // C++ list initialization; initializes to 13
int bar {}; // initializes to 0
// we can also uses expressions as initializers
int bop {14 + 2}; // initializes to 16
int x {2};
int y {5};
int z {x + y}; // initializes to 7
int g {x + y + i}; // won't compile, i is undeclared
Note that if we declare a variable but do not initialize it, that variable will have some garbage value stored, and we have no control over what that value is. As such, it is crucial that we initialize variables once declared.
This is a stark contrast to other languages like Java, where there are default values for uninitialized variables. If you're coming from one of those languages, beware!
More on Numeric Types
Numeric types are so important in programming that it's worth exploring
them more deeply. To begin, C++
has no problem handling positive and
negative numbers:
int val1 {1} // a positive integer
int val2 {-1} // a negative integer
The code above is a form of syntactic sugar for the following:
signed int val1 {1}
signed int val2 {-1}
The keyword signed
is called a modifier. A modifier does what it says
— it modifies what a variable can and cannot do. In this case,
signed
indicates that the subsequent variable can store positive or
negative numbers. If we want to ensure that a variable can only store
positive numbers, we use the unsigned
modifier. Whether or not an int
is signed or unsigned, the int
will still occupy 4 bytes in memory.
Signage, however, impacts how many values a given numeric type can
represent. Roughly, where is the number of bits for a type in memory,
unsigned numeric types can represent numbers in the interval
Signed integers can store numbers in the interval
For example, with integers, the unsigned
modifier allows us to represent numbers from 0 to about 4.2 billion. With
signed
, we can represent numbers from about -2.1 billion to 2.1 billion.
There are additional modifiers we can use for numeric types, short
and
long
. The short
and long
modifiers will modify the number of bytes a
type requires. With short
, we reduce the range of values we can
represent, but save memory. With long
, we increase the range of values
can represent, but use up more memory. We can use short
and long
with
signed
and unsigned
to balance the tradeoffs:
// these variables take up 2 bytes
short foo {};
short int bar {};
signed short bang {};
signed short int zing {};
unsigned short int blob {};
// these variables take up 4 bytes
int vroom {};
signed hoho {};
signed int gaga {};
unsigned int didi {};
// these variables take up 4 or 8 bytes
long wawa {};
long int pepe {};
signed long toto {};
signed long int shibi {};
unsigned long int grob {};
// these variables take up 8 bytes
long long jam {};
long long int jip {};
signed long long doop {};
signed long long int coco {};
unsigned long long lulu {};
unsigned long long int momo {};
Note that the modifiers signed
, unsigned
, long
, and short
only work
for integral types — types representing integers. They will not
work for floating types — types representing fractional numbers.
Furthermore, we should be careful when using the short
and unsigned
modifiers with integral types like short int
. Arithmetic may not work as
expected for an integral type that can take no more than 4 bytes of memory
(e.g., short int
, which only takes 2 bytes). If the result requires more
than 4 bytes of memory, C++ will return a result in type other than the
operands'. For example, in the case of short int
, if the result requires
more than 2 bytes of memory, C++ will return a result of type int
.
Floats
C++ has three base floating types: float
, double
, and long double
.
The base type float
takes up 4 bytes of memory, double
takes up 8
bytes, and long double
takes up 12 bytes. With floating types, the
primary concern for memory usage is more focused on precision. With 4 bytes
of memory, precision is ensured for up to 7 decimal places (including the
integral part). For double
, it's 15. With long double
, it depends on
the compiler implementation; it can range anywhere from 15, 18, to 33. For
the most part, double
is the recommended type to use, and it suffices for
most programming tasks.
Scientific Notation
C++ supports scientific notation natively:
// x and y are equivalent
double x { 732400023 };
double y { 7.32400023e8 };
// v and w are equivalent
double v { 0.00000000002313 };
double w { 2.313e-11 };
Infinity and NaN
For those familiar with JavaScript, C++ has representations for Infinity
and NaN
:
#include <iostream>
int main() { float a { 1.0 }; float b {}; float c { a / b }; // evaluates to Infinity
float u { 0.0 };
float v { 0.0 };
float w { u / v };; // evaluates to NaN
std::cout << "valueOf c: " << c << std::endl;
std::cout << "valueOf w: " << w << std::endl;
return 0;
}
valueOf c: inf
valueOf w: nan
Division
The arithmetic operations covered are fairly straightforward. Special attention, however, should be given to division.4
Dividing integers can yield unusual results:
#include <iostream>
int main() {
int c = 21 / 10; // expected: 2.1
std::cout << c << std::endl;
}
2
Notice that the output is 2. Why is this? What's actually happening is that
C++ is trying to figure out how many times it can fit 10 inside 21. In this
case, 2. This is what happens when we divide an int
value by a
non-multiple of that value. We won't get back a fractional number. For
that, we need to use float types:
#include <iostream>
int main() {
double c = 21.0 / 10.0; // expected: 2.1
std::cout << c << std::endl;
}
2.1
Booleans
Boolean data types represent just two states — true or false.
Accordingly the data type bool
in C++ consists of only two values: true
and false
. A Boolean value takes up 8 bits of memory (an entire byte).
This may seem extremely wasteful, particularly from the perspective of
embedded systems, where memory is very scarce. With advances in memory
capacity, however, most programs aren't impacted by the consumption. We
will see in later sections various ways of bundling more data into a byte.
Characters
In C++, single-character data is represented with char
type. A char
value is represented with single quotes in C++:
char foo = 'a';
Note that we must use single quotes. Any other symbol will return an error.
The char
type takes up 1 byte, or 8 bits, of memory. Characters in C++
map to the ASCII table. Thus, each ASCII character has an integer ASCII
encoding.
The Keyword auto
The keyword auto
is C++'s take on type inference. By appending the
keyword to a variable name, we instruct C++ to infer the bound value's
type. This is particulary useful for when we have particularly long type
names. We can't really appreciate how useful type inference is at the
moment, but we mention it here for completeness.
auto val1 {19}; // type inferred: int
auto val2 {38.3}; // type inferred: double
auto val3 {'e'}; // type inferred: char
auto val4 {2.0f}; // type inferred: float
auto val5 {290.3l}; // type inferred: long
auto val6 {923u}; // type inferred: unsigned
auto val7 {197ul}; // type inferred: unsigned long
auto val9 {9291ll}; // type inferred: long long
User Input
In the examples above, we used cout
to output data. cout
has sibling,
cin
, which inputs data:
#include <iostream>
using namespace std;
int main(int argc, char *argv[]) {
int userNumber;
cout << "Pick a number:" << endl;
cin >> userNumber;
cout << "You picked: " << userNumber << endl;
return 0;
}
Pick a number:
27
You picked: 27
Global v. Local
Variables declared inside the main()
function are said to be local
variables:
int zob {1};
int main() {
int fen {25};
int jib {12};
}
In the code above, fen
and jib
are local variables. We say they are
local variables because their scope (visibility) is within the main()
function. Put differently, the variables fen
and gib
only exist inside
the main()
function. Outside of the main()
function, they effectively
do not exist.
In contrast, the variable zob
is a global variable. This means they
are visible to everything in the program. Which in turn means they can be
changed from anywhere and by anything in the program.
Naming
C++
has a few naming rules we must follow: (1) Variables can contain
letters, numbers, and underscores; (2) Variables must begin with a letter
or underscore; (3) Cannot use reserved symbols; and (4) cannot redeclare
a name in the same scope.
Footnotes
-
Hex numbers are used amusingly (and usefully) in many programs as hex speak. For example, the hex number
0xBAAAAAAD
is used as an iOS exception report (it evaluates to the integer3131746989
.) Similarly,0xDEADBEEF
is almost ubiquitously used in embedded systems to indicate a software crash or a deadlock. These are all examples of magic numbers; numbers used to indicate a particular value. ↩ -
Mathematically, a data type is really just a set. It is the set of all the possible values that belong to that type, alongside the operations that can be performed on those values. ↩
-
As evidence of how valuable constant expressions are, one of the most competitive games in the programming languages market is moving as many computations as possible to compile time. With each new standard of C++ (and other languages like Java), there have been increases in the number of movable computations. Given how fast compilers are getting, the game is only growing more heated. ↩
-
As an aside, the
*
(asterisk) operator's use for multiplication dates back to John Backus and his team's development of Fortran. At the time (1954), the available character set for computers was limited, and*
was the closest symbol to ↩