C++: The Basics - eps1.4_Datatypes

Last weekend I met a great and charming guy. Well, “met” is a bit exaggerated. It was more like a fleeting encounter. Since then I can’t get him out of my head. But I don’t know if he is the right one for me. What do I do now?

This reads like a post in a dating forum. But it isn't! These are my thoughts whenever I try to choose a suitable data type.

Data types

When declaring variables, the syntax of C++ requires not only the name, but also the specification of the data type.

What is a data type?

Formally, a data type is defined as a set of values and a set of operations that can be applied to those values.

For example, the set of all integers forms a set of values. However, this set of numbers only becomes a data type once it is combined with a set of operations, here the familiar arithmetic and comparison operations. Only the combination of values and operations results in a true data type.

Why this combination? The reason is quite obvious: not every operation is suitable for every kind of data. For example, it doesn't make sense to divide two names. To prevent you from applying inappropriate operations to data, C++ only lets you use the operations that are defined for a variable's data type.

In C++ you can either create your own data types as classes or use the built-in data types. The built-in data types are an integral part of the programming language and are also called primitive types. Listing 1 shows how to define one variable holding an integer and one holding a real number.

int integerNumber = 7;
float realNumber = 7.0f;    // the f suffix makes 7.0 a float literal instead of a double

// listing1: definition of integer and floating point variables

However, the combination of values and operations is not the only property of a data type. A data type also specifies the range of values and the amount of memory reserved for a variable.

There are four basic kinds of primitive data types:

Truth values (bool)
Characters (char)
Integers (int)
Floating-point numbers (float, double)

A large number of data types in C++

Before we look at the individual data types in more detail, I would like to briefly introduce the data type modifiers:

signed
unsigned
short
long

As the name suggests, these modifiers change the built-in data types: they adjust the value range and, in some cases, also the size.

The modifiers signed and unsigned determine whether the value range also includes negative numbers (signed) or only non-negative numbers (unsigned). This merely shifts the value range; the size stays the same.

To change the size, use the modifiers short or long. Here, too, the name says it all: the value range becomes smaller or larger. Note, however, that the C++ standard only guarantees minimum ranges and that short is no larger than int, int no larger than long, and long no larger than long long. A long int, for example, occupies 4 bytes on Windows but 8 bytes on 64-bit Linux.

You will see how to use the modifiers correctly in a moment. Now we finally come to the primitive data types in C++.

Boolean

The Boolean data type is named after the English mathematician George Boole. Its special feature is that it can take only two values: true or false (also described as high or low, 1 or 0). This two-state form is the basis of formal logic and of today's computer technology. Boolean values are also called logical values and are combined with the logical operators of Boolean algebra: AND (&&), OR (||), NOT (!) and EITHER-OR, which C++ expresses for bool operands as !=. The keyword for this data type is bool.

Boolean	smallest value	largest value	memory
bool	false (0)	true (1)	1 byte (typical)

Character

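A character variable stores a single character. The keyword ‘char’ reserves one byte, which can hold 256 different values; that is enough for the 128 characters of the ASCII code. Whether a plain char is signed or unsigned is up to the compiler and platform (for example, signed on x86 and unsigned on ARM Linux).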

Character	smallest value	largest value	memory
signed char	-128	127	1 byte
char	-128 or 0	127 or 255	1 byte
unsigned char	0	255	1 byte

Integer

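The keyword int denotes an integer variable. Without a modifier, an int occupies 4 bytes on common desktop and 32-bit embedded platforms and can store values from -2,147,483,648 to 2,147,483,647. The standard itself only guarantees at least 16 bits, which is what you get on many 8- and 16-bit microcontrollers. The following table shows how the modifiers affect integer variables and which value ranges they cover. The sizes are typical values; on 64-bit Linux and macOS, for example, long int occupies 8 bytes and has the range of long long int.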

Integer	smallest value	largest value	memory
short int	-32,768	32,767	2 bytes
unsigned short int	0	65,535	2 bytes
int	-2,147,483,648	2,147,483,647	4 bytes
unsigned int	0	4,294,967,295	4 bytes
long int	-2,147,483,648	2,147,483,647	4 bytes
unsigned long int	0	4,294,967,295	4 bytes
long long int	-9,223,372,036,854,775,808	9,223,372,036,854,775,807	8 bytes
unsigned long long int	0	18,446,744,073,709,551,615	8 bytes

Floating Point

If you want to store real numbers with decimal places, however, you need a floating-point variable. The keyword float declares a single-precision floating-point variable that occupies 4 bytes.

If single precision is not enough, you can declare a double-precision floating-point variable with the keyword double. This increases the precision from 6 (theoretically 7.22) significant decimal digits to 15 (theoretically 15.95) and the memory requirement to 8 bytes.

Floating Point	Accuracy	Lowest Value	Largest Value	Memory Space
float	6 digits	-3.4*10^38	+3.4*10^38	4 bytes
double	15 digits	-1.7*10^308	+1.7*10^308	8 bytes

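In addition, there is the data type ‘long double’, which C++ inherited from C and which has been part of the language since the first standard (C++98). It offers at least the precision of double, but it is not quite comparable with its relatives: its size and precision are implementation-defined, so different compilers treat it differently. MSVC, for example, makes it identical to double (8 bytes), GCC and Clang on x86 use the 80-bit extended format of the x87 floating-point unit, and on some platforms it is a 128-bit format that is calculated in software because the processor has no instructions for it.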

Valueless

The keyword void denotes the special data type Valueless: an entity without any value. Now you are rightly asking yourself why a data type without a value is needed. To briefly anticipate the topic of functions: void is used as the return type of functions that do not return a value. If this explanation seems a bit strange and you want to understand it in more detail, you will unfortunately have to wait until we look at functions.

Wide Character

We just talked about the data type char, with which you can process ASCII-encoded characters. However, ASCII only covers the characters of the English alphabet and does not include, for example, the German umlauts ä, ö and ü. To encode the characters of all the world's scripts and writing systems, the Unicode character set was introduced.

Since Unicode contains far more characters than fit into one byte, C++ offers the wide character data type for this purpose. Variables with the keyword wchar_t are very similar to a char, but they are larger and can therefore cover a much larger character set. How large depends on the platform: wchar_t occupies 2 bytes on Windows (UTF-16) and 4 bytes on Linux and macOS (UTF-32). Wide characters are used for international text, no matter whether German, Swedish or Japanese.

C++ Data Type	Space
char	1 byte
wchar_t	2 bytes (Windows) or 4 bytes (Linux, macOS)
char16_t	2 bytes (since C++11)
char32_t	4 bytes (since C++11)

It’s the size that counts

And why are there so many data types? The reason is historical: resources used to be scarce. Today's high-performance machines cannot be compared with the first computers, whose memory and computing power had to be managed very carefully so as not to make them even slower than they already were.

Now, of course, you could argue that thanks to rapid development and Moore's Law, optimizing source code down to the last byte is no longer necessary. But we are working in the context of embedded systems, where the challenge is to get as much as possible out of even the smallest hardware. Besides, writing efficient code feels good and shows a commitment to quality.

So don’t be afraid to ask yourself if the integer value needs 16 bits or if 8 bits are enough.

Of course, you don't have to keep a table of all data types at hand or learn it by heart. You can easily determine the size of a type or variable with the sizeof operator (it is an operator, not a function, even though it is usually written with parentheses). It returns the amount of memory, in bytes, that the compiler reserves for a variable of that type.

Using it is very easy: you just pass the type name in parentheses to sizeof (see Listing 2).

// listing2: Size of primitive datatypes

#include <iostream>

int main()
{
    std::cout << "Size of bool = "  << sizeof(bool)  << " Bytes" << std::endl;
    std::cout << "Size of char = "  << sizeof(char)  << " Bytes" << std::endl;
    std::cout << "Size of int = "   << sizeof(int)   << " Bytes" << std::endl;
    std::cout << "Size of float = " << sizeof(float) << " Bytes" << std::endl;

    return 0;
}

Typing

In computer science, assigning an object to a data type is called typing. As you might expect, this topic cannot be covered in one sentence. It offers plenty of material for discussion, since typed programming languages have different type systems. Their main purpose is to restrict the value range of variables in a meaningful way and to prevent syntactically or semantically incorrect operations. A programming language is called type-safe if it is able to prevent type errors at run time.

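The type system of C++, for example, is:

static (by default; dynamic type checks are possible with virtual functions and dynamic_cast)

explicit (since C++11, auto also lets the compiler deduce types implicitly)

strongly typed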

Dynamic or Static

With static type checking, the types are already known at compile time: you either specify them explicitly or let the compiler deduce them implicitly. If type checking takes place only at run time, it is called dynamic type checking.

    int year = 2019;                            // int
    std::string firstName = "embeddingchris";   // std::string

// listing3: Static type in C++

Explicit or Implicit

When you specify types explicitly, you should also assign data of a matching type to the variable. In Listing 4, the integer 2019 is assigned to the integer variable year. Implicit typing works differently: here, the compiler deduces the appropriate data type from the assigned value.

    int year = 2019;                            // 2019 is an integer
    std::string firstName = "embeddingchris";   // "embeddingchris" is a string

// listing4: Explicit declaration in C++

Strong or weak

Whether a type system is strong or weak can also be understood as its degree of type safety. Strongly typed programming languages pay very strict attention to type compliance and allow implicit type conversions, if at all, only when no data is lost. This means, for example, that you may assign a value of an integer type with a smaller value range (short) to one with a larger value range (long). Weakly typed languages are less strict and tend to turn a blind eye. C++ still accepts many implicit conversions inherited from C, but brace initialization (since C++11) rejects narrowing conversions that could lose data.

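    int length{2.54};                   // compile error: narrowing conversion from double to int
    int width = 2.54;                   // compiles (compilers may warn), width = 2
    int size = static_cast<int>(2.54);  // explicit conversion, size = 2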

// listing5: Strongly typed in C++

What sticks

We now know that a data type is the combination of a set of values and a set of operations. You can create data types yourself or fall back on the primitive data types of the programming language. Of course, you can also use data types written by other developers; perhaps the most popular example is std::string from the C++ standard library.

The primitive data types of C++ can be divided into truth values, characters, integers and floating-point numbers. Each data type has its own set of values, value range and operators. With the modifiers signed, unsigned, short and long, you can adapt the value range to your application.

And if you don't know the size of a data type, the sizeof operator will tell you.

C++ is a statically, explicitly and strongly typed programming language and requires a well-thought-out use of types. With this knowledge, you are well equipped to choose the right data type in every situation.

I wish you maximum success!
