License : Creative Commons Attribution 4.0 International (CC BY-NC-SA 4.0)
Copyright : Hervé Frezza-Buet, CentraleSupelec
Last modified : April 19, 2024 10:22
Link to the source : compile.md

Compiling

Splitting C++ code into several files

Get the files

Download the archive preprocess-002.tar.gz, uncompress it and go into the directory.

mylogin@mymachine:~$ tar zxvf preprocess-002.tar.gz
mylogin@mymachine:~$ rm preprocess-002.tar.gz
mylogin@mymachine:~$ cd preprocess-002
mylogin@mymachine:~$ ls

You have a bunch of files here, organized as follows.

First, we provide “recipes” for handling geometry stuff. The files are named geom*.hpp and geom*.cpp. We will examine them hereafter.

Second, we have used the geometry recipes to write auxiliary functions that compute the medians of a triangle. These functions are defined in the files medianator.hpp and medianator.cpp.

Last, our main program is written in the file main.cpp.

Make binaries

In C++ compiling, the idea is to have one binary file per .cpp file. Headers (i.e. .hpp files) are only there to be included in .cpp files, so that these compile correctly.

Let us start with geomPoint.cpp. Here, we use g++ but the clang compiler behaves the same.

Read the code of geomPoint.cpp, and read the content of the files which are included (do not read the system files like cmath). The order of inclusion makes you read geomPoint.hpp first, since it is copy-pasted at the beginning of geomPoint.cpp via the #include <geomPoint.hpp> directive.

Ok, let us ask g++ to build a binary from it (it will fail… don’t be afraid).

mylogin@mymachine:~$ g++ -std=c++17 geomPoint.cpp

You get an error telling you that the compiler cannot find the geomPoint.hpp file… which is there! The reason is that #include <...>, with angle brackets, means “search in the standard directories”… and the current directory is not among the standard directories (they are /usr/include, /usr/local/include, …).

So we have to tell the compiler to consider the current directory, i.e. the directory ., as one of the standard directories where included files may be found. This is the meaning of the -I<path> compiler flag. So to solve our issue, we have to call

mylogin@mymachine:~$ g++ -std=c++17 -I. geomPoint.cpp

Compiling succeeds (I know you still have an error). It means that your code is correct, everything mentioned is declared, and so on. The error you get is that the generated binary does not contain any main function. Indeed, our geomPoint.cpp, even after the copy-pasting of the included files by the preprocessing, does not implement any main function.

The reason is that the compiler tries to make an executable binary, and executable binaries, by convention, have to implement a function called main, at which the execution thread starts. Here, we only have to tell the compiler that we just want a binary version of our code, and that it is ok if it is not an executable. The -c flag means that.

mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomPoint.cpp

Check the directory: you now have a geomPoint.o file. It is a binary file, unreadable by a human brain, that contains your program translated by the compiler into the language of your micro-processor.
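To make the idea concrete, here is a minimal sketch of what a file like geomPoint.cpp may look like once the preprocessor has pasted its header. The namespace demo, the struct and the two functions are hypothetical (the real geomPoint files may differ); the only point is that the file defines functions but no main, so it suits g++ -c but cannot become an executable alone.

```cpp
// Hypothetical content, for illustration only: not the actual geomPoint files.

// --- part that would be pasted from the header: declarations ---
namespace demo {
  struct Point {
    double x;
    double y;
    double squared_norm() const;  // declared here, defined below
  };
  double dot(const Point& a, const Point& b);
}

// --- part written in the .cpp file itself: definitions ---
double demo::Point::squared_norm() const {
  return x * x + y * y;
}

double demo::dot(const Point& a, const Point& b) {
  return a.x * b.x + a.y * b.y;
}

// There is no main function in this file: "g++ -c" turns it into a .o
// file, but it cannot become an executable on its own.
```

Saved under a name of your choice, such a file should compile with the -c flag, and the same command without -c should complain about the missing main.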

Even if it is not human readable, there are some tools to inspect what is written in binary files. Let us try it with our first geomPoint.o binary file.

You can add extra flags on the command line, for example to ask the compiler to check strictly that your code conforms to the ISO C++ standard. This is done by adding -pedantic and -Wall (meaning “all warnings”), as in the first command below. The second command, nm, is the inspection tool.

mylogin@mymachine:~$ g++ -std=c++17 -Wall -pedantic -I. -c geomPoint.cpp
mylogin@mymachine:~$ nm -C geomPoint.o

You get lines such as

000000000000017c T geom::Point::operator=(geom::Point const&)

Consider geomPoint.o as a recipe book: the display you get is its table of contents. The first hexadecimal number is the address of the recipe (it is the page number in a real book), the second element is a letter (T here), and then you have the recipe name (geom::Point::operator=(geom::Point const&) here), which is the name of the function. Remove the -C flag of the nm command in order to see the real names of your recipes, i.e. the mangled names actually stored in the file.
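The difference between the names shown with and without -C can be tried on a tiny file. The sketch below uses hypothetical functions (nothing here comes from the geom files): two overloads share one C++ name, so the compiler has to store two distinct names in the book. With g++ on Linux, nm without -C would typically list them as _ZN4demo4areaEd and _ZN4demo4areaEdd, while nm -C shows demo::area(double) and demo::area(double, double).

```cpp
// Hypothetical functions, for illustration only.
namespace demo {
  // Two overloads: same C++ name, but two distinct recipes in the book.
  double area(double side) { return side * side; }
  double area(double width, double height) { return width * height; }
}

// extern "C" asks for C-style naming: the symbol carries no namespace and
// no parameter types, so it is expected to appear as plain demo_unit_area.
extern "C" double demo_unit_area() { return demo::area(1.0); }
```

Compile it with -c and compare the output of nm and nm -C on the resulting .o file.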

The letter T (stands for text) means that the text of the recipe is actually in that book. You may find in the table of contents recipes with a letter U (undefined). It means that, in this book, some recipes are mentioning a recipe, but that recipe is not written in that book. Here, this is the case for the recipe sqrt, which we invoke in geomPoint.cpp but for which we did not write the code. sqrt is declared in cmath, which we include, so the compiling is successful. Remove #include <cmath> from the geomPoint.cpp file, re-compile, and you will see the error… then put it back and recompile.

Now, do the same for geomSegment.cpp (read the code following the inclusions), and be sure to understand what is included, and why the table of contents of the recipe book geomSegment.o is as it is.
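Here is a minimal sketch of the T versus U situation, with a hypothetical function (it is not taken from geomPoint.cpp). The function is defined in the file, so it is expected to appear with a T; it calls sqrt, whose code is not in the file, so sqrt is expected to appear with a U when compiling with g++ without optimization flags.

```cpp
#include <cmath>  // declares std::sqrt; its code lives in the standard library

namespace demo {
  // Defined here: expected to show up with a T in the nm output.
  double hypotenuse(double a, double b) {
    // Only called here: sqrt is expected to show up with a U, since the
    // header provides its declaration but not its code.
    return std::sqrt(a * a + b * b);
  }
}
```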

mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomSegment.cpp
mylogin@mymachine:~$ nm -C geomSegment.o

You may notice that the calls of functions related to geomPoint.o have been correctly compiled, thanks to the inclusion of geomPoint.hpp, but that these functions have the letter U in the table of contents. Indeed, they are not defined in that book, geomSegment.o, but in geomPoint.o as we have seen previously.

We do the same for all the .cpp files. Take the time to read the code and to understand the output of nm.

mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomTriangle.cpp
mylogin@mymachine:~$ nm -C geomTriangle.o

In the next files, the geometry headers are included all at once, thanks to the geom.hpp file, which only includes all the geom*.hpp files.

mylogin@mymachine:~$ g++ -std=c++17 -I. -c medianator.cpp
mylogin@mymachine:~$ nm -C medianator.o
mylogin@mymachine:~$ g++ -std=c++17 -I. -c main.cpp
mylogin@mymachine:~$ nm -C main.o

The last one, main.o, contains a main function. So we can gather the books (all the .o files) to make an executable, since all the recipes are there. The flag -o gives the name of the executable (you will get an error… do not be afraid).

mylogin@mymachine:~$ g++ -o test main.o geom*.o medianator.o

You get errors… the gathering process is called “linkage”. It consists in gathering books and checking that every function mentioned in a book is defined somewhere in the gathered books (i.e. any U recipe is defined as a T recipe somewhere). Moreover, each recipe must be defined… once!

Here, for example, the function geom::operator*(double, geom::Point const&) is defined (with a T) in all the files! Indeed, we have written the recipe in geomPoint.hpp, which is included by all the .cpp files… So all the books define the function.

A fix could be to write only the headers

Point operator*(double a, const Point& p);
Point operator*(const Point& p, double a);
Point operator/(const Point& p, double a);

in the file geomPoint.hpp, and write the full code (very short) in geomPoint.cpp, as we did for all the other functions/methods. But here, let us keep the code in geomPoint.hpp, and add the inline keyword like this

inline Point operator*(double a, const Point& p) {return {a * p.x, a * p.y};}
inline Point operator*(const Point& p, double a) {return a * p;}
inline Point operator/(const Point& p, double a) {return {p.x / a, p.y / a};}
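For reference, here is a self-contained sketch of a header written that way. The three operators are the ones shown above; the struct layout (public x and y), the namespace demo and the include guard name are assumptions made so that the example compiles on its own, and the real geomPoint.hpp may differ.

```cpp
// Sketch of a header in the spirit of geomPoint.hpp (hypothetical content).
#ifndef DEMO_POINT_HPP
#define DEMO_POINT_HPP

namespace demo {
  struct Point {
    double x;
    double y;
  };

  // inline: every .cpp file including this header may carry its own copy
  // of these functions, and the linker accepts the duplicates.
  inline Point operator*(double a, const Point& p) {return {a * p.x, a * p.y};}
  inline Point operator*(const Point& p, double a) {return a * p;}
  inline Point operator/(const Point& p, double a) {return {p.x / a, p.y / a};}
}

#endif
```

The include guard addresses a different problem: it prevents the header from being pasted twice into the same .cpp file, whereas inline deals with the same definition appearing in several .o files.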

Recompile everything

mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomPoint.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomSegment.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomTriangle.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c medianator.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c main.cpp

Check all .o with nm, as illustrated here for main.o

mylogin@mymachine:~$ nm -C main.o

And you will see that the inlined functions appear only if they are used in the recipes, and if so, they are defined with a symbol W (weak). It means the multiple definitions in the final gathering are allowed.

Inline functions are more than that. Indeed, the compiler can decide not to implement the recipe, but to re-write it each time, as needed, when a recipe calls the function. In this case, there is no function call at execution time: it is as if the code of the inline function had been copy-pasted each time it was required (like a macro substitution).

This saves time when the compiler does it, and the compiler may do it for short functions such as the ones we have inlined.

So now, we can gather the books and execute our executable (it displays nothing).

mylogin@mymachine:~$ g++ -o test main.o geom*.o medianator.o
mylogin@mymachine:~$ ./test

The compiling could have been done directly from all the sources, but it is much better to understand how things can be compiled separately and then gathered by linkage. The global compiling command (not recommended) is the following:

mylogin@mymachine:~$ g++ -o test -std=c++17 -I. *.cpp
mylogin@mymachine:~$ ./test

Now let us suppose that we forget a book in the final gathering. Let us forget medianator.o on purpose. You will get a linking error. Try it !

mylogin@mymachine:~$ g++ -o test main.o geom*.o 

The linking stage (i.e. the software called the linker) complains about an undefined reference to median_by_A… it is called (U) somewhere (in main.o, as mentioned in the error message) but no book among the ones we have gathered defines it (no T).

So you may wonder about the sqrt function… that we have not written, and whose book we have not put in the gathering. As it is a function of the standard library, the compiler adds that cmath.o book (this is not its real name) automatically: we never mention it, but it has actually been added.
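The mechanism behind such an undefined reference can be sketched in a single file, with hypothetical functions (these are not the medianator ones). A declaration is enough for the compiler to accept a call; the definition is what the linker looks for afterwards.

```cpp
// Hypothetical names, for illustration only.
namespace demo {
  // Declaration only: enough for the compiler to accept calls to half().
  double half(double value);

  // This function calls half(); in the object file, that call is a
  // reference that the linker has to resolve.
  double quarter(double value) { return half(half(value)); }

  // Definition. In a multi-file project it would sit in another .cpp file;
  // if its .o file were left out of the link, the call in quarter() would
  // produce an "undefined reference" error.
  double half(double value) { return value / 2.0; }
}
```

Commenting out the definition of half and building an executable that calls quarter should reproduce the same kind of message as the one obtained when medianator.o is forgotten.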

Make libraries

A library is very similar to a .o file, i.e. it is a recipe book (written in binary code, not in C++). So building a library is very similar to compiling .cpp files into .o files as we did so far.

The difference with usual .o files is that a library can be shared. Indeed, when an executable is loaded in the memory (the RAM) by the system to be executed, it does not contain the library. The library is loaded apart, in memory as well. So at the end, it is the same as loading everything in memory. Things change when the system loads another executable that needs the same recipes (i.e. the same library). In this case, as the library is already loaded for the previous executable, it is not loaded twice. The system knows that the book is already loaded in the RAM, and the two executables share the text of the recipes.

Ok, so let us consider that the geometry tools could form a consistent geometry recipe book, and let us build it.

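First, we compile everything, adding specific flags; in particular, -fPIC asks for position-independent code, which is what goes into a shared library. We first remove the previously compiled *.o and test binaries, then we rebuild the geometry ones.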

mylogin@mymachine:~$ rm -f *.o test
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomPoint.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomSegment.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomTriangle.cpp
mylogin@mymachine:~$ ls *.o

Now, we gather all the geom*.o objects into a single recipe book called libgeom.so (so means “shared object”; the equivalent on Windows is a dll).

mylogin@mymachine:~$ g++ -o libgeom.so -shared geom*.o
mylogin@mymachine:~$ rm geom*.o
mylogin@mymachine:~$ ls

You can also check the content of libgeom.so

mylogin@mymachine:~$ nm -C libgeom.so

And that’s it. Now, we can compile other .o files as previously.

mylogin@mymachine:~$ g++ -c -std=c++17 -I. medianator.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. main.cpp

Let us make an executable from the whole stuff. We only need to gather main.o and medianator.o, since the other recipes, in libgeom.so, will be gathered only when the system launches our final test executable (we will get an error, do not be afraid).

mylogin@mymachine:~$ g++ -o test main.o medianator.o

You get linkage errors… the linker asks you to provide the recipes for the functions which are not written in main.o and medianator.o. These functions are used in main.o and medianator.o but defined in the libgeom.so recipe book. Nothing is wrong, since indeed, we will provide the functions when we launch test for execution… this is why shared libraries are called shared.

So where is the problem? It is only a problem of checking that everything will be there. So, for checking only, we have to provide libgeom.so to the last command.

mylogin@mymachine:~$ g++ -o test main.o medianator.o libgeom.so

You can check that the geom functions are still undefined, i.e. the recipe book libgeom.so has not been added to the executable for real.

mylogin@mymachine:~$ nm -C test

There is another way to compile “against” libraries. It is the -l<somelib> flag. It means “link with the library libsomelib.so”. So here, as our lib is called libgeom.so, the flag to use is -lgeom.

Let us remove the test executable and rebuild it with this new way (it will fail…)

mylogin@mymachine:~$ rm test
mylogin@mymachine:~$ g++ -o test main.o medianator.o -lgeom 

You get an error… The linker (ld), i.e. the software that is called to bring recipes together (indeed, it is not g++ itself that does this job), complains about not finding the library… which is there! As for the .hpp files that we included with <...>, the linker searches for the libgeom.so recipe book in the standard directories for libraries (i.e. /usr/lib, /usr/local/lib, …), and the current directory is not a standard directory for libraries. As we used -I<path> for adding directories where .hpp files can be found, we use here -L<path> for adding directories where libraries can be found. Here, the path is the current directory, i.e. the single dot in the command below.

mylogin@mymachine:~$ g++ -o test main.o medianator.o -L. -lgeom 

You do not have such an issue when you directly give the path to libgeom.so, as we did before introducing the -lgeom shortcut. The idea behind -l flags is to separate the location of the libs (-L) from the choice of the lib we want to link with (-l). This can be used when several versions of the libs are installed in several places on your disk… In this tutorial, the point is only to understand the mechanisms so that you can fix “I cannot find the toto library” issues when you compile. Usually, some -L/home/me/mylibs/totodir is missing, or, if you actually find the libtoto.so file in the /home/me/mylibs/totodir directory, check that you did actually write -ltoto and not -ltoTo… nothing is really more complicated than that.

Ok, let us execute the executable (it will fail…)

mylogin@mymachine:~$ ./test

You get an error. Here, when the system launches test, it needs to find the recipe book, since it is not embedded in test… and it does not know where to find it. Once again, it searches for it in the standard library directories. To solve the issue, we have to set an environment variable first, to tell the system where libraries can also be found. Here, we append the current directory (i.e. the single dot) to the previous value of the LD_LIBRARY_PATH environment variable.

mylogin@mymachine:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

and now, without any recompiling nor any re-linking:

mylogin@mymachine:~$ ./test

That is finally it!
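As a side note, the value of LD_LIBRARY_PATH is just a list of directories separated by colons, which is why appending a colon and a dot to it is enough. The sketch below only illustrates that structure from within a C++ program; the function names are hypothetical, and it does not reproduce the actual search performed by the system when it loads an executable.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

namespace demo {
  // Splits a colon-separated list such as "/opt/lib:." into its entries.
  std::vector<std::string> split_path_list(const std::string& value) {
    std::vector<std::string> dirs;
    std::istringstream stream(value);
    std::string dir;
    while (std::getline(stream, dir, ':')) dirs.push_back(dir);
    return dirs;
  }

  // Reads LD_LIBRARY_PATH from the environment; empty list if it is unset.
  std::vector<std::string> library_path_entries() {
    const char* value = std::getenv("LD_LIBRARY_PATH");
    return value ? split_path_list(value) : std::vector<std::string>{};
  }
}
```

Calling demo::library_path_entries() after the export command above should return a list whose last entry is the dot.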

Three kinds of errors

It is crucial to identify the 3 kinds of errors you may face. Only the third kind is hard; the first two are very easy to solve. We restart the compiling process from scratch.

mylogin@mymachine:~$ rm test *.o *.so

Syntactical errors

Syntactical errors may occur when you build binary files from .cpp files, i.e. during one of those lines.

mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomPoint.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomSegment.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomTriangle.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. medianator.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. main.cpp

Here, if something goes wrong, it is because your code does not fit C++ syntactical requirements: you use undefined variables or wrong types, you include a header file that cannot be found, you forget to include the header where a type is defined, you forget a ; symbol, you close a } that you have never opened… Read the error messages, and you will know how to fix the issues.

The use of templates can lead to very messy error messages at this stage, but apart from this case, nothing is really hard here.

Linking errors

Everything has compiled right, and we need to gather what has to be gathered… i.e. we have to link. Even if those commands are g++ ..., the software which is involved is the linker ld. Here, you can get “undefined symbols” or “multiply defined symbols”, as we have seen previously. Those errors look very different from syntactical errors. Do not confuse them.

mylogin@mymachine:~$ g++ -o libgeom.so -shared geom*.o
mylogin@mymachine:~$ rm geom*.o
mylogin@mymachine:~$ g++ -o test main.o medianator.o -L. -lgeom 
mylogin@mymachine:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
mylogin@mymachine:~$ ./test

Bugs

At this stage, everything compiles and can be executed. The last source of errors is bugs (the previous errors are not bugs, do not call every failure a bug). A bug is when the execution of your program does something wrong: it can crash, compute wrong values, loop forever, keep asking for memory space until saturation of the system…

These are the main issues in computer science and programming. Having strong typing (as opposed to python…), using nice object designs, testing every piece of code with a specific test, etc… are ways to reduce the risk of such bugs. Some code proving environments also exist, but it is quite a specific context.
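As a small illustration of a bug in that sense, the hypothetical function below compiles and links without any complaint, yet computes a wrong value. It is not taken from the tutorial files; it only shows the category of error.

```cpp
// Hypothetical functions, for illustration only.
namespace demo {
  // Bug: both operands of / are int, so the division truncates before the
  // result is converted to double.
  double midpoint_buggy(int a, int b) { return (a + b) / 2; }

  // Fix: dividing by 2.0 forces a floating-point division.
  double midpoint_fixed(int a, int b) { return (a + b) / 2.0; }
}
```

For the arguments 1 and 2, the first version returns 1 while the second returns 1.5: neither the compiler nor the linker has anything to report, only a test reveals the problem.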

Handle automatic compiling

Do not explore that part if you are doing preliminary self-studies for the C++ lectures.

For large C++ projects, compiling everything manually, as we did here, is not realistic. Moreover, when a file is modified, you may not need to recompile everything. Usually, recompiling one or two files and re-linking a few binaries is enough.

There exist tools to handle the compiling of large pieces of code. Professional IDEs (Integrated Development Environments) do such jobs behind the scenes, but in case of failure, understanding the details of the process, as we did here, is crucial.

We recommend cmake, for which a tutorial is available here. You can enter C++ without knowing that, but being able to compile big projects will become required very quickly.
