License : Creative Commons Attribution 4.0 International (CC BY-NC-SA 4.0)
Copyright :
Hervé Frezza-Buet,
CentraleSupelec
Last modified : April 19, 2024 10:22
Link to the source : compile.md
Download the archive preprocess-002.tar.gz, uncompress it and go into the directory.
mylogin@mymachine:~$ tar zxvf preprocess-002.tar.gz
mylogin@mymachine:~$ rm preprocess-002.tar.gz
mylogin@mymachine:~$ cd preprocess-002
mylogin@mymachine:~$ ls
You have a bunch of files here, organized as follows.
First, we provide “recipes” for handling geometry stuff. The files are named geom*.hpp and geom*.cpp. We will examine them hereafter.
Second, we have used the geometry recipes to write auxiliary functions that compute the medians of a triangle. These functions are defined in the files medianator.hpp and medianator.cpp.
Last, our main program is written in the file main.cpp.
In C++ compiling, the idea is to have one binary file per .cpp file. Headers (i.e. .hpp files) are only here to be included in .cpp files in order to enable correct compiling.
Let us start with geomPoint.cpp. Here, we use g++ but the clang compiler behaves the same.
Read the code of geomPoint.cpp, and read the content of the files which are included (do not read the system files like cmath). The order of inclusion makes you read geomPoint.hpp first, since it is copy-pasted at the beginning of geomPoint.cpp via the #include <geomPoint.hpp> directive.
Ok, let us ask g++ to build a binary from it (it will fail… don’t be afraid).
mylogin@mymachine:~$ g++ -std=c++17 geomPoint.cpp
You get an error, telling that the compiler does not find the geomPoint.hpp file… which is there! The reason is that #include <...>, with angle brackets, means “search in the standard directories”… and the current directory is not among the standard directories (they are /usr/include, /usr/local/include, …).
So we have to tell the compiler to consider the current directory, i.e. the directory ., as one of the standard directories where included files may be found. This is the meaning of the -I<path> compiler flag. So to solve our issue, we have to call
mylogin@mymachine:~$ g++ -std=c++17 -I. geomPoint.cpp
Compiling succeeds (I know you still have an error). It means that your code is correct, everything mentioned is declared, and so on. The error you get is that the generated binary does not contain any main function. Indeed, our geomPoint.cpp, even after the copy-pasting of the included files by the preprocessing, does not implement any main function.
The reason is that the compiler tries to make an executable binary, and executable binaries, by convention, have to implement a function called main at which the execution thread starts. Here, we only have to tell the compiler that we only want a binary version of our code, and that it is ok if it is not an executable. The -c flag means that.
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomPoint.cpp
Check the directory, you now have a geomPoint.o file. It is a binary file, unreadable by a human brain, that contains your program translated by the compiler into the language of your micro-processor.
Even if it is not human readable, there are some tools to inspect what is written in binary files. Let us try it with our first geomPoint.o binary file.
You can add extra flags on the command line, for example to ask the compiler to check strictly that your code fits the ISO C++ norm. This is done like this, adding -pedantic and -Wall (meaning “warnings all”).
mylogin@mymachine:~$ g++ -std=c++17 -Wall -pedantic -I. -c geomPoint.cpp
mylogin@mymachine:~$ nm -C geomPoint.o
You get lines such as
000000000000017c T geom::Point::operator=(geom::Point const&)
Consider geomPoint.o as a recipe book: the display you get is its table of contents. The first hexadecimal number is the address of the recipe (it is the page number in a real book), the second element is a letter (T here), and then you have the recipe name (geom::Point::operator=(geom::Point const&) here), which is the name of the function. Remove the -C flag of the nm command in order to see the real names of your recipes.
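The real names look scrambled because C++ allows several functions to share one name as long as their parameter lists differ, so each entry of the table of contents has to encode the full signature. As a minimal sketch (the file name, the namespace sketch, the Point struct and the scale functions are hypothetical, they are not taken from the archive), the following file should produce two distinct entries for scale when compiled with -c:

```cpp
// overloads.cpp: hypothetical file, not part of the archive.
// Build the object file only:  g++ -std=c++17 -c overloads.cpp
// Then compare the outputs of: nm overloads.o   and   nm -C overloads.o

namespace sketch {

  struct Point {
    double x;
    double y;
  };

  // Two functions with the same name but different parameter lists.
  // Each one needs its own entry in the object file, so the compiler
  // encodes the namespace, the name and the parameter types in the symbol.
  Point scale(double a, const Point& p) { return {a * p.x, a * p.y}; }
  Point scale(const Point& p, double a) { return scale(a, p); }

}
```

There is no main function here on purpose: like geomPoint.cpp, this file is only meant to be turned into a .o book.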
The letter T (which stands for text) means that the text of the recipe is actually in that book. You may find in the table of contents recipes with a letter U (undefined). It means that, in this book, some recipes mention a recipe that is not written in that book. Here, this is the case for the recipe sqrt that we invoke in geomPoint.cpp, but for which we did not write the code. The function sqrt is declared in cmath, which we include, so the compiling is successful. Remove #include <cmath> from the geomPoint.cpp file, re-compile, and you will see the error… then put it back and recompile.
Now, do the same for geomSegment.cpp (read the code following the inclusions), and be sure to understand what is included, and why the table of contents of the recipe book geomSegment.o is as it is.
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomSegment.cpp
mylogin@mymachine:~$ nm -C geomSegment.o
You may notice that the calls of functions related to geomPoint.o have been correctly compiled, thanks to the inclusion of geomPoint.hpp, but that these functions have the letter U in the table of contents. Indeed, they are not defined in that book, geomSegment.o, but in geomPoint.o as we have seen previously.
We do the same for all the .cpp files. Take the time to read the code, and understand the output of nm.
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomTriangle.cpp
mylogin@mymachine:~$ nm -C geomTriangle.o
In the next files, the geometry headers are included all at once, thanks to the geom.hpp file that only includes all the geom*.hpp files.
mylogin@mymachine:~$ g++ -std=c++17 -I. -c medianator.cpp
mylogin@mymachine:~$ nm -C medianator.o
mylogin@mymachine:~$ g++ -std=c++17 -I. -c main.cpp
mylogin@mymachine:~$ nm -C main.o
The last one, main.o, contains a main function. So we can gather the books (all the .o files) to make an executable, since all the recipes are there. The flag -o tells the name of the executable (you will get an error… do not be afraid).
mylogin@mymachine:~$ g++ -o test main.o geom*.o medianator.o
You get errors… the gathering process is called “linkage”. It consists in gathering books, and checking that every function mentioned in a book is defined somewhere in the gathered books (i.e. any U recipe is defined as a T recipe somewhere). Moreover, each recipe must be defined… once!
Here, for example, the function geom::operator*(double, geom::Point const&) is defined (with a T) in all the files! Indeed, we have written the recipe in geomPoint.hpp, which is included by all the .cpp files… So all the books define the function.
A fix could be to write only the declarations
Point operator*(double a, const Point& p);
Point operator*(const Point& p, double a);
Point operator/(const Point& p, double a);
in the file geomPoint.hpp, and write the full code (very short) in geomPoint.cpp, as we did for all the other functions/methods. But here, let us keep the code in geomPoint.hpp, and add the inline keyword like this
inline Point operator*(double a, const Point& p) {return {a * p.x, a * p.y};}
inline Point operator*(const Point& p, double a) {return a * p;}
inline Point operator/(const Point& p, double a) {return {p.x / a, p.y / a};}
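For comparison, the first fix mentioned above (declarations in the header, definitions in a single .cpp file) could look like the following sketch. Both parts are shown in one block for brevity, and the namespace sketch and the Point struct are stand-ins assumed for illustration: the actual content of geomPoint.hpp may differ.

```cpp
// ---- what would go in the header: declarations only ----
namespace sketch {

  struct Point {
    double x;
    double y;
  };

  Point operator*(double a, const Point& p);
  Point operator*(const Point& p, double a);
  Point operator/(const Point& p, double a);

}

// ---- what would go in exactly one .cpp file: the definitions ----
namespace sketch {

  Point operator*(double a, const Point& p) { return {a * p.x, a * p.y}; }
  Point operator*(const Point& p, double a) { return a * p; }
  Point operator/(const Point& p, double a) { return {p.x / a, p.y / a}; }

}
```

With this layout, every file that includes the header only sees the declarations, so the three recipes are written in a single book. The price is that the compiler cannot expand these short functions at the call site in the other files without extra help, which is one reason to prefer inline here.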
Recompile everything
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomPoint.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomSegment.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c geomTriangle.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c medianator.cpp
mylogin@mymachine:~$ g++ -std=c++17 -I. -c main.cpp
Check all the .o files with nm, as illustrated here for main.o
mylogin@mymachine:~$ nm -C main.o
And you will see that the inlined functions appear only if they are used in the recipes, and if so, they are defined with a symbol W (weak). It means that multiple definitions are allowed in the final gathering.
Inline functions are more than that. Indeed, the compiler can decide not to implement the recipe as a function, but to re-write it each time, as needed, when a recipe calls the function. In this case, there is no function call at execution time: it is as if the code of the inline function had been copy-pasted each time it was required (like a macro substitution).
This saves time when the compiler does it, and the compiler may do it for short functions such as the ones we have inlined.
So now, we can gather the books and execute our executable (it displays nothing).
mylogin@mymachine:~$ g++ -o test main.o geom*.o medianator.o
mylogin@mymachine:~$ ./test
The compiling could have been done directly from all the sources, but it is much better to understand how things can be compiled separately and then gathered by linkage. The global compiling command (not recommended) is the following:
mylogin@mymachine:~$ g++ -o test -std=c++17 -I. *.cpp
mylogin@mymachine:~$ ./test
Now let us suppose that we forget a book in the final gathering. Let us forget medianator.o on purpose. You will get a linking error. Try it!
mylogin@mymachine:~$ g++ -o test main.o geom*.o
The linking stage (i.e. the software called the linker) complains about an undefined reference to median_by_A… it is called (U) somewhere (in main.o, as mentioned in the error message) but none of the books we have gathered defines it (no T).
So you may wonder about the sqrt function… that we have not written and for which we have not taken the book in the gathering. As it is a function of the standard library, the compiler adds that cmath.o book (this is not its real name) automatically… so it has actually been added.
A library is very similar to a .o file, i.e. it is a recipe book (written in binary code, not in C++). So building a library is very similar to compiling .cpp files into .o files as we did so far.
The difference with usual .o files is that a library can be shared. Indeed, when an executable is loaded in the memory (the RAM) by the system to be executed, it does not contain the library. The library is loaded separately, in memory as well. So at the end, it is the same as loading everything in memory. Things change when the system loads another executable that needs the same recipes (i.e. the same library). In this case, as the library is already loaded for the previous executable, it is not loaded twice. The system knows that the book is already loaded in the RAM, and the two executables share the text of the recipes.
Ok, so let us consider that the geometry tools form a set that could make a consistent geometry recipe book, and let us build it.
First, we compile everything, adding the -fPIC flag, which asks for position-independent code, as needed for shared libraries. We first remove the previously compiled *.o and test binaries, then we rebuild them.
mylogin@mymachine:~$ rm -f *.o test
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomPoint.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomSegment.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomTriangle.cpp
mylogin@mymachine:~$ ls *.o
Now, we gather all the geom*.o objects into a single recipe book called libgeom.so (so means “shared object”; the equivalent is called dll on Windows).
mylogin@mymachine:~$ g++ -o libgeom.so -shared geom*.o
mylogin@mymachine:~$ rm geom*.o
mylogin@mymachine:~$ ls
You can also check the content of libgeom.so
mylogin@mymachine:~$ nm -C libgeom.so
And that’s it. Now, we can compile other .o files as previously.
mylogin@mymachine:~$ g++ -c -std=c++17 -I. medianator.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. main.cpp
Let us make an executable from the whole stuff. We only need to gather main.o and medianator.o, since the other recipes in libgeom.so will be gathered only when the system launches our final test executable (we will get an error, do not be afraid).
mylogin@mymachine:~$ g++ -o test main.o medianator.o
You get linkage errors… the linker asks you to provide the recipes for the functions which are not written in main.o and medianator.o. These functions are used in main.o and medianator.o but defined in the libgeom.so recipe book. Nothing is wrong, since we will indeed provide the functions when we launch test for execution… this is why shared libraries are called shared.
So where is the problem? It is only a problem of checking that everything will be there. So, for checking only, we have to provide libgeom.so to the last command.
mylogin@mymachine:~$ g++ -o test main.o medianator.o libgeom.so
You can check that the geom functions are still undefined, i.e. the recipe book libgeom.so has not been added to the executable for real.
mylogin@mymachine:~$ nm -C test
There is another way to compile “against” libraries. It is the -l<somelib> flag. It means: compile with the library libsomelib.so. So here, as our lib is called libgeom.so, the flag to use is -lgeom.
Let us remove the test executable and rebuild it this new way (it will fail…)
mylogin@mymachine:~$ rm test
mylogin@mymachine:~$ g++ -o test main.o medianator.o -lgeom
You get an error… The linker (ld), i.e. the software that is called to bring recipes together (indeed, it is not g++ itself that does this job), complains about not finding the library… which is there! As for the .hpp files that we included with <...>, the linker searches for the libgeom.so recipe book in the standard directories for libraries (i.e. /usr/lib, /usr/local/lib, …), and the current directory is not one of them. As we used -I<path> to add directories where .hpp files can be found, we use here -L<path> to add directories where libraries can be found. Here, the path is the current directory, i.e. the directory . itself.
mylogin@mymachine:~$ g++ -o test main.o medianator.o -L. -lgeom
You do not have such an issue when you directly give the path to libgeom.so, as we did before introducing the -lgeom shortcut. The idea behind -l flags is to separate the location of the libs (-L) from the choice of the lib we want to include (-l). This is useful when several versions of the libs are installed in several places on your disk… In this tutorial, the point is only to understand the mechanisms so that you can fix “I cannot find the toto library” issues when you compile. Usually, some -L/home/me/mylibs/totodir is missing; or, if you actually find the libtoto.so file in the /home/me/mylibs/totodir directory, check that you did actually write -ltoto and not -ltoTo… nothing is really more complicated than that.
Ok, let us execute the executable (it will fail…)
mylogin@mymachine:~$ ./test
You get an error. Here, when the system launches test, it needs to find the recipe book, since it is not embedded in test… and it does not know where to find it. Once again, it searches for it in the standard library directories. To solve the issue, we have to set an environment variable first, to tell the system where libraries can also be found. Here, we append the current directory (i.e. .) to the previous value of the LD_LIBRARY_PATH environment variable.
mylogin@mymachine:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
And now, without any recompiling or re-linking:
mylogin@mymachine:~$ ./test
That is finally it!
It is crucial to identify the 3 kinds of errors you may face. Only the third kind is hard, the first two are very easy to solve. We restart the compiling process from scratch.
mylogin@mymachine:~$ rm test *.o *.so
Syntactical errors may occur when you build binary files from .cpp files, i.e. during one of those lines.
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomPoint.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomSegment.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. -fPIC -shared geomTriangle.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. medianator.cpp
mylogin@mymachine:~$ g++ -c -std=c++17 -I. main.cpp
Here, if something goes wrong, it is because your code does not fit the C++ syntactical requirements: you use undefined variables or wrong types, you include a header file that cannot be found, you forgot to include a header where a type is defined, you forgot a ; symbol, you close a } that you never opened… Read the error messages, and you will know how to fix the issues.
The use of templates can lead to very messy error messages in this stage, but except this case, nothing is really hard here.
Once everything has compiled right, we need to gather what has to be gathered… i.e. we have to link. Even if those commands are g++ ..., the software which is involved is the linker ld. Here, you can get “undefined symbols” or “multiply defined symbols”, as we have seen previously. Those errors look very different from syntactical errors. Do not be confused.
mylogin@mymachine:~$ g++ -o libgeom.so -shared geom*.o
mylogin@mymachine:~$ rm geom*.o
mylogin@mymachine:~$ g++ -o test main.o medianator.o -L. -lgeom
mylogin@mymachine:~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
mylogin@mymachine:~$ ./test
At this stage, everything compiles and can be executed. The last source of errors is bugs (the previous errors are not bugs; do not call every failure a bug). A bug is when the execution of your program does something wrong: it can crash, compute wrong values, loop forever, keep on asking for memory until the system is saturated…
These are the main issues in computer science and programming. Having strong typing (as opposed to python…), using nice object designs, testing every piece of code with a specific test, etc… are ways to reduce the risk of such bugs. Some code proving environments also exist, but that is quite a specific context.
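As an illustration of “a specific test”, here is a minimal sketch. The namespace sketch, the Point struct, the midpoint helper and the test function are all hypothetical names introduced for the example; they are not part of the archive.

```cpp
#include <cassert>

namespace sketch {

  struct Point {
    double x;
    double y;
  };

  // Hypothetical helper: the middle of the segment [a, b].
  Point midpoint(const Point& a, const Point& b) {
    return {(a.x + b.x) / 2, (a.y + b.y) / 2};
  }

  // A dedicated test: it compiles and links like any other code, and
  // aborts at run time if the computed values are wrong (unless the
  // NDEBUG macro is defined, which disables assert).
  void test_midpoint() {
    Point m = midpoint({0.0, 0.0}, {4.0, 2.0});
    assert(m.x == 2.0);
    assert(m.y == 1.0);
  }

}
```

A wrong formula in midpoint would still pass the compiling and linking stages without any message; only calling test_midpoint from some main function would reveal it.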
Do not explore that part if you are doing preliminary self-studies for the C++ lectures.
For large C++ projects, compiling everything manually, as we did here, is not realistic. Moreover, when a file is modified you may not need to recompile everything. Usually, one or two recompilings and a few linkings are enough.
There exist tools to handle the compiling of large pieces of code. Professional IDEs (Integrated Development Environments) do such jobs behind the scenes, but in case of failure, understanding the details of the process, as we did here, is crucial.
We recommend cmake, for which a tutorial is available here. You can enter C++ without knowing that, but being able to compile big projects will become required very quickly.