Multi-core programming based on futures

Installation

Download futures.zip and generate a suitable build environment using CMake:

unzip futures.zip ; cd futures
mkdir build ; cd build
cmake ../src
make
./futures 0  # Launch the compiled binary passing as argument the test number 0 to run the demo code

The code to edit is in subdirectory src. The subdirectory solution contains a possible solution.

Introduction

This labwork is composed of a series of exercises whose goal is to practice multi-core programming using threads and futures. Each exercise is designed to introduce new classes or functions (first std::future and std::async, then std::shared_future, std::thread, std::promise, std::jthread, etc).

Every exercise consists in computing as quickly as possible a task formalized by a computational graph, as the one given below:

Symbol table

Nodes of the graph represent so-called atomic tasks that cannot be furthed divided. Every node shows the expected time in milliseconds that it takes for the corresponding task to finish. Arcs of the graph indicate the inputs and the output of every atomic task. Depending on the graph structure, some of these atomic tasks can be run in parallel on multiple cores (e.g tasks A and B) whereas others are dependent, and must be run sequentially (e.g task C requires as inputs the results produced by task A and B). The objective is to design a multithreaded implementation of each computational graph that produces the expected result in a minimal time, whatever the processing times of atomic tasks.

For simplification purposes, the atomic computations will be simulated: they will consist in producing a std::string containing the name of the computation and the values of the received arguments, themselves supposed to be const std::string &. The duration of the computation will be simulated by introducing a timeout. For instance computation C of the previous graph can be implemented as:

// a and b are supposed to contain return values of computation A and B
std::string C(const std::string& a, const std::string& b) {
  std::this_thread::sleep_for(100ms);
  return "C("s + a + "," + b + ")"; 
}

The expected result for the previous graph and for the input "X" will thus be "C(A(X),B(X))".

For every exercise, you should only modify in file futures.cpp the corresponding computationX function (X being the number 1, 2, etc of the exercise), whose goal is to implement the computational graph of exercise number X. The C++ functions of the various atomic tasks (like the previous C function) are already implemented (using file test_utils.hpp that you should not modify). These functions are accessible as attributes of the graph object passed as argument to the computationX() functions. For example, function demo() in file futures.cpp computes sequentially, without any parallelization, the previous computation graph:

std::string demo(std::string input, const Graph1& graph) {
  std::string a = graph.A(input); // Functions A, B and C are members of the graph argument
  std::string b = graph.B(input);
  return graph.C(a, b);
}

When running from a terminal the corresponding test (0 for the demo, 1 for exercise 1, etc), the test framework will check:

the result you produce is the same as the expected one. If not it will trigger an exception.
the processing time is roughly the same as the optimal processing time. For instance, testing the demo code gives

> ./futures 0  # 0 for running the demo

|--------|--------|--------|--------|--------|------|
|        |        |        |EXPECTED|MEASURED|      |
|       A|       B|       C|  TIME  |  TIME  |STATUS|
|--------|--------|--------|--------|--------|------|
|  100 ms|  200 ms|  100 ms|  400 ms|  401 ms|    OK|

Last few remarks to take into account in your implementation:

Your solution must provide the optimal execution regardless of the execution times taken by the atomic tasks. The values of the execution times shown on the dependency graphs are given as examples and only to work on a common basis. Some exercises will test different combination of processing times for the atomic taks to ensure your code provides the optimal solution.
Care should be taken to always use the minimum number of threads required to achieve the minimum computation time.
In particular, care should be taken to ensure that all threads are in charge of a computation (i.e. no supervisor thread, waiting on the computation threads).
Taking into account the previous remarks, we can assume that the main thread will always take charge of the leftmost branch of the computational graph.

“Simple” parallel computing

Implement the computation expressed by the previous dependency graph by completing function `computation1` in file `futures.cpp`.

“Interleaved” parallel computations

Consider the following computational graph, in which atomic tasks C and D both need the results of tasks A and B. However, tasks C and D can start working on one of the results without waiting for the other (e.g. C can work on the result of A without immediately needing the result of B). This is reflected in the dependency graph by splitting tasks C and D into two sub-taskss (C into C1 and C2, D into D1 and D2).

Symbol table

Code your solution, this time by completing function `computation2`.

“Concurrent” parallel computations

Consider the following dependency graph, in which task C can be executed as soon as one of the two computations A OR B produces a result, without having to wait for the result of the other task. Such a case occurs, for example, in parallelizable search problems (such as searching for a given element in two arrays) where the result is determined as soon as the searched element is found by one of the threads.

Symbol table

Code your solution, this time by completing function `computation3`.

Improvement of the previous problem

Uncomment the two last lines of function test3 and runs the test again. Why does it fail?

To regain optimality, it is necessary to interrupt the threads still active as soon as a thread finds the result (for the durations given in the graph aboce, task B should be interrupted as soon as A terminates). To do this, we can

Either use an atomic boolean variable shared between threads which is set to true as soon as the result has been found so that threads can stop as soon as possible.
Use cancellable std::jthread introduced in C++20.

Whatever the solution, this can only work by checking periodically if the stopping flag (prior to C++20) or the internal state of std::jthread (from C++20) has been set. If so the thread should return, otherwise it should continue its job. To implement this polling, the functions implementing atomic tasks provided to function computation4 are slightly different as a single call to them only lasts 10ms. While the total time spent calling such a function is less than the one required to compute the results, the return value of this function is an empty string. Once the total time reaches the required time, the return values becomes equal to the expected output. The example of code given below illustrates how to compute the result of A with such function:

std::string res; // String to contain the result of A
while(res.empty()) { // While the result is not ready
  res = graph.A(input); // "Works" for 10ms
}

Code such a solution, this time by completing function `computation4`.

Frédéric Pennerath,

Multi-core programming based on futures

Installation

Introduction

Last few remarks to take into account in your implementation:

“Simple” parallel computing

Implement the computation expressed by the previous dependency graph by completing function computation1 in file futures.cpp.

“Interleaved” parallel computations

Code your solution, this time by completing function computation2.

“Concurrent” parallel computations

Code your solution, this time by completing function computation3.

Improvement of the previous problem

Code such a solution, this time by completing function computation4.

Implement the computation expressed by the previous dependency graph by completing function `computation1` in file `futures.cpp`.

Code your solution, this time by completing function `computation2`.

Code your solution, this time by completing function `computation3`.

Code such a solution, this time by completing function `computation4`.