Table of contents

Notion of metaprogramming

Metaprogramming is the ability of a programming language to write pieces of source code that generate and/or manipulate other pieces of code.
In case of C++, metaprogramming is provided at compilation time, thanks mostly to templates.
When designing a software library based on templates (like the standard library), C++ metaprogramming is mainly useful to address two goals:
- Genericity: by conditionning the generated source code on the context of its usage (i.e the arguments passed when instantiating a template), we can cover more cases.
- Performance: similarly, one can produce optimized source code specifically for the usage context, as shown in multiple examples below.
In this section we consider mechanisms of the language
- that can adapt the generated source code to the context: template specialization, conditional branching, type traits, SFINAE*
- or that precompute partial results at compilation time thanks to constexpr functions. This latter functionnality is not a kind of metaprogramming, strictly speaking

Template Specialization

The first (and oldest) mechanism of metaprogramming is template specialization: the code produced when instantiating a template can depend on the values passed to template arguments.

Full or total template specialization

Let consider the optimized code to compute the power \(b^p\) of a number \(p\) when base \(b\) is known at compilation time.

The trick is to decompose the power in base 2: for instance if \(p = 22\), we can write:

\[ b^{22} = b^{10110_2} = b^{16 + 4 + 2} = b^{16} \times b^4 \times b^2 = \left(\left(\left(b^2\right)^2\right)^2\right)^2 \times \left(b^2\right)^2 \times b^{2}\]

template<int base>
  unsigned int power(unsigned int p) {
    unsigned int res = 1, factor = base;
    while(p > 0) {
      if(p % 2 == 1) res *= factor;
      p = p >> 1;       // Right shift of 1 bit <=> division by 2
      factor *= factor; // Square of factor
    }
    return res;
  }

In the particular case of a binary base \(b = 2\), one can even do faster:

  int power_binary_base(unsigned int p) {
    return 1 << p; // Shift of p bits to the left
  }

How could we get one function that automatically decides which version to call, depending on the value of \(b\). This can be donne by specializing templates (either function, method or class templates):

template<int base>
  unsigned int power(unsigned int p) {
    ... // General case
    return res;
  }
  
// Specialization if base equal to 2
template<>  // See the special syntax here
  unsigned int power<2>(unsigned int p) { // See the value 2 that conditions the specilization
    return 1 << p;
  }
  
int main() {
  std::cout << "10^6 = " << power<10>(6) << std::endl; // General template is instantiated here
  std::cout << "2^6 = " << power<2>(6) << std::endl; // Specialized template is instantiated here
}

Note

The template syntax <> tells this is a full or total specialization, where all template arguments have been set (none remains unknown). This is opposed to the notion of partial specialization presented in the next section.

Partial specialization of class templates

Suppose that the function template power is parameterized nont only by the base value but also with the type of scalar Scalar that is used to compute the power. Then the specialization of the function template when base is equal to 2, becomes a partial specialization becuase Scalar is still unspecified in the specialization.:

template<typename Scalar, int base> 
Scalar power(unsigned int p) {
  ...
}

// Partial specialization: Scalar argument is still unspecified
template<typename Scalar>
Scalar power<Scalar,2>(unsigned int p) { 
  return static_cast<Scalar>(1 << p);
}

Unfortunately, partial specialization is only authorized for class templates, not for function, method and operator templates. The reason is that it introduces too much ambiguity and conflict with the other mechanism of function overloading.

However, to solve the previous problem, the function template power can be rewritten as a class Power in order to have access to partial specialization:

// Class template Power implements the general case
template<typename Scalar, int base> struct Power {
  Scalar operator()(unsigned int p) { // class Power is here a functor: by overloading operator(), it behaves like a function
    ...
  }
};

// Partial specialization of class template Power implements the specific case when base = 2
template<typename Scalar> struct Power<Scalar,2> {
  Scalar operator()(unsigned int p) {
    return static_cast<Scalar>(1 << p);
  }
};

// For cosmetic purposes, we introduce a wrapper function that hides the subtleties of class template Power
template<typename Scalar, int base>
Scalar power(unsigned int p) {
  return Power<Scalar, base>{}(p); // Instantiates an object of type Power (no attributes => no cost) and call it.
}
std::cout << "10^6 = " << power<double,10>(6); //  General template is instantiated here
std::cout << "2^6 = " << power<double,2>(6); // Specialized template is instantiated here

Metaprogramming tests: Conditional branching using “if constexpr”

C++17 has introduced condtional branching based on constexpr expressions, i.e. expressions that can be evaluated at compiltation time. It advantageously replaces the preprocessing directives #ifdef as it benefits from all the C++ features (typing, etc). This feature is very practical and easy to write compared to template specializations, while not as powerful (for instance attributes of class can not be conditionnally defined using conditional branching. This is the case for our example:

template<int base>
  unsigned int power(unsigned int p) {
    if constexpr (base == 2)
      return 1 << p;                       // This scope is compiled only if base = 2
    else {
      unsigned int res = 1, factor = base; // This scope is compiled only if base <> 2
      ...
      return res;
    }
  }

Notes

The condition expression of a if constexpr is evaluated at compile time, and therefore must be constexpr.
In function templates, the dead branch of if constexpr is never compiled and can contain inconsistent code (only the syntax level must be correct).

Metaprogramming loops

So far metaprogramming can only perform conditional tests. What about loops? Loops can be addressed using recursion.

The example of loop unrolling

One classical example is the technique of loop unrolling. To understand the problem this technique addresses, suppose we want to add to vectors. This can be done using an optimized C style loop as shown below:

template<typename Scalar, int dims>
class Vector {
    ...
   // A naive addition of two vectors
   Vector<Scalar, dims>& operator+=(const Vector<Scalar, dims>& other);
};

template <typename Scalar, int dims>
Vector<Scalar, dims>& Vector<Scalar, dims>::operator+=(const Vector<Scalar, dims>& v) {
   const Scalar* src = other.coefs_;
   for(Scalar* dst = coefs_, end = coefs_ + dims; dst != end;) *dst++ += *src++;
   return *this;
}

int main() {
   Vector<double,5> v1, v2;
   v1 += v2;
}

If we consider the example provided in function main() with vectors of size 5, the sequence of run instructions when calling operator+= is

1)  src = other.coefs_;
2)  dst = coefs_;
3)  end = coefs_ + dims;
4)  test dst != end;
5)  *dst++ += *src++;
6)  test dst != end;
7)  *dst++ += *src++;
8)  test dst != end;
9)  *dst++ += *src++;
10) test dst != end;
11) *dst++ += *src++;
12) test dst != end;
14) *dst++ += *src++;
15) test dst != end;
16) return *this

Sixteen instructions (where every instruction is more or less translated in one processor instruction) are required. Suppose our application makes an intensive use of vector addition so that we seek a highly optimized implementation of operator+= Can we improve it? In theory yes, because the vector dimension of five is known at compilation time. We could therefore write a partial template specialization for our specific dimension of 5 using loop unrolling:

// Overloads the operator+= for partial specialization of class Vector when dims=5

template <typename Scalar>
Vector<Scalar, 5>& Vector<Scalar, 5>::operator+=(const Vector<Scalar, 5>& v) {
   Scalar* dst = coefs_;
   const Scalar* src = other.coefs_;
   *dst++ += *src++;
   *dst++ += *src++;
   *dst++ += *src++;
   *dst++ += *src++;
   *dst++ += *src++;
   return *this;
};

This will result in a sequence of 8 instructions, with a substantial speed-up (even if the speedup factor is likely to be less than \(16/8 = 2\) as the save comparisons between registers are run faster than instructions *dst++ += *src++ that transfer bytes to/from memory).

Generic loop unrolling based on template recursion

However the previous code is highly specific as it works only for vectors whose dimension is exactly 5. How could we provide a generic solution? More precisely how could we write a code run at compilation time that generates the source code of operator+= by duplicating dims times instruction *dst++ += *src++? This can be done using specialization:

template<typename Scalar, int dims>
class Vector {
   ...
   Vector<Scalar, dims>& operator+=(const Vector<Scalar, dims>& v);
};

// Definition of a class to unroll loops
template<typename Scalar, int count>
struct Unroll {
  static void add(const Scalar* src, Scalar* dst) {
    *dst++ += *src++;                       // Duplicate loop body
    Unroll<Scalar, count-1>::add(src, dst); // Recursive call
  }
};

// Partial specialization when count is 0
template<typename Scalar>
struct Unroll<Scalar, 0> {
  static void add(const Scalar* src, Scalar* dst) {} // End recursion here
};

// Internal operator += initializes recursion
template <typename Scalar, int dims>
Vector<Scalar, dims>& Vector<Scalar, dims>::operator+=(const Vector<Scalar, dims>& v) {
   Unroll<Scalar, dims>::add(v.coefs_, coefs_); // Start unrolling recursion
   return *this;
}

Since C++20, this can be written more concisely using a if constexpr expression:

// Definition of class to unroll loops
template<typename Scalar, int count>
struct Unroll {
  static void add(const Scalar* src, Scalar* dst) {
    if constexpr (count > 0) {
      *dst++ += *src++;                       // Duplicate loop body
      Unroll<Scalar, count-1>::add(src, dst); // Recursive call
    }
  }
};

Notes

A total specialization of a method template in a class template is not possible. The add method cannot be declared as a method template template<int count> void add(...) in class template Vector. Solution consists in using partial specialization of a class template Unroll specially dedicated to metaprogramming.

Computation at compilation time and constexpr functions

Sometimes computation and checking can be partially or totally preprocessed during compilation.

constexpr functions have been introduced in C++11 and improved since.

#include <iostream>
#include <array>

// The three next functions are constexpr, i.e. callable at compilation time
// Note limited set of available instructions for cmp_strings

constexpr bool cmp_strings(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || cmp_strings(a + 1, b + 1));
}

constexpr bool check_encoding(const char* encoding) {
  if(cmp_strings(encoding, "RGB"))
    return true;
  else
    return false;
}

constexpr int n_channels(const char* encoding) {
  if(cmp_strings(encoding, "RGB"))
    return 3;
  else
    return 0;
}

template<int channels> class Color : public std::array<double, channels> {};



// constexpr is important here.
constexpr const char* encoding = "RGB";

// Check encoding is available at compilation time
static_assert(check_encoding(encoding), "Unknown encoding");

int main() {

  // Create a color with the right number of channels
  Color<n_channels(encoding)> color;
  std::cout << color.size() << std::endl;
}

Frédéric Pennerath,