gaml
The gaml library has been supported by the Methodeo project. It is a C++ library, based on generic programming techniques, that offers tools for machine learning: real risk estimation, data manipulation, variable selection, etc. The library itself does not provide regression or classification algorithms; rather, it lets users wrap general-purpose machine learning features around their favorite algorithms. Nevertheless, the well-known libsvm package by Chih-Chung Chang and Chih-Jen Lin has already been included in gaml thanks to the gaml-libsvm extension.
Last, let us insist on one major feature of the gaml library. It relies on C++ generic programming, which is strongly typed. The design of the library fits the mathematics of machine learning concepts, and the strong typing thus forces the user to comply with those concepts. This is deliberate. The clear drawback is that fixing compilation errors can be hard work, since a small typing mistake can generate a large number of error messages. The benefit is that all the programming effort is concentrated at that point: once syntactically correct, the code leads to safe and efficient execution, and very little time is then spent debugging run-time memory errors.
For those who are not familiar with generic programming, the use of concepts may be confusing, since classical object-oriented programming relies on inheritance mechanisms instead. A concept is a syntactical requirement. In the gaml library, such requirements are documented through fake classes in the gaml::concepts namespace. Let us take the example of the gaml::concepts::Predictor concept.
The gaml::concepts::Predictor concept says that a predictor must define two types, named input_type and output_type, and that it must provide default and copy constructors, as well as an operator() method. Let us propose a (dummy) predictor.
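A minimal sketch of such a predictor could look as follows. The choice of std::string inputs, double outputs and the body of operator() are assumptions made for illustration; the exact signatures expected by gaml::concepts::Predictor should be checked in the library documentation.

```cpp
#include <string>

// A dummy predictor: it maps a string to its length, returned as a double.
// It meets the requirements listed above without inheriting from anything.
class Funny {
public:
  typedef std::string input_type;   // type of the values fed to the predictor
  typedef double      output_type;  // type of the predicted values

  Funny()             = default;    // default constructor
  Funny(const Funny&) = default;    // copy constructor

  // Computes the prediction associated with an input.
  output_type operator()(const input_type& x) const {
    return static_cast<output_type>(x.size());
  }
};
```

Nothing in this class mentions gaml: fitting a concept only means providing the expected types and methods.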
This Funny class fits the gaml::concepts::Predictor concept although no inheritance is involved. When an algorithm requires an argument whose type fits the gaml::concepts::Predictor concept, this is specified in the documentation. For example, let us suppose that the function foo is dedicated to the manipulation of some predictor. Its declaration in the gaml library would be
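As a hedged illustration, such a template function might be written as below. The sketch namespace is a stand-in for gaml, and both the second argument and the body of foo are hypothetical; they are only there so that the function does something observable.

```cpp
namespace sketch {  // stand-in for the gaml namespace, not the real API

  // PREDICTOR is expected to fit the Predictor concept: it must define
  // input_type and output_type, be copyable, and provide operator().
  template<typename PREDICTOR>
  typename PREDICTOR::output_type
  foo(const PREDICTOR& predictor,
      const typename PREDICTOR::input_type& x) {
    PREDICTOR copy(predictor);  // relies on the copy constructor
    return copy(x);             // relies on operator()
  }
}
```

If the type given for PREDICTOR lacks one of these elements, the compiler rejects the call, which is how the concept is enforced.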
The use of the function in some code where Funny is available would be
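The following self-contained sketch shows a call with an explicit template parameter. The Funny predictor and the foo function defined here are illustrative stand-ins (foo lives in a hypothetical sketch namespace, and its second argument is an assumption), not the actual gaml declarations.

```cpp
#include <string>

// Dummy predictor: maps a string to its length.
struct Funny {
  typedef std::string input_type;
  typedef double      output_type;
  output_type operator()(const input_type& x) const {
    return static_cast<output_type>(x.size());
  }
};

namespace sketch {  // stand-in for the gaml namespace
  // Applies a predictor to one input (hypothetical signature).
  template<typename PREDICTOR>
  typename PREDICTOR::output_type
  foo(const PREDICTOR& predictor,
      const typename PREDICTOR::input_type& x) {
    return predictor(x);
  }
}

double explicit_call() {
  Funny f;
  // The template parameter is spelled out between angle brackets.
  return sketch::foo<Funny>(f, "gaml");
}
```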
This will compile fine as long as the Funny class fits the gaml::concepts::Predictor concept. Moreover, when the compiler can deduce the template parameter types from the function call, the template parameters can be omitted. This leads to the following code, which gives the flavor of gaml function calls.
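A sketch of the same kind of call with a deduced template parameter is given below. As before, Funny and sketch::foo are illustrative stand-ins with assumed signatures, not the gaml API.

```cpp
#include <string>

// Dummy predictor: maps a string to its length.
struct Funny {
  typedef std::string input_type;
  typedef double      output_type;
  output_type operator()(const input_type& x) const {
    return static_cast<output_type>(x.size());
  }
};

namespace sketch {  // stand-in for the gaml namespace
  // Applies a predictor to one input (hypothetical signature).
  template<typename PREDICTOR>
  typename PREDICTOR::output_type
  foo(const PREDICTOR& predictor,
      const typename PREDICTOR::input_type& x) {
    return predictor(x);
  }
}

double deduced_call() {
  Funny f;
  // No angle brackets: PREDICTOR is deduced as Funny from the first argument.
  return sketch::foo(f, "libsvm");
}
```

The second parameter plays no part in the deduction, since its type depends on PREDICTOR; the string literal is simply converted to Funny::input_type.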
The idea of the library is that data belong to collections that can be accessed through iterators. Most algorithms provided in the gaml library take iterators as arguments when they have to consider a collection of data. This is compliant with the STL programming style. Users are thus responsible for the way they store the data. Consequently, they have to provide functions that retrieve the relevant elements from each datum in the data set. Typically, data sets contain input/output pairs. The gaml algorithm is given iterators on the data set and accesses successive elements. From each element, the algorithm has to extract the input and the output contained in the pair. In order not to impose a particular encoding of those pairs on the user, gaml algorithms must be given two additional extraction functions. Let us write some typical gaml code accordingly.
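The pattern can be sketched as follows, under stated assumptions: samples are stored as std::pair values, and sketch::some_algo is a hypothetical stand-in that fits the slope of y = a.x by least squares. It is not an actual gaml algorithm; it only illustrates the calling convention (two iterators plus two extraction functions).

```cpp
#include <utility>
#include <vector>

// The user decides how samples are stored; here, as (input, output) pairs.
typedef std::pair<double, double> Sample;
typedef std::vector<Sample>       Samples;

// Extraction functions: they tell the algorithm how to read a sample.
double input_of (const Sample& s) { return s.first;  }
double output_of(const Sample& s) { return s.second; }

namespace sketch {  // stand-in for gaml; some_algo is hypothetical
  // Least-squares slope of y = a.x over the samples in [begin, end).
  template<typename ITER, typename INPUT_OF, typename OUTPUT_OF>
  double some_algo(ITER begin, ITER end, INPUT_OF in, OUTPUT_OF out) {
    double sxy = 0, sxx = 0;
    for (ITER it = begin; it != end; ++it) {
      double x = in(*it);   // the algorithm never touches Sample directly
      double y = out(*it);
      sxy += x * y;
      sxx += x * x;
    }
    return sxx != 0 ? sxy / sxx : 0.0;
  }
}

double fit_slope() {
  Samples samples;
  for (int i = 1; i <= 3; ++i)
    samples.push_back(Sample(i, 2.0 * i));  // outputs follow y = 2x

  // Template parameters are deduced from the arguments.
  return sketch::some_algo(samples.begin(), samples.end(),
                           input_of, output_of);
}
```

Because the algorithm only sees iterators and the two extraction functions, the same call would work with any other container or sample encoding.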
The previous code benefits from implicit template parameter resolution, since gaml::some_algo is a template function whose type parameters can be omitted, as mentioned for gaml::foo previously. It can be simplified further. First, C++11 provides a concise notation for iterating over collections (the range-based for loop). Second, the auto keyword can be used where a type name is required, when the type can be deduced by the compiler. This is the case for the obscure gaml::Shuffle<Samples::iterator,nasty-functional-types> type provided by gaml. The code can thus be rewritten as follows.
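Both features can be illustrated with the sketch below. Since the exact interface of gaml::Shuffle is not described here, a hypothetical sketch::Range view plays the role of the "obscure type": its name is tedious to spell out, and auto avoids writing it.

```cpp
#include <utility>
#include <vector>

typedef std::pair<double, double> Sample;
typedef std::vector<Sample>       Samples;

namespace sketch {  // stand-in namespace, not the gaml API
  // A view over [first, last), usable in a range-based for loop.
  template<typename ITER>
  struct Range {
    ITER first, last;
    ITER begin() const { return first; }
    ITER end()   const { return last;  }
  };

  // Factory function: the compiler deduces ITER from the arguments.
  template<typename ITER>
  Range<ITER> range(ITER first, ITER last) {
    return Range<ITER>{first, last};
  }
}

double sum_of_outputs(const Samples& samples) {
  // auto spares us writing sketch::Range<Samples::const_iterator>.
  auto view = sketch::range(samples.begin(), samples.end());

  double sum = 0;
  // Range-based for loop: no explicit iterator handling.
  for (const auto& sample : view)
    sum += sample.second;
  return sum;
}
```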
Moreover, C++11 provides a syntax for defining functions on the fly in the code (lambda functions). This can be done for input_of and output_of. The code can then be rewritten as follows.
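A sketch of this last rewriting is given below, assuming samples stored as std::pair values and a hypothetical sketch::some_algo (a least-squares slope fit standing in for a gaml algorithm). The two extraction functions are written as lambdas directly in the call.

```cpp
#include <utility>
#include <vector>

typedef std::pair<double, double> Sample;
typedef std::vector<Sample>       Samples;

namespace sketch {  // stand-in for gaml; some_algo is hypothetical
  // Least-squares slope of y = a.x over the samples in [begin, end).
  template<typename ITER, typename INPUT_OF, typename OUTPUT_OF>
  double some_algo(ITER begin, ITER end, INPUT_OF in, OUTPUT_OF out) {
    double sxy = 0, sxx = 0;
    for (ITER it = begin; it != end; ++it) {
      sxy += in(*it) * out(*it);
      sxx += in(*it) * in(*it);
    }
    return sxx != 0 ? sxy / sxx : 0.0;
  }
}

double fit_slope(const Samples& samples) {
  return sketch::some_algo(
      samples.begin(), samples.end(),
      [](const Sample& s) { return s.first;  },   // plays the role of input_of
      [](const Sample& s) { return s.second; });  // plays the role of output_of
}
```

Each lambda has its own unnamed type, which the compiler deduces for INPUT_OF and OUTPUT_OF; this is another case where omitting template parameters keeps the call readable.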
The user manual of the gaml library consists of a set of examples, available in this documentation. They are ordered, and they should be read in that order to get a comprehensive overview of the gaml features.
Using gaml implies invoking templates that can be intricate. Thanks to C++11 syntactical elements (such as auto), this intricacy can be hidden from the user so that the code remains readable. The code naturally expresses machine learning methodological concepts, and type checking ensures that they are not misused. Since all the type checking is done at compile time, the resulting executable is safe and efficient.