vq3
|
The vq3 library is a result of the GRONE project, supported by the Interreg "Grande RĂ©gion" program of the European Union's European Regional Development Fund.
The vq3 library is a massive rewrinting of a former vq2 library for vector quantization. It complies with C++11 and later requirements, and integrates parallélism (multi-thread).
Code examples are not in included this package, but rather in the vq3demo package. Indeed, the vq3 features are illustrated in that vq3demo package with 2D points that can be displayed thanks to openCV.
The following describes the overall 'philosophy' underlying the design of vq3, for the sake of preparing the reading of vq3demo examples.
The VQ algorithms handle a graph of prototypes. In vq3, you may start with defining your graph type as follows:
If no edge type is needed, you can write something like:
Adding a node in the graph uses the += operator. It returns a reference (it is a shared pointer).
Edges are handled in a very similar manner, using shared pointers as well.
Last, during execution, one can ask for the removing of an edge or a vertex. This is done as follows:
The library vq3 offers "foreach" functions in order to iterate on the nodes of a graph, on the edges of a graph, but also on the edges of a node. The iteration consists in calling a function for each element (node or edge), taking dynamical removals into account. This is why a direct acces to the collection of edges or nodes is not provided by vq3.
Many operation on graphs done by vq3 can be parallelized by using multi-threading (thanks to std::async mainly). Nevertheless, the graph is designed to be highly dynamic, so having several threads modifying the topology is to be avoided.
The rule of thumb is to keep the graph "read only" during multi-thread parallel sections.
While inspecting the graph thanks to foreach functions, one can call the kill() method on vertices or edges. This stands for a kill request, not an immediate kill. Once the request is made, the graph behaves as if the killed edge or vertex do not exist, but the memory is not necessarily freed. This can happen even when reading the graph, for example in foreach_edge loops when bad vertices are detected for some edge.
By default, the actual memory cleaning is done progressively, as the foreach calls inspect the graph (even when the graph is only read). This memory cleaning (i.e. garbaging) can be explicitly made by calling
As the garbaging runs implicitly during graph exploration, a multi-thread computation, even if it only reads the graph, may alter the inner graph structure for elements which are waiting to be actually killed. To prevent thread memory issue, one should suspend the garbaging process during parallel computation.
Last, let us stress here that vq3 processes which allow for multi-threading (they have a nb_threads parameter) may lock and unlock garbaging if they actually involve more than one thread.
We call the "topology" the knowledge of the graph structure, i.e the vertices and for each one its neighborhood. Algorithm may need to retrieve vertices from an index value. The topology table object provides this.
Topology tables also provides the computation of neighborhoods. The neighborhood of a vertex v is a list of (value, index) pairs. Each (value, index) is the index of one neighbour of v, value is the distance related coefficient associated to it. To compute this value, we apply a function h(e) where e is the number of edges from v to vertex #index. If e>Emax or if h(e)<Hmin, vertices are not considered as neighbours. This can be computed as follows using the topology table.
When some algorithm requires the knowledge of all neigborhoods, they can be computed at once.
There are several utilities associated with the graph class, see the vq3::utils namespace.
Vertex and edge values, i.e. the type arguments provided to the vq3::graph template, can be decorated in order to enable the use of some specific algorithms. The decoration consists of adding extra attributes to some existing values... typically, adding a boolean tag for algorithms that needs to mark some nodes or edges as visited is a decoration. These addings are performed by inheritance, but inheritance is kept simple thanks to pre-defined decorators.
In vq3, the spec of epoch is highlighted, even if many VQ algorithms are usually online. Indeed, for the sake of multi-thread computing, it is more efficient to consider a bunch of samples whose computation can be split to feed several threads. A computation pass on this bunch is what is called an 'epoch' in vq3.
When computing an epoch, i.e. presenting a bunch of samples to a graph, the trick is to bind temporarily each node of the graph and a dedicated data, called an 'epoch data'. This temporary binding is done internally, within each thread. At the end of the parallel computation, and for each vertex, the epoch data values (one for each thread) correponding to that vertex are merged.
Epoch data types are user defined, they have to fit the vq3::spec::EpochData. Nevertheless, the basic use of vq3 for defining epochs data consists of using pre-defined building blocks that are stacked in order to form the epoch data the user needs (this is similar to the decoration of vertex and edge values).
The meaning of epoch data lifecycle is summarized in this piece of pseudo-code.
The pseudo-code mentionned previously is performed in parallel, and vq3 triggers its execution by using processors. So the customization of what happens during the loop summarized in the psuedo-code is done by defining an epoch data class. Epoch data classes are designed by inheritance, which is hidden by the following stack approach.
Let us consider the following example:
The purpose of the type stack is to customize the type epoch_data used by the processor. Each stack element (epoch_data_0, epoch_data_1, ...) provide the functions so that the final epoch_data fits the vq3::spec::EpochData spec. For example, the required method notify_wta_update can be defines several times, by several stack elements. All the definitions of notify_wta_update will be called successively when overall notify_wta_update is called.
Here, epoch_data_0 is the ground of the stack, where the type sample is notified. Then, epoch_data_1 adds an accumulator of numbers (attribute vq3_bmu_accum) that accumulates the distances between the sample and the vertex prototype, when the vertex associated to the data is the actual best matching unit.
See the vq3::epoch::data namespace.
As previously said, processors enable a parallel computing of epochs, thanks to the epoch data associated to each node.
The Competitive Hebbian Learning (CHL) processor applies the Martinez and Schulten algorithm to a graph, thus updating its edges.
This processor applies a k-means update, i.e. each prototype is set to the centroid of its Voronoi cell.
This processor applies a SOM update, i.e. each prototype is set the weighted sum of some samples, the weight depending on the topological proximity of the vertex with the best matching unit.
The algorithms provided by vq3 are based on the "processor" paradigm. They are illustrated in the documentation of the vq3demo package. Read the examples provided by the vq3demo package in the order suggested by the numbers to gat a practical view of the vq3 library.