License : Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Copyright : Hervé Frezza-Buet, CentraleSupelec
Last modified : February 15, 2024 11:13
Link to the source : index.md


Vector Quantization

Introduction

The goal of this lab work is to understand and play with vector quantization techniques, and more specifically with Kohonen Self-Organizing Maps. First, let us introduce vector quantization algorithms on 2D points, so that visualization is easy (with OpenCV). Then, an application to handwritten digit recognition is proposed.

The 2D case

2D drawings

Download the file draw_points.py, read it, and try the following code. It provides you with an example of how to write OpenCV-based scripts.

# coding: utf-8

import cv2
import numpy as np
import draw_points as dp

nb_samples = 0
width, height = 640, 480
margin = 50
delay = 100

# This is the callback of the trackbar.
def on_value(v) :
    global nb_samples
    nb_samples = v

print()
print()
print('Press ESC to quit')
print()
print()

cv2.namedWindow('display') # Creates a window
cv2.createTrackbar('nb_samples', 'display', 0, 5000, on_value) # Adds a trackbar.

display = np.zeros((height, width, 3), np.uint8)

grid_x = np.linspace(100, 400, 20, endpoint=True)
grid_y = np.linspace(100, 300, 10, endpoint=True)
grid = np.array([[(x,y) for x in grid_x] for y in grid_y])

keycode = 0
while keycode != 27: # 27 means ESC in ascii.
    display[...] = 255 # we clear with a white background
    X = np.random.randint(margin, width-margin, size=nb_samples)
    Y = np.random.randint(margin, height-margin, size=nb_samples)
    pts = np.column_stack((X, Y))
    dp.scatter(display, pts)
    dp.point_grid(display, grid, (0, 0, 200), 3)

    cv2.imshow('display', display)
    keycode = cv2.waitKey(delay) & 0xFF

cv2.destroyAllWindows()

K-means (Lloyd iterations)

Let us implement successive Lloyd iterations in order to compute k-means. Download some 2D distributions, i.e. the file distrib.py. Then download the file lloyd.py, run it and read it.
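
As a reminder, one Lloyd iteration assigns every sample to its closest prototype, and then moves each prototype to the mean of the samples assigned to it. Here is a minimal numpy sketch of such an iteration (the names samples and prototypes are illustrative, not necessarily the ones used in lloyd.py):

import numpy as np

def lloyd_iteration(samples, prototypes):
    # samples: (n, 2) array, prototypes: (k, 2) array.
    # Assignment step: index of the closest prototype for each sample.
    dists = np.linalg.norm(samples[:, None, :] - prototypes[None, :, :], axis=-1)
    closest = np.argmin(dists, axis=1)
    # Update step: each prototype becomes the mean of its assigned samples
    # (a prototype with no assigned sample is left unchanged).
    return np.array([samples[closest == k].mean(axis=0) if np.any(closest == k)
                     else prototypes[k]
                     for k in range(len(prototypes))])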

K-means (online)

The online k-means consists of sampling the input points one by one, instead of building up a dataset beforehand as previously. The algorithm consists of repeating the following iteration forever: draw a sample \(\xi\), find the prototype \(\omega^\star\) closest to \(\xi\), and move it toward the sample, i.e. \(\omega^\star \leftarrow \omega^\star + \alpha(\xi - \omega^\star)\), where \(\alpha \in [0, 1]\) is the learning rate.
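
As a minimal sketch, one such iteration could be written as follows (assuming prototypes is a (k, 2) float array and xi a freshly drawn 2D sample; these names are illustrative):

def online_kmeans_step(prototypes, xi, alpha):
    # Winner-take-all: only the prototype closest to the sample learns.
    winner = np.argmin(np.linalg.norm(prototypes - xi, axis=-1))
    prototypes[winner] += alpha * (xi - prototypes[winner])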

Implement a step-by-step visualization of this algorithm, using the “dumbbell” distribution. A trackbar will allow for choosing \(\alpha\) (the trackbar will range from 0 to 1000, corresponding to \(\alpha\) ranging from 0 to 1). While running, enable the user to press keys in order to reinitialize the prototypes. You should offer 3 kinds of reinitialization (triggered by three different key hits):

Observe the behavior of the algorithms.
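
For the \(\alpha\) trackbar, the integer position in [0, 1000] can simply be divided by 1000 in the callback, for instance (assuming a window named 'display' as in the first script; names are illustrative):

alpha = 0.05

def on_alpha(v): # trackbar callback, v ranges in [0, 1000]
    global alpha
    alpha = v / 1000.0 # alpha ranges in [0, 1]

cv2.createTrackbar('alpha', 'display', 50, 1000, on_alpha)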

Self-Organizing maps

The SOM algorithm is very similar to the online k-means tested previously, except that it implements a “winner-take-most” learning rule, rather than a “winner-take-all” one.
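
The idea, sketched below with illustrative names (this is not the code of som-points.py), is that the closest prototype and its neighbors in the grid all move toward the sample, with a weight that decreases with the distance measured in the grid:

import numpy as np

def som_step(prototypes, xi, alpha, radius):
    # prototypes: (h, w, 2) float array (the grid), xi: a 2D sample,
    # alpha: learning rate, radius: neighborhood radius in grid units.
    h, w, _ = prototypes.shape
    # Best matching unit: the prototype closest to the sample in the input space.
    dists = np.linalg.norm(prototypes - xi, axis=-1)
    wi, wj = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood, computed with distances in the grid (not in the input space).
    I, J = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    neighborhood = np.exp(-((I - wi)**2 + (J - wj)**2) / (2 * radius**2))
    # Winner-take-most: every prototype moves toward xi, weighted by the neighborhood.
    prototypes += alpha * neighborhood[..., None] * (xi - prototypes)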

Download the script som-points.py, run it and read it.

Everything is given in this section, so take time to read the code carefully and to understand clearly how it executes. Experiment with the effect of the sliders. You will be asked to develop a SOM for digit quantization in the next sections, using the same code structure.

Quantizing handwritten digits

In this section, the idea is to implement a self-organizing map for 28x28 digits. Digits are vectors in \([0..255]^{784}\). The evolution of the grid in the input space cannot be visualized anymore, since our OpenCV display is “limited” to 2D points.
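
For instance, a 28x28 grayscale image and its 784-dimensional vector view carry the same information:

import numpy as np

img = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8) # a fake digit image
x = img.reshape(784) # the same pixels, viewed as a vector of [0..255]^784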

Getting started with digits

Download the files digits.py and draw_digits.py. Read the file digits.py and run it. The main section of this file is a tutorial.

First data analysis

Let us compute a classifier as follows. Learning consists of computing, for each class, the average of the inputs of that class in the dataset. This leads to one average per digit class. Then, prediction consists of comparing the input to the 10 averages, finding the closest one, and returning the corresponding class.

Thanks to the function digits.closest_in_grid, set up such a classifier class that fits sklearn classifier requirements (see the hint below). Consider an attribute \(\sigma\) as the parameter of the classifier. If \(\sigma \neq 0\), the prediction is computed from a blurred version of the input, \(\sigma\) being the standard deviation of the Gaussian kernel used in the blur (use the digits.blur function).

Here is a hint for the classifier class

class AvgDigitsClassifier:
    def __init__(self, sigma=0) :
        self.avg = np.zeros((1, 10, 28, 28), float)
        self.sigma = sigma

    def fit(self, X, y) :
        # <your code here>
        return self # mandatory

    def predict(self, X) :
        # <your code here>
        return 0 # Change this

    def score(self, X, y):
        return np.sum(self.predict(X) == y)/X.shape[0]

    def get_params(self, deep=True): # required by sklearn for cloning (e.g. during cross validation)
        return {'sigma': self.sigma}

Use cross-validation to estimate the best value \(\sigma^\star\) for \(\sigma\), and plot the estimated real risk against the value of \(\sigma\). Cross-validation is used as follows in scikit-learn.

import sklearn.model_selection

scores = sklearn.model_selection.cross_val_score(classifier, X, y, cv=5)
print('scores = ')
for s in scores:
    print('         {:.2%}'.format(1-s)) # estimated risk (1 - accuracy) for this fold
print('        ------')
print('  risk = {:.2%}'.format(1-np.average(scores)))
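
To estimate \(\sigma^\star\) and produce the plot, one possible sketch is the following (the range of \(\sigma\) values is arbitrary; X and y denote the digits dataset, as before):

import numpy as np
import matplotlib.pyplot as plt
import sklearn.model_selection

sigmas = np.linspace(0, 5, 11) # arbitrary grid of candidate sigma values
risks = []
for sigma in sigmas:
    scores = sklearn.model_selection.cross_val_score(AvgDigitsClassifier(sigma), X, y, cv=5)
    risks.append(1 - np.average(scores)) # estimated real risk for this sigma

plt.plot(sigmas, risks)
plt.xlabel('sigma')
plt.ylabel('estimated real risk')
plt.show()

best_sigma = sigmas[np.argmin(risks)]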

Print the confusion matrix of the best classifier.
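
For instance, with sklearn.metrics and a simple train/test split (names are illustrative):

import sklearn.metrics
import sklearn.model_selection

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.2)
best = AvgDigitsClassifier(best_sigma).fit(X_train, y_train)
print(sklearn.metrics.confusion_matrix(y_test, best.predict(X_test)))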

A self-organizing map for digits

Implement a self-organizing map for quantizing digits. The interface should look like this :

Provide key bindings such as :

Press ESC to quit
Press x to reinitialize the prototypes
Press a to sample all digits
Press o to sample odd digits
Press e to sample even digits
Press 0,1,2,..,9 to sample 50% of the selected digits
Press s to save

Saving consists of saving the prototypes (numpy allows for saving arrays) as well as an image of the grid.
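
For instance, assuming prototypes is the numpy array of prototypes and grid_image the image of the grid currently displayed (both names are illustrative):

np.save('prototypes.npy', prototypes) # numpy dump of the prototypes
cv2.imwrite('prototypes.png', grid_image) # image of the grid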

Play with the parameters and the different kinds of sampling. In ‘a’ mode, when a “beautiful” map is displayed, save it. You then have 2 files: an image of the prototypes and the numpy dump of them. These are used next.

Digit recognition expert… is you !

In the previous section, we have quantized all the digits. As a human expert in digit recognition, you can label all the prototypes. From the previously saved image, create a .lbl file like this (the next example shows a 15x10 map):

3 3 3 3 2 2 2 2 2 2 0 0 0 0 0
3 3 3 3 0 2 2 2 2 * 6 0 0 0 0
3 * 5 * 0 0 * 6 6 6 6 6 0 0 0
3 * 5 5 * 0 * 6 6 6 6 6 0 0 0
3 8 8 5 5 5 4 4 4 4 * 6 0 0 0
8 8 8 8 * 9 7 9 9 4 4 4 9 5 5
8 8 8 8 * * 7 7 9 9 9 9 9 5 5
1 1 1 1 1 1 7 7 9 9 9 9 9 9 5
1 1 1 1 1 1 3 3 3 9 9 7 7 7 7
1 1 1 1 1 1 3 3 3 9 * 7 7 7 7

Put a star symbol for the prototypes that you are not able to label. The above .lbl file has been handcrafted from the following image, obtained from saving:

Write a python script that loads the numpy array and the .lbl file. The loading of the .lbl file is given:

def load_labels(filename) : # filename is a .lbl file
    def label_of_string(s) :
        if s == '*' :
            return -1 # The star symbol is an impossible label
        else:
            return int(s)
    chars = [line.split() for line in open(filename)]
    chars = [line for line in chars if line != []]     # we remove empty lines
    return np.array([[label_of_string(s) for s in line] for line in chars])

Write a predictor as follows: when x feeds the predictor, the closest prototype (possibly computed from a blurred x, according to the best \(\sigma\) previously found) is determined, and the corresponding label is returned.

Evaluate the empirical risk of that predictor on the dataset.
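
Here is a possible sketch of such a predictor and of the empirical risk computation, assuming the prototypes were saved as a (h, w, 28, 28) array, that labels is the array returned by load_labels, and that digits.blur is used as before (the file names and the exact blur signature are assumptions):

import numpy as np
import digits

prototypes = np.load('prototypes.npy') # (h, w, 28, 28), saved previously
labels = load_labels('prototypes.lbl') # (h, w), -1 for the starred prototypes

def predict(x, sigma=0):
    if sigma != 0:
        x = digits.blur(x, sigma)
    # Frobenius distance between x and every prototype of the grid.
    dists = np.linalg.norm(prototypes - x, axis=(-2, -1))
    i, j = np.unravel_index(np.argmin(dists), dists.shape)
    return labels[i, j]

# Empirical risk on the dataset (X: (n, 28, 28) images, y: (n,) labels).
y_pred = np.array([predict(x, best_sigma) for x in X])
risk = np.mean(y_pred != y) # a -1 prediction (starred prototype) counts as an error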

When you’re done

Here is an example of what could have been done.

Hervé Frezza-Buet,