Customizing SOM/LVQ_PAK

This is not yet finished but it should give you some idea...

What to do if you, for example, want to use a different distance function or use a different way to adapt a vector? No worries, here are some hints to get you started.

Most of the functions in the package (SOM training, codebook labeling, etc.) take their parameters from the teach_params structure that is passed to them. The structure is defined as follows (in lvq_pak.h):

struct teach_params {
  short topol;
  short neigh;
  short alpha_type;
  MAPDIST_FUNCTION *mapdist;  /* calculates distance between two units */
  DIST_FUNCTION *dist;        /* calculates distance between two vectors */
  NEIGH_ADAPT *neigh_adapt;   /* adapts weights */
  VECTOR_ADAPT *vector_adapt; /* adapt one vector */
  WINNER_FUNCTION *winner;    /* function to find winner */
  ALPHA_FUNC *alpha_func;
  float radius;               /* initial radius (for SOM) */
  float alpha;                /* initial alpha value */
  long length;                /* length of training */
  int knn;                    /* nearest neighbours */
  struct entries *codes;
  struct entries *data;
  struct snapshot_info *snapshot;
  time_t start_time, end_time;
};

The different function types are defined in lvq_pak.h:

MAPDIST_FUNCTION - calculates distance between two map units.

This type of function calculates the distance between two units on the 2D map. Examples of this kind of function are rect_dist and hexa_dist, which calculate distances on rectangular and hexagonal maps respectively. The type definition for this function is:

  typedef float MAPDIST_FUNCTION(int bx, int by, int tx, int ty);

where (bx,by) and (tx,ty) are the coordinates of two units on the map. Functions of this type are used by the neighbourhood adaptation functions.
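
As an illustration, a custom map distance could look roughly like the sketch below. The function manhattan_dist and the variable my_mapdist are hypothetical names that are not part of the package; the typedef is repeated only so that the fragment compiles on its own.

```c
#include <stdlib.h>  /* abs */

/* Same function type as in lvq_pak.h, repeated so the sketch stands alone. */
typedef float MAPDIST_FUNCTION(int bx, int by, int tx, int ty);

/* Hypothetical: city-block distance between two units on a rectangular map. */
float manhattan_dist(int bx, int by, int tx, int ty)
{
  return (float)(abs(bx - tx) + abs(by - ty));
}

/* Assigning to a typed pointer makes the compiler check the signature. */
MAPDIST_FUNCTION *my_mapdist = manhattan_dist;
```

A pointer like my_mapdist is what would go into the mapdist field of teach_params.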

DIST_FUNCTION - calculates distance between two vectors.

This type of function calculates the distance between two vectors (data_entry structures). Masked components are ignored in the calculation. An example of this kind of function is vector_dist_euc, which computes the Euclidean distance between two entries. The type definition for this function is:

  typedef float DIST_FUNCTION(struct data_entry *v1, struct data_entry *v2, int dim);

where v1 and v2 are two entries and dim is their dimension. Note that the functions for finding winners do not use these functions.
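
A minimal sketch of a custom distance of this type follows. The struct data_entry shown here is a simplified stand-in with only the two fields the sketch needs (the layout is an assumption, see lvq_pak.h for the real one), and vector_dist_cityblock is a hypothetical function, not one shipped with the package.

```c
#include <stddef.h>

/* Simplified stand-in for the package's struct data_entry (assumption). */
struct data_entry {
  float *points;  /* vector components */
  char *mask;     /* optional; a nonzero byte marks a component to skip */
};

typedef float DIST_FUNCTION(struct data_entry *v1, struct data_entry *v2, int dim);

/* Hypothetical city-block (L1) distance that ignores masked components. */
float vector_dist_cityblock(struct data_entry *v1, struct data_entry *v2, int dim)
{
  float sum = 0.0f;
  int i;

  for (i = 0; i < dim; i++) {
    float d;
    /* Skip a component if either entry masks it out. */
    if ((v1->mask && v1->mask[i]) || (v2->mask && v2->mask[i]))
      continue;
    d = v1->points[i] - v2->points[i];
    sum += (d < 0.0f) ? -d : d;
  }
  return sum;
}
```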

NEIGH_ADAPT - adapts the neighbourhood.

The type definition for this function is:

  typedef void NEIGH_ADAPT(struct teach_params *teach,
                           struct data_entry *sample,
                           int bx, int by,
                           float radius, float alpha);

where the teaching parameters are in teach, sample is the data sample, (bx,by) are the coordinates of the winning unit, and radius and alpha are the training radius and alpha for this sample (both decrease during teaching).

This function takes a data sample and moves vectors within the adaptation region (a circle centered at (bx,by) with radius radius) towards the sample. The vector adaptation function and map distance function are taken from the teach-structure.

Examples of this kind of function are bubble_adapt and gaussian_adapt (in som_rout.c), which adapt codebooks with bubble and Gaussian neighbourhoods respectively.
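
The sketch below shows what a custom neighbourhood function might look like. It is not the code of bubble_adapt or gaussian_adapt. All structures are cut-down stand-ins: in particular, storing the codebook as a flat row-major array (the units field) is an assumption made to keep the example short, and cone_adapt, grid_dist and move_towards are hypothetical names.

```c
/* Cut-down stand-ins for the package structures (assumptions). */
struct data_entry { float *points; };

typedef float MAPDIST_FUNCTION(int bx, int by, int tx, int ty);
typedef void VECTOR_ADAPT(struct data_entry *c, struct data_entry *s, int d, float a);

struct entries {
  int xdim, ydim, dimension;
  struct data_entry *units;  /* hypothetical: flat array, row-major order */
};

struct teach_params {
  MAPDIST_FUNCTION *mapdist;
  VECTOR_ADAPT *vector_adapt;
  struct entries *codes;
};

/* Helper for the sketch: city-block distance on the map grid. */
float grid_dist(int bx, int by, int tx, int ty)
{
  int dx = bx - tx, dy = by - ty;
  return (float)((dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy));
}

/* Helper for the sketch: c += a * (s - c). */
void move_towards(struct data_entry *c, struct data_entry *s, int d, float a)
{
  int i;
  for (i = 0; i < d; i++)
    c->points[i] += a * (s->points[i] - c->points[i]);
}

/* Hypothetical neighbourhood whose strength falls off linearly with the
   map distance from the winner at (bx,by). Units farther away than
   radius are left untouched. */
void cone_adapt(struct teach_params *teach, struct data_entry *sample,
                int bx, int by, float radius, float alpha)
{
  struct entries *codes = teach->codes;
  int tx, ty;

  for (ty = 0; ty < codes->ydim; ty++) {
    for (tx = 0; tx < codes->xdim; tx++) {
      float dd = teach->mapdist(bx, by, tx, ty);
      if (dd <= radius) {
        /* 1 at the winner, decreasing with distance; never divides by zero. */
        float scale = 1.0f - dd / (radius + 1.0f);
        teach->vector_adapt(&codes->units[ty * codes->xdim + tx], sample,
                            codes->dimension, alpha * scale);
      }
    }
  }
}
```

Note that the map distance and the vector adaptation are both called through the teach structure, so replacing either of them does not require touching the neighbourhood function.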

VECTOR_ADAPT - adapt one vector.

This function takes a codebook vector and a sample vector and moves the codebook vector towards the sample by a small amount (given by the alpha parameter a). The type definition for this function is:

  typedef void VECTOR_ADAPT(struct data_entry *c, struct data_entry *s, 
                            int d, float a);

where c is the codebook vector, s is the sample vector, d is the dimension and a is the alpha. The adapt_vector function is an example of this type of function. This function is used in the neighbourhood adaptation functions and in the LVQ training algorithms.
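
A possible custom adaptation routine is sketched below. It applies the usual update c = c + a * (s - c) but leaves alone any component that is masked in the sample. The data_entry layout is again a simplified stand-in (an assumption), and adapt_vector_masked is a hypothetical name; this is not the source of adapt_vector.

```c
#include <stddef.h>

/* Simplified stand-in for the package's struct data_entry (assumption). */
struct data_entry {
  float *points;  /* vector components */
  char *mask;     /* optional; a nonzero byte marks a component to skip */
};

typedef void VECTOR_ADAPT(struct data_entry *c, struct data_entry *s, int d, float a);

/* Hypothetical: move c towards s by the fraction a, component by
   component, without touching components masked in the sample. */
void adapt_vector_masked(struct data_entry *c, struct data_entry *s, int d, float a)
{
  int i;

  for (i = 0; i < d; i++) {
    if (s->mask && s->mask[i])
      continue;
    c->points[i] += a * (s->points[i] - c->points[i]);
  }
}
```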

WINNER_FUNCTION - finds best matching units.

This function takes a codebook and a data sample and finds the codebook units that best match the data sample. The type definition for this function is:

  typedef int WINNER_FUNCTION(struct entries *codes, struct data_entry *sample,
                              struct winner_info *w, int knn);

where codes is the codebook and sample is the sample vector. The information about the winners is stored in the winner_info array pointed to by w, with the best match in w[0]. knn is the number of best matching units to store (k nearest neighbours). SOM_PAK only uses a knn of 1; in LVQ_PAK the number can vary.

Examples of this kind of function are find_winner_euc and find_winner_knn, which both use Euclidean distance to look for the best matches. The difference between the two is that find_winner_euc only finds the single best matching unit (i.e. knn = 1).

The winner_info structure is defined as follows:

  struct winner_info {
    long index;
    struct data_entry *winner;
    float diff;
  };

index is the number of the winning unit counted from the start of the codebook. This is used to get the winning unit's coordinates on the map. winner is a pointer to the winning entry. diff is the difference between the winning vector and the sample; for the Euclidean winner functions above it is the squared distance between the two vectors.
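
To make the roles of index, winner and diff concrete, here is a rough sketch of a winner function that only handles knn = 1. The flat-array codebook, the return value (number of winners stored) and the row-major mapping from index to map coordinates are all assumptions of this sketch; find_winner_sketch, unit_x and unit_y are hypothetical names.

```c
#include <stddef.h>
#include <float.h>

/* Cut-down stand-ins for the package structures (assumptions). */
struct data_entry { float *points; };

struct entries {
  int xdim, ydim, dimension;
  struct data_entry *units;  /* hypothetical: flat array of xdim*ydim units */
};

struct winner_info {
  long index;
  struct data_entry *winner;
  float diff;
};

/* Hypothetical: store the single best match in w[0] using the squared
   Euclidean distance. Returns the number of winners stored (0 or 1). */
int find_winner_sketch(struct entries *codes, struct data_entry *sample,
                       struct winner_info *w, int knn)
{
  long n = (long)codes->xdim * codes->ydim;
  long i;
  int j;

  if (knn < 1 || n < 1)
    return 0;

  w[0].index = -1;
  w[0].winner = NULL;
  w[0].diff = FLT_MAX;

  for (i = 0; i < n; i++) {
    float sum = 0.0f;
    for (j = 0; j < codes->dimension; j++) {
      float d = codes->units[i].points[j] - sample->points[j];
      sum += d * d;
    }
    if (sum < w[0].diff) {
      w[0].index = i;
      w[0].winner = &codes->units[i];
      w[0].diff = sum;
    }
  }
  return 1;
}

/* Map coordinates of a unit, assuming row-major numbering. */
int unit_x(struct entries *codes, long index) { return (int)(index % codes->xdim); }
int unit_y(struct entries *codes, long index) { return (int)(index / codes->xdim); }
```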

ALPHA_FUNC - get teaching alpha.

This function returns the alpha value during teaching. The type definition for this function is:

  typedef float ALPHA_FUNC(long iter, long length, float alpha);

where iter is the number of the current iteration, length is the total length of the process and alpha is the initial alpha value. Examples are linear_alpha, which produces a linearly decreasing alpha, and inverse_t_alpha, which decreases in inverse proportion to the number of iterations done.
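
Two schedules of this type are sketched below. The first illustrates what a linearly decreasing alpha can look like; it is not copied from the package and may differ from linear_alpha in detail. The second, hold_then_linear_alpha, is a made-up custom schedule. Both names are hypothetical.

```c
typedef float ALPHA_FUNC(long iter, long length, float alpha);

/* Sketch of a linear schedule: alpha at iter 0, zero at iter == length. */
float linear_alpha_sketch(long iter, long length, float alpha)
{
  if (length <= 0)
    return alpha;
  return alpha * (float)(length - iter) / (float)length;
}

/* Hypothetical custom schedule: keep the initial alpha for the first
   half of the run, then decrease linearly to zero. */
float hold_then_linear_alpha(long iter, long length, float alpha)
{
  long half = length / 2;

  if (length <= 0 || iter < half)
    return alpha;
  return alpha * (float)(length - iter) / (float)(length - half);
}
```

Either function could be assigned to the alpha_func field of teach_params.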

Examples

Here are some hypothetical examples of customizing the package:

Using a different distance measure

Write a routine of type DIST_FUNCTION to calculate the distance.
Change the programs to set the dist field in teach_params to point to your function. Do this before calling the routine that uses it (SOM or LVQ training, labeling, etc.) but after any set_teach_params() call. Note that some functions take a DIST_FUNCTION directly and do not use the teach_params structure.
Now the program should be using your distance function.
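
The ordering in the second step can be sketched as follows. Because the exact signature of set_teach_params() is not given here, the sketch uses a hypothetical stand-in, fill_defaults_standin, in its place. The cut-down teach_params, default_dist_standin, max_abs_dist and setup_params are likewise illustrative names, not part of the package.

```c
/* Cut-down stand-ins for the package structures (assumptions). */
struct data_entry { float *points; };

typedef float DIST_FUNCTION(struct data_entry *v1, struct data_entry *v2, int dim);

struct teach_params {
  DIST_FUNCTION *dist;  /* only the field this sketch touches */
};

/* Stand-in for whatever distance the defaults would install. */
float default_dist_standin(struct data_entry *v1, struct data_entry *v2, int dim)
{
  float sum = 0.0f;
  int i;
  for (i = 0; i < dim; i++) {
    float d = v1->points[i] - v2->points[i];
    sum += d * d;
  }
  return sum;
}

/* Hypothetical custom measure: largest absolute component difference. */
float max_abs_dist(struct data_entry *v1, struct data_entry *v2, int dim)
{
  float best = 0.0f;
  int i;
  for (i = 0; i < dim; i++) {
    float d = v1->points[i] - v2->points[i];
    if (d < 0.0f)
      d = -d;
    if (d > best)
      best = d;
  }
  return best;
}

/* Hypothetical stand-in for the call that fills in default functions. */
void fill_defaults_standin(struct teach_params *params)
{
  params->dist = default_dist_standin;
}

void setup_params(struct teach_params *params)
{
  fill_defaults_standin(params);  /* first let the defaults be installed */
  params->dist = max_abs_dist;    /* then override the distance function */
}
```

If the assignment were made before the defaults are filled in, it would simply be overwritten, which is why the order matters.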

You will probably not get away with changing only the distance function, mainly because many operations use the find_winner functions, which do not use the distance function. The following example addresses this issue.

Teaching SOM with dot-product rule

In this example we want to change the SOM to use the dot-product rule instead of the default. You need to write some new functions:

Write a distance function that uses the dot product, in the way described in the previous example. (This may not actually be needed at all.)
Write a new vector adaptation routine that uses the dot-product rule.
Write a new winner function that uses the dot-product rule (it maximizes the product instead of minimizing the difference).

Then install these routines in the teach_params structure as done in the previous example.
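
As a starting point for the winner function in the list above, a dot-product version might look roughly like this. The flat-array codebook, the choice to store the dot product itself in diff, and the return value are assumptions of the sketch; find_winner_dot is a hypothetical name, and only knn = 1 is handled.

```c
/* Cut-down stand-ins for the package structures (assumptions). */
struct data_entry { float *points; };

struct entries {
  int num_entries, dimension;
  struct data_entry *units;  /* hypothetical: flat array of codebook vectors */
};

struct winner_info {
  long index;
  struct data_entry *winner;
  float diff;
};

/* Dot product of two vectors of the given dimension. */
float dot_product(struct data_entry *v1, struct data_entry *v2, int dim)
{
  float sum = 0.0f;
  int i;
  for (i = 0; i < dim; i++)
    sum += v1->points[i] * v2->points[i];
  return sum;
}

/* Hypothetical: the winner is the unit with the LARGEST dot product
   with the sample. w[0].diff holds that dot product (assumption).
   Returns the number of winners stored (0 or 1). */
int find_winner_dot(struct entries *codes, struct data_entry *sample,
                    struct winner_info *w, int knn)
{
  long i;

  if (knn < 1 || codes->num_entries < 1)
    return 0;

  /* Start from the first unit, then look for anything better. */
  w[0].index = 0;
  w[0].winner = &codes->units[0];
  w[0].diff = dot_product(&codes->units[0], sample, codes->dimension);

  for (i = 1; i < codes->num_entries; i++) {
    float p = dot_product(&codes->units[i], sample, codes->dimension);
    if (p > w[0].diff) {
      w[0].index = i;
      w[0].winner = &codes->units[i];
      w[0].diff = p;
    }
  }
  return 1;
}
```

Any code that reads diff as a squared distance would need to be checked if a function like this is installed, since larger values now mean better matches.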