Data file handling
Opening, loading and saving files
Loading and saving data files and codebooks is easy if you don't
need anything fancy. All you need are these three functions:
open_entries to open a data or
codebook file,
close_entries to close a file you
opened earlier, and
save_entries to save codebook
entries to a file. Another function for saving data is
save_entries_wcomments,
which lets you put your own headers or comments at the start of
the file, after the default headers.
- struct entries *open_entries(char *name)
- Open a data file. Returns a pointer to a ready-to-use
entries-structure. After opening the file you can set
the buffering mode with
set_buffer or with
set_teach_params along
with other options. Note that no data is loaded from the file
at this point. The data is loaded when it is first requested,
usually by
rewind_entries.
Returns NULL on error.
- void close_entries(struct entries *entries)
- Deallocates an entries-file. Frees memory allocated for
entry-list and closes the file associated with entries if there
is one open.
- int save_entries_wcomments(struct entries *codes, char *out_code_file, char *comments)
- Saves data to a file with optional comment or header lines.
Codes is the data to save, out_code_file is the
file name and comments is a pointer to a string
containing the comments, or NULL if there are none. Returns a
non-zero value on error.
- int save_entries(struct entries *codes, char *out_code_file)
- Saves data to a file. This is actually a macro that calls
save_entries_wcomments
with the comments as NULL.
- int label_not_needed(int level)
- Controls whether labels are needed for every data entry or
not. Setting level to nonzero means that labels are not
necessary. This function sets the default used by
open_entries.
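The relationship between save_entries and save_entries_wcomments can be sketched as a macro that forwards its arguments and passes NULL for the comments. The block below is a minimal, self-contained illustration and not the package source: struct sketch_codes, sketch_save_wcomments, sketch_save and sketch_save_demo are hypothetical stand-ins, the output goes to a memory buffer instead of a named file so the example runs on its own, and the text layout is assumed for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a codebook: a header line and a data body. */
struct sketch_codes {
    const char *header;
    const char *data;
};

/* Writes the header, then the optional comments, then the data into buf.
   Returns 0 on success and non-zero on error (output did not fit). */
int sketch_save_wcomments(const struct sketch_codes *codes, char *buf,
                          size_t cap, const char *comments)
{
    int n;

    assert(codes != NULL && buf != NULL);
    n = snprintf(buf, cap, "%s%s%s", codes->header,
                 comments != NULL ? comments : "", codes->data);
    return (n < 0 || (size_t)n >= cap) ? 1 : 0;
}

/* The plain save is the commented save with no comments. */
#define sketch_save(codes, buf, cap) \
    sketch_save_wcomments((codes), (buf), (cap), NULL)

/* Usage: returns 0 when both variants produce the expected text. */
int sketch_save_demo(void)
{
    struct sketch_codes codes = { "2\n", "1.0 2.0 A\n" };
    char buf[64];

    if (sketch_save(&codes, buf, sizeof buf) != 0 ||
        strcmp(buf, "2\n1.0 2.0 A\n") != 0)
        return 1;
    if (sketch_save_wcomments(&codes, buf, sizeof buf, "# my note\n") != 0 ||
        strcmp(buf, "2\n# my note\n1.0 2.0 A\n") != 0)
        return 2;
    return 0;
}
```

Keeping one function that does the work and one thin macro on top means there is a single place to change when the file layout changes.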
The following functions are lower-level functions that are used mainly
by the functions above and when going through a data file. In most
cases you don't have to call any of these directly.
A couple of functions, however, can be quite useful:
write_header
to write the header of an entries-structure to a file and
write_entry
to write a single data_entry to a file. They come in handy when
you want to write a program that acts as a filter. See the source code
of the program visual for an example of how to use them.
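The shape of such a filter can be sketched as: write the header once, then transform and write each entry in turn. The block below is only an illustration under stated assumptions. It does not use the package's own structures or its file_info type; struct sketch_entry, struct sketch_entries, sketch_write_header, sketch_write_entry and sketch_scale_filter are hypothetical stand-ins, and the output layout (dimension on the first line, then components followed by a label) is assumed for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real structures have more fields. */
struct sketch_entry {
    float values[2];
    const char *label;
    struct sketch_entry *next;
};

struct sketch_entries {
    int dimension;
    struct sketch_entry *first;
};

/* Stand-in for writing the header: here just the vector dimension. */
static int sketch_write_header(FILE *out, const struct sketch_entries *e)
{
    return fprintf(out, "%d\n", e->dimension) < 0;
}

/* Stand-in for writing one entry: components followed by the label. */
static int sketch_write_entry(FILE *out, const struct sketch_entries *e,
                              const struct sketch_entry *entry)
{
    for (int i = 0; i < e->dimension; i++)
        if (fprintf(out, "%g ", entry->values[i]) < 0)
            return 1;
    return fprintf(out, "%s\n", entry->label) < 0;
}

/* Filter: write the header, then write every entry scaled by factor.
   Returns the number of entries written, or -1 on a write error. */
long sketch_scale_filter(FILE *out, struct sketch_entries *e, float factor)
{
    long written = 0;

    assert(out != NULL && e != NULL);
    assert(e->dimension >= 0 && e->dimension <= 2);
    if (sketch_write_header(out, e))
        return -1;
    for (struct sketch_entry *p = e->first; p != NULL; p = p->next) {
        for (int i = 0; i < e->dimension; i++)
            p->values[i] *= factor;
        if (sketch_write_entry(out, e, p))
            return -1;
        written++;
    }
    return written;
}

/* Usage: filters a two-entry list to standard output. */
long sketch_filter_demo(void)
{
    struct sketch_entry b = { { 3.0f, 4.0f }, "B", NULL };
    struct sketch_entry a = { { 1.0f, 2.0f }, "A", &b };
    struct sketch_entries data = { 2, &a };

    return sketch_scale_filter(stdout, &data, 2.0f);
}
```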
- struct data_entry *load_entry(struct entries *entr, struct data_entry *entry)
- Loads one data_entry from the file associated with entr. If
entry is non-NULL, the old data_entry is reused,
otherwise a new entry is allocated. Returns NULL on error.
- write_header: Writes the default header information of an entries-structure to a file.
- int write_entry(struct file_info *fi, struct entries *entr, struct data_entry *entry)
- Writes one data entry (entry) to file fi from
entr.
- struct entries *open_data_file(char *name)
- Opens a data file for reading. Returns a pointer to
entries-structure or NULL on error. If name is NULL,
just allocates the data structure but doesn't open any file.
- Reads the header information from the file and sets the entries
variables accordingly. Returns a non-zero value on error.
- Skips over headers of a file. Used when a file is
re-opened. Returns a non-zero value on error.
- int rewind_datafile(struct file_info *fi)
- Goes to the point in the file where the first data
entry begins. You don't usually have to use this at all because
this is automatically done by
rewind_entries and
next_entry.
Returns 0 on success, error code otherwise.
- struct entries *read_entries(struct entries *entries)
- Reads data from file to memory. If LOADMODE_ALL is used the
whole file is loaded into memory at once and the file is
closed. If buffered loading (LOADMODE_BUFFER) is used at most N
data vectors are read into memory at one time. The buffer size
N is given in the entries->buffer field. In both cases
if there are any previous entries in the entries structure,
they are overwritten and the memory space allocated for them is
reused. Returns NULL on error, otherwise returns
entries.
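The difference between the two load modes can be illustrated with a small, self-contained sketch. It is not the package's implementation: the "file" is an in-memory array, and struct sketch_source, struct sketch_entries, the SKETCH_ constants and all sketch_ functions are hypothetical stand-ins. Only the ideas come from the text above: a buffer size of 0 means load everything, a non-zero size N means at most N items per read, and each read overwrites and reuses the previously loaded storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode constants; the real values are not given in the text. */
enum { SKETCH_LOADMODE_ALL, SKETCH_LOADMODE_BUFFER };

#define SKETCH_CAPACITY 8       /* fixed storage size, an assumption */

struct sketch_source {          /* plays the role of the open file */
    const float *items;
    size_t total;
    size_t pos;                 /* next unread item */
};

struct sketch_entries {
    int loadmode;
    long buffer;                    /* N: maximum items per read */
    float loaded[SKETCH_CAPACITY];  /* storage reused by every read */
    size_t num_loaded;
    struct sketch_source *src;
};

/* Mirrors the rule described for set_buffer: 0 selects load-all. */
void sketch_set_buffer(struct sketch_entries *e, long buffer)
{
    assert(e != NULL);
    e->buffer = buffer;
    e->loadmode = buffer == 0 ? SKETCH_LOADMODE_ALL : SKETCH_LOADMODE_BUFFER;
}

/* Reads the next chunk, overwriting whatever was loaded before.
   Returns the number of items now in memory (0 at end of data). */
size_t sketch_read_entries(struct sketch_entries *e)
{
    size_t limit = SKETCH_CAPACITY;

    assert(e != NULL && e->src != NULL);
    if (e->loadmode == SKETCH_LOADMODE_BUFFER && e->buffer > 0 &&
        (size_t)e->buffer < limit)
        limit = (size_t)e->buffer;

    e->num_loaded = 0;
    while (e->num_loaded < limit && e->src->pos < e->src->total)
        e->loaded[e->num_loaded++] = e->src->items[e->src->pos++];
    return e->num_loaded;
}

/* Usage: counts how many reads it takes to consume seven items. */
int sketch_count_reads(long buffer)
{
    static const float items[7] = { 1, 2, 3, 4, 5, 6, 7 };
    struct sketch_source src = { items, 7, 0 };
    struct sketch_entries e = { 0 };
    int reads = 0;

    e.src = &src;
    sketch_set_buffer(&e, buffer);
    while (sketch_read_entries(&e) > 0)
        reads++;
    return reads;
}
```

With a buffer of 3 the seven items arrive in three reads; with a buffer of 0 they arrive in one.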
Going through the data
A very common operation is going through all the vectors in a data or
code file, possibly many times, as when teaching a map with
vsom. Two functions are provided to do this:
rewind_entries, which gets you the
first vector from a data file, and
next_entry, which gives you each of the following
vectors. These functions handle buffered reading, compressed files,
etc. automatically, so you don't have to worry about them.
A typical use could look like this:
struct data_entry *dtmp;
eptr p;
/* ... */
dtmp = rewind_entries(data, &p); /* go to start of data, initialize p */
while (dtmp != NULL) {
    /* whatever needs to be done */
    dtmp = next_entry(&p); /* get next entry */
}
Here data is an entries-structure containing your
data.
An eptr stores the current position in the
data file and is defined as:
struct entry_ptr {
    struct data_entry *current;
    struct entries *parent;
    long index;
};
typedef struct entry_ptr eptr;
Current is a pointer to the current vector, parent is a
pointer to the entries with which this pointer is associated
and index is the number of the current entry in the datafile
(first is 0). You can have multiple eptrs for the same datafile
as long as you aren't using buffered input. Also, eptrs can be
copied from one eptr to another.
When the end of the datafile is reached, next_entry returns
NULL. If you want to read the data again, just call
rewind_entries again and then go on as before.
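Copying a position and resuming from the copy can be sketched with a small in-memory model. This is an illustration only, matching the unbuffered case: the position structure has the same three fields as eptr, but struct sketch_entry, struct sketch_entries, sketch_eptr, sketch_rewind and sketch_next are hypothetical stand-ins and do no file handling or buffering. How the real library updates index at the end of the data is not stated above, so the behaviour here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures hold more than this. */
struct sketch_entry {
    float value;
    struct sketch_entry *next;
};

struct sketch_entries {
    struct sketch_entry *first;
};

/* Same three fields as the eptr described above. */
typedef struct {
    struct sketch_entry *current;
    struct sketch_entries *parent;
    long index;
} sketch_eptr;

/* Point p at the first entry and return that entry. */
struct sketch_entry *sketch_rewind(struct sketch_entries *e, sketch_eptr *p)
{
    assert(e != NULL && p != NULL);
    p->parent = e;
    p->current = e->first;
    p->index = 0;
    return p->current;
}

/* Advance p by one entry; returns NULL at the end of the list. */
struct sketch_entry *sketch_next(sketch_eptr *p)
{
    assert(p != NULL);
    if (p->current == NULL)
        return NULL;
    p->current = p->current->next;
    p->index++;                 /* assumption: index always advances */
    return p->current;
}

/* Usage: save the position at index 1, run the original pointer to the
   end, then finish the pass from the saved copy. Returns the sum of the
   values seen from the copy. */
float sketch_sum_from_copy(void)
{
    struct sketch_entry c = { 3.0f, NULL };
    struct sketch_entry b = { 2.0f, &c };
    struct sketch_entry a = { 1.0f, &b };
    struct sketch_entries data = { &a };
    sketch_eptr p, saved;
    struct sketch_entry *d;
    float sum = 0.0f;

    sketch_rewind(&data, &p);
    sketch_next(&p);            /* p is now at index 1 */
    saved = p;                  /* plain struct assignment copies it */

    while (sketch_next(&p) != NULL)
        ;                       /* saved is unaffected by moving p */

    for (d = saved.current; d != NULL; d = sketch_next(&saved))
        sum += d->value;        /* visits the entries holding 2 and 3 */
    return sum;
}
```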
- struct data_entry *rewind_entries(struct entries *entries, eptr *ptr)
- Goes to the first entry in the entries list. Returns a pointer to
the first data_entry and initializes the entry pointer (ptr)
structure. The ptr pointer should then be used with next_entry to get the following data
vectors from the data file. Loads data from the file if it hasn't
been loaded yet, and rewinds the file if buffered reading is in use.
- struct data_entry *next_entry(eptr *ptr)
- Gets the next entry from the entries table. Ptr is the entry
pointer obtained from rewind_entries
earlier. Returns NULL when the end of the table or the end of the file is
reached. If the load mode is buffered, loads more data from the file
when needed.
Manipulating data
These functions are used to allocate, free or copy individual
data_entry structures or whole codebooks or files.
- struct entries *alloc_entries(void)
- Allocate and initialize an entries-structure. Returns NULL on
error.
- struct entries *copy_entries(struct entries *entr)
- Creates a new entries structure with the same parameters
as the original. Does not copy the data_entry structures.
- struct data_entry *init_entry(struct entries *entr, struct data_entry *entry)
- Initialize and possibly allocate room for data_entry that is to
be in entr. If entry is NULL, a new entry is
allocated and initialized. If entry is a pointer to an old
entry, the old entry is re-initialized. Returns NULL on error.
- void free_entry(struct data_entry *entry)
- Deallocates a data_entry.
- void free_entrys(struct data_entry *data)
- Free a list of entries starting with data.
- struct data_entry *copy_entry(struct entries *entries, struct data_entry *data)
- Make a copy of data.
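The "allocate when given NULL, otherwise reuse" convention described for init_entry (and for load_entry earlier) can be sketched as follows. This is a minimal illustration, not the library code: struct sketch_entries, struct sketch_entry, its points field, and the sketch_ functions are hypothetical names chosen for the example, and "re-initialize" is assumed here to mean zeroing the vector.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the example. */
struct sketch_entries {
    int dimension;
};

struct sketch_entry {
    float *points;              /* hypothetical field name */
    struct sketch_entry *next;
};

/* If entry is NULL, allocate a new one sized for e; otherwise reset the
   existing one in place. Returns NULL on allocation failure. */
struct sketch_entry *sketch_init_entry(const struct sketch_entries *e,
                                       struct sketch_entry *entry)
{
    assert(e != NULL && e->dimension > 0);
    if (entry == NULL) {
        entry = malloc(sizeof *entry);
        if (entry == NULL)
            return NULL;
        entry->points = malloc((size_t)e->dimension * sizeof *entry->points);
        if (entry->points == NULL) {
            free(entry);
            return NULL;
        }
    }
    for (int i = 0; i < e->dimension; i++)
        entry->points[i] = 0.0f;
    entry->next = NULL;
    return entry;
}

/* Releases one entry together with its vector. */
void sketch_free_entry(struct sketch_entry *entry)
{
    if (entry != NULL) {
        free(entry->points);
        free(entry);
    }
}

/* Usage: returns 1 when re-initializing hands back the same storage,
   with its contents reset. */
int sketch_reuse_demo(void)
{
    struct sketch_entries e = { 3 };
    struct sketch_entry *first = sketch_init_entry(&e, NULL);
    struct sketch_entry *again;
    int ok;

    if (first == NULL)
        return 0;
    first->points[0] = 42.0f;
    again = sketch_init_entry(&e, first);
    ok = (again == first) && (again->points[0] == 0.0f);
    sketch_free_entry(first);
    return ok;
}
```

Reusing one entry inside a loop avoids an allocation per vector, which is presumably why the convention exists.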
Other functions
- void set_buffer(struct entries *entries, long buffer)
- Sets the buffer size of an entries-file and selects
LOADMODE_BUFFER or turns on LOADMODE_ALL if buffer is
0.
- int set_teach_params(struct teach_params *params, struct entries *codes, struct entries *data, long dbuffer)
- Sets values in teaching parameter structure params based
on values given in codebook (codes) and data (data)
files. Dbuffer is the data file's buffer length or 0 if
no buffering is wanted. By default sets topology and map type
from codes and selects winner, distance and adaptation
functions that use Euclidean distance. The
teach_params-structure is described in more detail in the
Customizing SOM/LVQ_PAK
document.