SNOW - Additional information

The information in the SNOW database was obtained from the following sources:

The LIGAND database is the source of most of the biochemical compounds. Generic compounds (i.e. those whose formula ends with an -R) were not considered, nor were compounds that have neither a FORMULA nor a DBLINKS field.
UM-BBD database entries were cross-checked against LIGAND entries. UM-BBD entries were added either as new entries or as synonyms.
The NIST Chemistry WebBook was the source of the information about isomers.
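
The exclusion rules applied to LIGAND entries can be illustrated with a small Python sketch. It is not the code used to build SNOW: the record layout (a dictionary keyed by field name) and the function name `keep_compound` are assumptions made for the example, and the test for a generic compound is simply "the formula string ends with an uppercase R".

```python
def keep_compound(record):
    """Return True if a (hypothetical) LIGAND record would be kept."""
    formula = record.get("FORMULA", "").strip()
    dblinks = record.get("DBLINKS", "").strip()
    # Entries with neither a FORMULA nor a DBLINKS field are skipped.
    if not formula and not dblinks:
        return False
    # Generic compounds: the formula ends with an R group.
    if formula.endswith("R"):
        return False
    return True


# Hypothetical records, for illustration only.
records = [
    {"NAME": "Benzoate", "FORMULA": "C7H6O2", "DBLINKS": "CAS: 65-85-0"},
    {"NAME": "Generic acid", "FORMULA": "CHO2R"},
    {"NAME": "Undocumented entry"},
]
kept_names = [r["NAME"] for r in records if keep_compound(r)]
print(kept_names)  # ['Benzoate']
```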

An alphabetical list of all the compound names and some statistics can be found here.

The main page contains several fields:

Query: the query string (a compound name or a part of one) must be typed or pasted here. Neither boolean operators nor wildcards (*) are allowed.

Results containing the string/s: in these fields the user can write up to three different strings (one string per field) to refine the search for similar compound names. SNOW will only search among those database entries containing AT LEAST one of the strings. Neither boolean operators nor wildcards (*) are allowed. In effect, the three fields are combined on an OR basis.

Results not containing the string/s: in these fields the user can write up to three different strings (one string per field) to refine the search for similar compound names. SNOW will only search among those database entries containing NONE of the strings. Neither boolean operators nor wildcards (*) are allowed. In effect, the three fields are combined on an AND NOT basis.
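
The way the two groups of refinement fields combine can be sketched in Python as follows. This is only an illustration of the OR / AND NOT logic described above, not SNOW's own code: the function name `passes_filters` is hypothetical, and case-insensitive substring matching is an assumption.

```python
def passes_filters(name, include=(), exclude=()):
    """Return True if `name` survives both refinement filters."""
    lowered = name.lower()
    # Empty fields are ignored (assumption: matching is case-insensitive).
    include = [s.lower() for s in include if s]
    exclude = [s.lower() for s in exclude if s]
    # OR: at least one "containing" string must occur, if any was given.
    if include and not any(s in lowered for s in include):
        return False
    # AND NOT: none of the "not containing" strings may occur.
    return not any(s in lowered for s in exclude)


names = ["benzoic acid", "2-chlorobenzoate", "toluene", "4-chlorotoluene"]
kept = [n for n in names
        if passes_filters(n, include=["benz", "tolu"], exclude=["chloro"])]
print(kept)  # ['benzoic acid', 'toluene']
```

Only the entries that pass both filters would then go on to be scored for similarity.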

Stringency of the search: the stringency in SNOW is a measure of the gap penalty (see below) and affects only the process of finding similar results. HIGH stringency means that a higher gap penalty is applied when building the alignment of strings, so results are concentrated in a narrower range of higher scores. LOW stringency allows worse alignments to be displayed (i.e. the score range is broader).

Number of results: maximum number of results displayed (applies only to similar matches, not to exact matches).

Similarity cutoff: similar (not exact) matches are scored and ranked into three categories (HIGH similarity, MEDIUM similarity and LOW similarity) according to their similarity to the user's query. The user can choose which categories are displayed. The number of displayed results is, however, limited by the maximum number of results chosen by the user: even if both the HIGH and MEDIUM similarity boxes are checked, only HIGH similarity matches will be shown if the HIGH similarity matches alone reach the maximum number of results chosen.
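
The interaction between the similarity cutoff and the number of results can be sketched in Python as below. The thresholds (0.8 and 0.5), the assumption that similarity is normalised to the range 0 to 1, and the function name `select_results` are all hypothetical; the actual values used by SNOW are not given here.

```python
def select_results(scored, wanted, max_results, high=0.8, medium=0.5):
    """Pick the matches to display.

    scored: list of (name, similarity) pairs, similarity assumed in [0, 1].
    wanted: set of categories chosen by the user, e.g. {"HIGH", "MEDIUM"}.
    """
    def category(sim):
        if sim >= high:
            return "HIGH"
        if sim >= medium:
            return "MEDIUM"
        return "LOW"

    # Best matches first, then keep only the requested categories.
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    chosen = [(name, sim, category(sim)) for name, sim in ranked
              if category(sim) in wanted]
    # The maximum number of results is applied last.
    return chosen[:max_results]


scored = [("d", 0.60), ("a", 0.95), ("c", 0.85), ("b", 0.90)]
shown = select_results(scored, {"HIGH", "MEDIUM"}, max_results=2)
print(shown)  # [('a', 0.95, 'HIGH'), ('b', 0.9, 'HIGH')]
```

Here both HIGH and MEDIUM are requested, but the limit of two results is already filled by HIGH similarity matches, so the MEDIUM match "d" is never shown.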

The SNOW program is written in Perl and C and queries a MySQL database. Two different searches are executed by the program:

Search for exact matches: if an exact match for the query is found in the database, the recommended name and all its trivial names are displayed.

Search for similar matches: SNOW also searches the database for matches similar to the query string. The process applies the Smith-Waterman algorithm to align compound names as strings. If the user submits strings in order to refine the search results, boolean operators are applied as explained above. Exact matches are excluded from additional searches.
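
For readers unfamiliar with Smith-Waterman local alignment, the following Python sketch shows the basic scoring recurrence applied to two strings. It is a minimal illustration, not SNOW's implementation: the match, mismatch and gap values are assumptions, a simple linear gap penalty is used, and only the best score (not the alignment itself) is returned.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=2):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Open or extend a gap in either string.
            up = prev[j] - gap
            left = curr[j - 1] - gap
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("benzoate", "benzoate"))       # 16
print(smith_waterman("abcdef", "abcxdef", gap=1))   # 11
print(smith_waterman("abcdef", "abcxdef", gap=7))   # 6
```

The last two calls show the effect of the gap penalty: with a low penalty the alignment bridges the extra "x" and scores 11, while with a high penalty bridging is no longer worthwhile and only the ungapped fragment "abc" (or "def") is scored.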

All the database entries are scored according to their similarity to the query string and displayed according to the user's choices. SNOW displays the results in three separate sections:

General information: information on the user's choices, date and database used.

Exact matches: each match is displayed together with its CAS number (whenever possible) and a comment on recommended names or trivial names. A link to a database containing more detailed information on the compound is displayed.
Similar matches: each match is displayed together with its CAS number (whenever possible) and matches can automatically be re-submitted.