Names and Words Tables
The Names and Words tables (NAMES.DAT & NAMES2.DAT) control:
- the matching equivalent of words e.g. Tony = Anthony
- the gender of forenames e.g. John = Male, Susan = Female, Chris = Either
- casing rules e.g. PO Box, IBM, 360Science
- expansion/contraction of abbreviations and correction of typing errors e.g. Svcs = Services, Finacial = Financial
- attributing a type to these and other words e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.
These are fixed-width text files. The layout of the NAMES.DAT & NAMES2.DAT files is as follows:
Property | Width | Description |
TYPE | 1 | Type of entry – see below |
NAME | 25 | Matching equivalent of the entry (e.g. 'Anthony' has a matching equivalent of 'Tony', enabling these two names to be matched) |
EQUIVALENT | 10 | The word which is actually looked up |
GENDER | 1 | Indicates the gender of the forename or prefix |
SALUTATION | 2 | Indicates the type of salutation to be generated for a particular prefix – see below |
PROPER CASE | 30 | Proper case value for the entry |
SWITCH | 1 | Indicates whether this entry is the first part of a two word lookup |
Note: matchIT looks up words in the Equivalent column only, not the Name column. Every name must therefore also have an entry in which Name is equal to Equivalent.
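As an illustration of the layout above, the following Python sketch splits one record into its fields and rebuilds it. It assumes the fields are stored contiguously in the order listed, with no separators, which gives 70 characters per record; check this against your own copy of the file before relying on it. FIELDS, parse_record and format_record are hypothetical helper names, not part of the matchIT API, and the sample entry is made up for the example.

```python
# Field names and widths, in the order given in the layout table.
# Assumption: fields are contiguous, in this order, with no separators.
FIELDS = [
    ("type", 1),
    ("name", 25),
    ("equivalent", 10),
    ("gender", 1),
    ("salutation", 2),
    ("proper_case", 30),
    ("switch", 1),
]
RECORD_LENGTH = sum(width for _, width in FIELDS)  # 70 under this assumption


def parse_record(line):
    """Split one fixed-width line into a dict of whitespace-stripped values."""
    line = line.rstrip("\r\n").ljust(RECORD_LENGTH)
    record = {}
    pos = 0
    for field, width in FIELDS:
        record[field] = line[pos:pos + width].strip()
        pos += width
    return record


def format_record(record):
    """Build a fixed-width line, padding every field with trailing spaces."""
    parts = []
    for field, width in FIELDS:
        value = record.get(field, "")
        if len(value) > width:
            raise ValueError(f"{field!r} exceeds {width} characters: {value!r}")
        parts.append(value.ljust(width))
    return "".join(parts)


# Made-up sample entry: a prefix whose Name is equal to its Equivalent.
sample = {
    "type": "P",
    "name": "MR",
    "equivalent": "MR",
    "gender": "M",
    "salutation": "S",
    "proper_case": "Mr",
    "switch": "",
}
line = format_record(sample)
print(len(line))
print(parse_record(line) == sample)
```

Padding each field to its full width when writing is what keeps the field positions of the following columns intact.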
The different types that can be entered in the table are as follows:
Type | Description |
'A' | Address Word, such as "Rd" or "Street" |
'B' | Business word, such as "Ltd" or "Printers" |
'C' | UK county, such as "Kent" or "Glos" |
'E' | Exclusion word, such as "Deceased" or "Moved" |
'F' | Female forename (note the gender has to be set for these entries too) |
'I' | Initials, such as "E" or "W"; these entries are in the table as they may be the first part of a two word phrase, such as "E Midlands" |
'J' | Job title word, such as "Manager" |
'M' | Male forename (note the gender has to be set for these entries too) |
'N' | Noise word (i.e. ignored when generating keys or address matching), such as "The" or "House" |
'O' | Overseas, i.e. a foreign country |
'L' | Local country, such as "UK" or "Scotland"; this enables local countries to be identified as countries, without the record being marked as foreign |
'P' | Prefix, such as "Mr" or "Captain" (note the gender has to be set for these entries too, as does the salutation type – see below) |
'Q' | Qualification word, such as "PhD" or "ARICS"; these entries almost always need a proper case entry, as the casing of qualifications can be unusual |
'S' | Special casing word, i.e. a word that is cased unusually but doesn't fall into any of the above categories, such as "PhotoMe" |
'T' | State or province, such as "Pennsylvania" or "PA" |
'U' | Unknown word; this is for the first word of a two word phrase which, on its own, has no special meaning, such as the "Hong" in "Hong Kong" |
Each prefix entry must have a salutation type associated with it. The following list shows the salutation types, along with an example of the type of salutation that will be generated:
Type | Rule | Example |
S | Dear Prefix Surname | Dear Mr Smith |
C | Dear Prefix Surname | Dear Mr Smith |
FS | Dear Prefix Forename Surname | Dear Mr John Smith |
FF | Dear Forename | Dear John |
F | Dear Prefix Forename | Dear Sir John |
B | Dear Prefix | Dear Sir |
T | Prefix | My Lord |
Salutation type C differs from type S in one respect: when the Scan Address Lines for Names option is set, a type C prefix is treated as the start of a name even if it is found in address line 1 or 2. For example, with the option switched on, Mr J Smith in address line 1 or 2 is identified as a name if MR has salutation type C, but not if MR has salutation type S.
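The salutation rules in the table above can be pictured as simple templates. The Python sketch below is only an illustration of those rules, not how matchIT implements them; SALUTATION_RULES and build_salutation are hypothetical names. Types S and C share a template because they differ only in how address lines are scanned. For type T the sketch returns the prefix text it is given unchanged, since the table does not say how a form such as "My Lord" is derived.

```python
# Templates transcribed from the Rule column of the salutation table.
SALUTATION_RULES = {
    "S": "Dear {prefix} {surname}",
    "C": "Dear {prefix} {surname}",  # same wording as S
    "FS": "Dear {prefix} {forename} {surname}",
    "FF": "Dear {forename}",
    "F": "Dear {prefix} {forename}",
    "B": "Dear {prefix}",
    "T": "{prefix}",
}


def build_salutation(sal_type, prefix="", forename="", surname=""):
    """Return the salutation text for one salutation type.

    Raises KeyError if sal_type is not one of the documented codes.
    """
    template = SALUTATION_RULES[sal_type.strip().upper()]
    return template.format(prefix=prefix, forename=forename, surname=surname)


print(build_salutation("S", prefix="Mr", forename="John", surname="Smith"))
print(build_salutation("FS", prefix="Mr", forename="John", surname="Smith"))
print(build_salutation("F", prefix="Sir", forename="John", surname="Smith"))
```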
Additionally, each prefix, male forename and female forename must have a gender associated with it, taking one of the values 'M' (Male), 'F' (Female), or 'E' (Either).
These tables are stored in a fixed-width format that can be edited in any text editor (for example, Notepad, Notepad++, or Programmer's Notepad).
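The gender and salutation requirements can be checked mechanically before a modified table is put into use. The following Python sketch is one possible check, written against the rules stated in this section only; check_entry is a hypothetical helper, and it takes the already extracted Type, Gender and Salutation values of one entry.

```python
VALID_GENDERS = {"M", "F", "E"}
VALID_SALUTATIONS = {"S", "C", "FS", "FF", "F", "B", "T"}
# Entry types that must carry a gender: prefix, male forename, female forename.
GENDERED_TYPES = {"P", "M", "F"}


def check_entry(entry_type, gender, salutation):
    """Return a list of problems found for one entry (empty if none)."""
    problems = []
    if entry_type in GENDERED_TYPES and gender not in VALID_GENDERS:
        problems.append(f"type {entry_type!r} needs gender M, F or E, got {gender!r}")
    if entry_type == "P" and salutation not in VALID_SALUTATIONS:
        problems.append(f"prefix needs a salutation type, got {salutation!r}")
    return problems


print(check_entry("P", "M", "S"))  # a complete prefix entry
print(check_entry("M", "", ""))    # a male forename with no gender set
```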
Note: if you inadvertently change the record length or field positions of these files, it may cause a failure in the matchIT API. You may find it useful to ensure that your text editor is set to display whitespace characters when editing these files.
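A quick way to catch an accidental change in record length is to compare every line against the expected length after editing. The Python sketch below assumes each record is exactly 70 characters long (the sum of the widths in the layout table) excluding the line terminator; if an untouched copy of your file uses a different length, pass that value instead. find_bad_lines is a hypothetical helper, not part of the matchIT API.

```python
# 1 + 25 + 10 + 1 + 2 + 30 + 1 from the layout table (an assumption;
# confirm it against an unmodified copy of the file).
EXPECTED_LENGTH = 70


def find_bad_lines(lines, expected_length=EXPECTED_LENGTH):
    """Return (line_number, actual_length) for each line of the wrong length."""
    bad = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # ignore the line terminator
        if length != expected_length:
            bad.append((number, length))
    return bad


# Example with in-memory lines; the second one is a character short.
demo = ["A" * 70 + "\r\n", "B" * 69 + "\r\n", "C" * 70]
print(find_bad_lines(demo))

# With a real file (the path and encoding here are assumptions):
# with open("NAMES.DAT", encoding="latin-1", newline="") as handle:
#     print(find_bad_lines(handle))
```

This only detects changes in overall length. A field that has shifted within a line of the correct length still needs to be checked by eye or by parsing the fields.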
Surnames and Towns Tables
Surnames Table
SURNAMES.DAT - used for casing surname prefixes such as "de" in Charles de Gaulle. To add or modify entries, follow the layout of the existing entries.
Towns Table
TOWNS.DAT - used for extracting towns from address lines into a dedicated Town field, and for upper-casing towns. This file covers UK "post towns" only, i.e. towns defined as such by Royal Mail.