H2.7 - Advanced Configuration Guide - Word Lookup Tables

IMPORTANT: If you make changes to the file structure below, the file will not receive changes to names and words in future updates. It is your sole responsibility to migrate additions or changes into future versions of this file.

The word lookup tables are used by the core matching component. These tables control:

the matching equivalent of words e.g. Tony = Anthony
the gender of forenames e.g. John = Male, Susan = Female, Chris = Either
casing rules e.g. PO Box, IBM, 360Science
expansion/contraction of abbreviations and correction of typing errors e.g. Svcs = Services, Finacial = Financial
attributing type to these and other words e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.

Usage

Location: <settings><advanced>

<advanced>
  <datPath>C:\Program Files\matchIT Hub\datfiles\US</datPath>
</advanced>

Prior to version 3.1

datPath: Specifies the path to the folder containing the word lookup files to use.

This folder contains: NAMES.DAT, NAMES2.DAT, SURNAMES.DAT and TOWNS.DAT.

Read more details on the names and words files here.

From version 3.1

datPath: Specifies the path to custom word lookup files used to modify the default base data. This method is preferred for clients wanting to keep track of what they changed and to get any changes Syniti may have included with updates.

This folder may contain:

address-custom.xml
name-custom.xml
organization-custom.xml
misc-custom.xml

Alternatively, datPath can point to a folder containing a copy of the default base word lookup files, which then replace the default base data. If you do not want to be affected by updates to the names and words, copy the files included with the install into another folder, edit the copy, and point datPath at that folder in your config. This ensures your changes are not lost if you reinstall or update. However, if you do not use the latest versions of these files supplied with new releases of the software, some aspects of the software may not function correctly and new features may not be available.

This folder may contain:

address.xml
name.xml
organization.xml
misc.xml

Base files and custom files can be present in the same folder. In that case the base files load first, and the custom files are then applied as additions to and/or deletions from the base data. Copies of the base files are included with the install, and copies of the custom files are attached to this article.
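As a rough sketch of the second approach, the settings fragment below points datPath at a separate folder holding edited copies of the base files, optionally alongside custom files. The folder path is a made-up example, and all other settings are omitted.

```xml
<settings>
  <advanced>
    <!-- Hypothetical folder outside the install directory, so that a
         reinstall or update does not overwrite the edited copies. It could
         hold address.xml, name.xml, organization.xml and misc.xml, plus
         any of the custom files. -->
    <datPath>D:\matchIT\datfiles-custom</datPath>
  </advanced>
</settings>
```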

These files contain region-specific blocks of words, e.g. address-custom.xml has:

Country-agnostic address words.
Countries and country codes.
UK-specific address words, post towns, and counties.
US-specific address words and states.
Canadian-specific address words and provinces and territories.
Australian-specific address words and states.

Words are grouped in categories ("street", "forename", etc). The casing of words is ignored for matching, but the case specified is how the word is output. Multiple words in any category can be grouped together by enclosing them in <group>...</group>. The words in a group will be considered identical (for example, Anthony and Tony, Ltd and Limited, Apartment and Apt).

address.xml

<address>

  <common>
    <streets>
      <group>
        <street>Road</street>
        <street>Rd</street>
      </group>
      <street>Rue</street>
      ...
    </streets>
    ...
  </common>

  <country value="US">
    ...
  </country>
  ...
</address>

address.xml contains the following categories:

Category	Collection	Description
street	streets	Road name designator, such as "Rd" or "Street"
numericStreet	numericStreets	Numeric street designator, such as "Rte" or "Highway"
building	buildings	Building name designator, such as "Farmhouse" or "Hall"
flat	flats	Secondary/sub-premise designator, such as "Suite" or "Apartment"
premise	premises	Numeric building designator, such as "Block", or "Building"
floor	floors	Floor designator, such as "Floor" or "Flr"
box	boxes	Box designator, such as "PO Box" or "Postfach"
direction	directions	Street directional, such as "North" or "SW"
country	countries	The country name and variants, such as "United States" or "Afghanistan"
code	countries	The country code, such as "US" or "USA"
region	regions	Administrative region, such as UK counties ("Kent" or "Glos"), US states ("Iowa" or "NC"), or Canadian provinces
posttown	posttowns	UK posttowns
town	towns	Towns and cities other than posttowns
localCountry	localCountries	Local country, such as "Scotland" or "Wales"
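
To illustrate how the categories and collections in the table fit together, here is a minimal sketch of a country-specific block. It assumes that collections sit directly inside a country element in the same way as they do inside common. The entries are only illustrative and are not copied from the shipped file.

```xml
<address>
  <country value="US">
    <regions>
      <!-- Words in a group are treated as identical for matching -->
      <group>
        <region>North Carolina</region>
        <region>NC</region>
      </group>
      <region>Iowa</region>
    </regions>
    <numericStreets>
      <group>
        <numericStreet>Route</numericStreet>
        <numericStreet>Rte</numericStreet>
      </group>
    </numericStreets>
  </country>
</address>
```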

name.xml

<name>

  <common>
    <male>
      <prefixes>
        <group>
          <prefix salutation="S">Mr</prefix>
          <prefix salutation="S">Mister</prefix>
        </group>
        <prefix>Master</prefix>
        ...
      </prefixes>
      ...
    </male>
    <female>
      ...
    </female>
    <either>
      ...
    </either>
    ...
  </common>

  <country value="US">
    ...
  </country>
  ...
</name>

name.xml contains the following categories:

Category	Collection	Description
prefix	prefixes	Prefix, such as "Mr" or "Captain"
firstName	firstNames	Forename, such as "Adam" or "Abigail"
surname	surnames	Surname, such as "Smith" or "Jones"
suffix	suffixes	Suffix, such as "Jr" or "Senior"
qualification	qualifications	Qualification word, such as "PhD" or "ARICS"

Each prefix entry must have a salutation type associated with it. The following list shows the salutation types, along with an example of the type of salutation that will be generated:

Type	Rule	Example
S	Dear Prefix Surname	Dear Mr Smith
C	Dear Prefix Surname	Dear Mr Smith
FS	Dear Prefix Forename Surname	Dear Mr John Smith
FF	Dear Forename	Dear John
F	Dear Prefix Forename	Dear Sir John
B	Dear Prefix	Dear Sir
T	Prefix	My Lord

Salutation type C differs from type S in that a prefix of type C is treated as part of a name even when it is found in address line 1 or 2, provided Scan Address Lines for Names is set. For example, with the option switched on, if Mr has salutation type C then "Mr J Smith" in address line 1 or 2 is identified as a name; if Mr has salutation type S, it is not.
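
As a sketch of how this might look, the fragment below shows the relevant part of an edited copy of name.xml in which the "Mr" group carries salutation type C instead of S. The rest of the file is omitted, and giving both grouped entries the same type is an assumption rather than a documented requirement.

```xml
<name>
  <common>
    <male>
      <prefixes>
        <group>
          <!-- Type C: same "Dear Prefix Surname" salutation as type S, but
               the prefix also marks a name when found in an address line -->
          <prefix salutation="C">Mr</prefix>
          <prefix salutation="C">Mister</prefix>
        </group>
      </prefixes>
    </male>
  </common>
</name>
```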

organization.xml

<organization>

  <common>
    <types>
      <group>
        <type>Ltd</type>
        <type to="Ltd">Limited</type>
      </group>
      <type>Holdings</type>
      ...
    </types>
    ...
  </common>

  <country value="US">
    ...
  </country>
  ...
</organization>

organization.xml contains the following categories:

Category	Collection	Description
word	words	Words indicative of a business name, such as "Printers" or "Antiquities"
type	types	Business type, such as "Ltd" or "GmbH"
name	names	Business name, such as "General Motors" or "Fedex"
job	jobs	Job title word, such as "Manager"

misc.xml

<misc>

  <common>
    <exclusions>
      <group>
        <exclusion>Deceased</exclusion>
        <exclusion>Decsd</exclusion>
      </group>
      <exclusion>Addressee</exclusion>
      ...
    </exclusions>
    ...
  </common>

  <country value="US">
    ...
  </country>
  ...
</misc>

misc.xml contains the following categories:

Category	Collection	Description
exclusion	exclusions	Exclusion word, such as "Deceased" or "Moved"
noise	noises	Noise word (i.e. ignored when generating keys or address matching), such as "The" or "House"
special	specials	Special casing word, i.e. a word that is cased unusually but doesn't fall into any of the above categories, such as "PhotoMe"
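
A minimal sketch of how the noise and special categories might appear, assuming their collections sit under common like exclusions does in the sample above:

```xml
<misc>
  <common>
    <noises>
      <!-- Ignored when generating keys or matching addresses -->
      <noise>The</noise>
      <noise>House</noise>
    </noises>
    <specials>
      <!-- Output with exactly the casing written here -->
      <special>PhotoMe</special>
    </specials>
  </common>
</misc>
```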

If you are editing the custom XML files, indicate how each entry should change the base data, such as an addition or a deletion.

Attributes

The attributes "action" and "match" indicate how a customization modifies the built-in data. Both small- and large-scale customizations can be made, e.g. individual entries can be added or entire country-specific blocks disabled.

Attribute	Description
action="modify"	Can be applied to any node. Modifies the original node. Implied if no action is specified.
action="replace"	Can be applied to any node. Deletes the original node and its children, and replaces them with the customized data.
match="true"	Implied if no match is specified.
match="false"	Can only be applied to a group of two words. Prevents the grouped words from matching by deleting them from any group in which the two words appear together.
match="nothing"	Can only be applied to a word that is not in a group. Prevents the word from matching anything.
match="delete"	Can be applied to any node. Deletes the node from the base data.
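
When neither attribute is given, action="modify" and match="true" are implied, so a plain entry in a custom file should act as an addition to the base data. A minimal sketch for address-custom.xml, assuming the custom file mirrors the structure of the base file; the street words are illustrative:

```xml
<address>
  <common>
    <streets>
      <!-- No action or match attribute, so action="modify" and
           match="true" are implied. Assumed effect: this group is
           added to the base list of streets. -->
      <group>
        <street>Boulevard</street>
        <street>Blvd</street>
      </group>
    </streets>
  </common>
</address>
```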

Examples

To prevent "Andy" from being considered a variant of "Andrew":

<name>
  <common>
    <male>
      <firstNames>
        <group match="false">
          <name>Andrew</name>
          <name>Andy</name>
        </group>
        ...

To prevent "Andy" from being considered a variant of any male name:

<name>
  <common>
    <male>
      <firstNames>
        <name match="nothing">Andy</name>
        ...

To delete all qualifications that apply to either gender:

<name>
  <common>
    <either>
      <qualifications match="delete" />
      ...

To replace the default list of company names with one of your own:

<organization>
  <common>
    <names action="replace">
      <name>Syniti</name>
      <name>helpIT</name>
      <name>360Science</name>
      ...
    </names>
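
To disable an entire country-specific block, match="delete" could be applied to the country node. This is a sketch only: it assumes that "any node" includes country elements and that "AU" is the value used for the Australian block. The choice of file is illustrative.

```xml
<address>
  <!-- Assumption: match="delete" on a country node removes that whole
       country-specific block from the base data -->
  <country value="AU" match="delete" />
</address>
```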