The following settings are used when records are parsed and keys are generated. The default value for each setting is shown.
Location: <settings><advanced><generate>
<generate>
<name>...</name>
<organization>...</organization>
<address>...</address>
<quality>...</quality>
...
</generate>
Note that the quality settings are relevant only to the normalization processing mode.
Name
Location: <settings><advanced><generate><name>
<name>
<defaultGender>unknown</defaultGender>
<useEquivalentName>true</useEquivalentName>
<enhancedDoubleBarrelledLookup>false</enhancedDoubleBarrelledLookup>
<processBlankLastName>false</processBlankLastName>
<parseNameElements>true</parseNameElements>
<detectInverseNames>false</detectInverseNames>
<parseAsNormalizedName>false</parseAsNormalizedName>
<!--the following are for normalization only:-->
<joinMarriedPrefixes>true</joinMarriedPrefixes>
<generateContact>true</generateContact>
<contactFullname>false</contactFullname>
<defaultSalutation>Dear Customer</defaultSalutation>
<extract>
<middleNames>leave</middleNames>
</extract>
</name>
defaultGender: The Default Gender property is the gender to assume when the Match API can't determine whether a name is male or female (e.g. Chris Smith, C Smith). If you set this property to Male or Female, the API will assume that gender and generate a salutation using Mr or Ms, respectively, as the prefix.
useEquivalentName: If this is enabled, the Match API replaces the first name with its equivalent from the NAMES.DAT file, if there is an entry for the input first name. This enables, for example, "Tony Smith" and "Anthony Smith" to be picked up as a match. The initial of the original first name is stored in the Record.DataFlags property so that, for example, "Tony Smith" and "T Smith" can still be matched.
enhancedDoubleBarrelledLookup: When enabled, this setting will cause an unrecognised middle name to be considered part of a non-hyphenated double-barrelled last name (for example, where the full name is John Harrington Jones, the last name will be considered Harrington-Jones because Harrington is not a recognised first name).
processBlankLastName: With this setting enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors. For example, if a firstname was entered but not a lastname, then it'll be assumed that the firstname is in fact the lastname and match keys will be generated rather than being left blank.
This setting is disabled by default.
parseNameElements: When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed. If the Match API deems any values to have been entered into an incorrect field (for example, suffixes and qualifications in the lastname field), it will reassign these values into the correct fields.
This setting is enabled by default. Disable it if you do not want such incorrect values to be reassigned.
detectInverseNames: With this setting enabled, the Match API will attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, "Smith, John" where Smith is the lastname). Without a comma, a name is assumed to be in standard left-to-right format, with the firstnames preceding the lastname.
This setting is disabled by default.
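The fragment below is a minimal sketch of turning on inverse-name detection, so that an input such as "Smith, John" is parsed with Smith as the lastname. It assumes that any settings left out of the fragment keep their default values.

<name>
<!--parse "Smith, John" as lastname Smith, firstname John (a comma must follow the lastname)-->
<detectInverseNames>true</detectInverseNames>
</name>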
parseAsNormalizedName: When enabled, addressee names are assumed to be in a delimited normalized format similar to the Record.MatchingFields.NormalizedName value that's output by Engine.Generate( ). Currently supported delimiters are spaces, commas, semicolons, and pipes ('|').
This setting is disabled by default.
Normalization
The following settings are relevant only to the normalization processing mode:
joinMarriedPrefixes: With this property set to True, multiple addressees with the same last name will be treated as married e.g. input names of "Mr. John Smith and Ms. Mary Smith" or "Mr John Smith & Mary Smith" would have a Salutation generated of "Mr and Mrs Smith" and a Contact generated of "Mr and Mrs J Smith" or "Mr and Mrs John Smith".
generateContact: With this property set to True, the Match API will generate a contact for the input name. The contact is structured the way you would expect the name to appear on the front of an envelope. For example, an input name of "John Smith" or "Mr Smith" would result in a generated contact of "Mr J Smith".
An accurate contact value cannot be generated when the Match API is unable to determine the gender of an input name. In this situation, the generated contact is equal to the input name. For example, an input name of "J Smith" results in a generated contact of "J Smith".
contactFullname: Set this property to True to include the full first name of any incoming name in the CONTACT field. If the property is False, only the initial is used. For example, for the incoming name "John Smith", the generated contact will be "Mr John Smith" if the property is True, or "Mr J Smith" if it is False.
defaultSalutation: This property determines the default salutation. It is used where the API can't determine a salutation (for example, C Smith or Chris Smith, who could be either Mr or Ms), or where the supplied prefix doesn't have a salutation rule. If you include the word "Dear" at the start of the default salutation (i.e. you specify "Dear Customer" rather than just "Customer"), then all the salutations derived by the Match API will start with "Dear". The exception is a title (or prefix) whose salutation type specifies the title only. For example, Mr J Smith will result in a salutation of "Dear Mr Smith", whereas The Bishop of Liverpool will result in a salutation of "My Lord".
extract/middleNames: Controls whether middle names are extracted to the MiddleNames field. With the default value, leave, middle names are not extracted.
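The normalization name settings above can be combined as in the sketch below, assuming that omitted settings keep their defaults. With contactFullname set to True, an input name of "John Smith" would give a contact of "Mr John Smith" rather than "Mr J Smith". The salutation text "Dear Valued Customer" is a hypothetical value chosen for illustration.

<name>
<generateContact>true</generateContact>
<!--use the full first name in the contact, e.g. "Mr John Smith"-->
<contactFullname>true</contactFullname>
<!--hypothetical value; because it starts with "Dear", derived salutations also start with "Dear"-->
<defaultSalutation>Dear Valued Customer</defaultSalutation>
</name>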
Organization
Location: <settings><advanced><generate><organization>
<organization>
<useEquivalentName>false</useEquivalentName>
<normalizationTruncation>0</normalizationTruncation>
<ignoreParentheses>false</ignoreParentheses>
<ignoreTrailingPostTown>false</ignoreTrailingPostTown>
<!--the following are for normalization only:-->
<extract>
<jobTitle>leave</jobTitle>
<name>leave</name>
</extract>
<joinInitials>true</joinInitials>
</organization>
useEquivalentName: If this property is set to True, then the equivalents (according to the NAMES.DAT file) of words indicating a business name, such as "Motors" or "Services", are included in the Record.NormalizedOrganization property and the corresponding phonetic keys. This enables, for example, "Wood Green Cars" to match "Wood Green Motors" well (because "Cars" has an equivalent of "Motors"), but ensures that neither of them matches "Wood Green Carpets" well.
If you set this property to True, you should change any words in the NAMES.DAT file that you do want ignored, such as "Ltd" and "Inc" to Noise type so that they are not included in the Record.NormalizedOrganization property. As a rule of thumb, if you are doing business matching on a file that is very geographically concentrated, that is, contains records mostly from the same immediate area, then set the Settings.Generate.Organization.UseEquivalentName property to True, otherwise set it to False.
If UseEquivalentName is False, all business words are ordinarily classed as noise words before the Record.NormalizedOrganization property is populated. From that point on, they are treated exactly like noise words from NAMES.DAT. If every word is a business word, the API goes back to the first word (reading left to right) that was not a noise word before this step and changes its type to 'c', so that it is no longer a noise word (type 'n'). Processing then continues, and the word that was changed to type 'c' is not stripped out.
normalizationTruncation: Disabled by default (i.e. set to 0). If this setting is enabled and the organization name consists of more than three words, then, in the third element of Record.MatchingFields.NormalizedOrganization, each word after the first two is truncated to its first N characters (where N is the value of this setting).
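For example, the following sketch enables truncation. The value 4 is arbitrary and chosen only for illustration.

<organization>
<!--truncate each word after the first two to 4 characters; 0 disables truncation-->
<normalizationTruncation>4</normalizationTruncation>
</organization>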
legalNormalization: (From Match API 3.1) Enabled by default (i.e. set to true), and ignored if UseEquivalentName is false. If this property is set to True (as well as UseEquivalentName), then the business type words such as "Ltd" and "Inc" (in addition to business words such as "Motors" or "Services") are included in the Record.NormalizedOrganization property and the corresponding phonetic keys.
ignoreParentheses: With this property enabled, any words that are enclosed in parentheses within an organization name will be excluded from the phonetic organization keys. This can be useful for records such as Remnel Ltd and Remnel (UK) Ltd, to ensure records with these company names are compared if the phonetic organization keys are being used as part of composite match keys.
ignoreTrailingPostTown: This property, when enabled, will exclude from the phonetic organization keys any trailing post town (defined in towns.dat, see Towns Table) or UK county that appears at the end of a company name. For example, the phonetic organization keys for Handso Ltd and Handso Essex Ltd will be the same to help ensure such records will be compared.
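You can enable both options together so that pairs such as Remnel Ltd and Remnel (UK) Ltd, or Handso Ltd and Handso Essex Ltd, produce the same phonetic organization keys. The fragment below is a sketch that assumes the other organization settings keep their defaults.

<organization>
<!--exclude parenthesized words such as "(UK)" from the phonetic organization keys-->
<ignoreParentheses>true</ignoreParentheses>
<!--exclude a trailing post town or UK county such as "Essex"-->
<ignoreTrailingPostTown>true</ignoreTrailingPostTown>
</organization>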
Normalization
The following settings are relevant only to the normalization processing mode:
extract/jobTitle: This will copy or extract job titles contained within the ADDRESSEE field into a field labeled JOB_TITLE.
Job Titles are recognized by having a word or string defined as a Job Title in the NAMES.DAT file e.g. Director.
extract/name: This will copy or extract any business names contained within the ADDRESSEE field into a field labeled COMPANY.
Business names are recognized by having a word or string defined as a Business word in the NAMES.DAT file e.g. Ltd. Care should be taken when using this setting, as words like "Bank" can be taken to indicate a Business when this isn't the case (e.g. it may be a last name or part of an address line).
If you want Extract Company Name processing to be applied also to the first one or two lines of the address, you must set the property Settings.Generate.Address.NumOfLinesToScan to either 1 or 2.
joinInitials: Set this property to True if you want a group of initials separated by spaces or dots in a company name to be concatenated. For example, if this property is True, then "I B M" and "I.B.M." will be replaced by "IBM". Note that, if the Settings.Generate.ProperCase property is set to False, then this property will have no effect.
Address
Location: <settings><advanced><generate><address>
<address>
<verifyPostcode>true</verifyPostcode>
<defaultThoroughfareLine>1</defaultThoroughfareLine>
<numOfLinesToScan>0</numOfLinesToScan>
<premiseFirst>false</premiseFirst>
<!--the following are for normalization only:-->
<extract>
<premise>copy</premise>
<thoroughfare>copy</thoroughfare>
<town>move</town>
<postTownsOnly>false</postTownsOnly>
<region>move</region>
<postcode>move</postcode>
<country>move</country>
</extract>
<abbreviateRegion>false</abbreviateRegion>
<upperCaseTown>false</upperCaseTown>
</address>
verifyPostcode: If set to True, this property verifies and corrects the format of the postcode. Numerics are changed to alphas and vice versa where appropriate. This feature makes use of the rules concerning the alphanumeric structure of the postcode. e.g. it changes "KT22 BDN" to "KT22 8DN" - it will change 0, 1, 5 and 8 to O, I, S and B, or vice versa, if that makes the postcode alphanumerically correct. The Match API will not verify or correct the format of postcodes that are not in the postcode field.
defaultThoroughfareLine: This property is used when the Match API is generating a phonetic address key, for which it needs to know the thoroughfare (e.g. street) and the town in the address. If it cannot locate a thoroughfare in the address, usually because it cannot find a word to indicate one, such as "Street", then the API will assume that the thoroughfare is the contents of the address line indicated by this property (if it is greater than zero). For example, if this property is set to 2, then the API will take the contents of address line 2 as the thoroughfare if it cannot find a thoroughfare word in the address. This property should only be used if the addresses in your data are very rigidly structured.
numOfLinesToScan: This property enables personal names to be extracted from address lines. It can be set to 1 or 2; the default value, 0, disables scanning. If set to 1, only the first address line will be scanned for names. If set to 2, both the first and second address lines will be scanned and have names extracted from them if found. Any personal names found can then be used for the generation of Contacts and Salutations.
If either or both of the Settings.Generate.Organization.Extract.JobTitle and Settings.Generate.Organization.Extract.Name properties are used in conjunction with this one, the Match API will scan the ADDRESSEE field for job titles and business names. It will also scan the corresponding number of address lines.
premiseFirst: When parsing an address, this Boolean property indicates whether to expect the premise or flat number to come first in address lines when the flat is not explicitly specified (e.g. "Flat 5"). This should be set to True when the nationality is set to 'USA', or False otherwise.
detectCountry: When parsing an address, this property determines whether, and how, the Match API detects which country the address refers to. Options are:
- none: no country detection is performed; the default country is always assumed.
- basic (default): checks only the Postcode and Country fields and the last line of the address, looking for either a full postcode or a full country or region name that fills the entire field.
- full: searches all address lines and computes a score based on the country-specific words found (this is time-consuming).
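The sketch below combines these settings so that job titles and business names are copied out of the ADDRESSEE field and out of the first two address lines. It assumes that copy is an accepted value for the organization extract elements, as it is for the address extract elements, and that all other settings keep their defaults. Both elements sit under <settings><advanced><generate>.

<organization>
<extract>
<!--copy job titles into JOB_TITLE and business names into COMPANY-->
<jobTitle>copy</jobTitle>
<name>copy</name>
</extract>
</organization>
<address>
<!--also scan address lines 1 and 2 for names, job titles, and business names-->
<numOfLinesToScan>2</numOfLinesToScan>
</address>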
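As a sketch, full country detection might be requested as shown below. detectCountry does not appear in the defaults listed above, so placing it as a child of <address>, like the other address parsing settings, is an assumption.

<address>
<!--score all address lines for country-specific words (slower than basic)-->
<detectCountry>full</detectCountry>
</address>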
Normalization
The following settings are relevant only to the normalization processing mode:
extract/premise: This will move or copy premise numbers found in the address lines into a field labelled PREMISE.
It is not advisable to move (rather than copy) the premise if you want to output the updated address later, as the API will not know which address line the premise number came from. Copying premise numbers can, however, be useful: it allows the PREMISE field to be included in the match keys used during the compare stage. This can increase efficiency and accuracy when working on large files or files containing very localized data.
extract/thoroughfare: This will move or copy address data recognized as the thoroughfare of the address (based on Address type entries found in the NAMES.DAT file) into a field labelled THOROUGHFARE.
extract/town: This will move or copy address data recognized as the town or city from the address lines to a field labelled TOWN.
extract/postTownsOnly: If this is enabled, together with Settings.Generate.Address.Extract.Town, then only post towns (i.e. any towns found in the TOWNS.DAT file) will be moved or copied.
extract/region: This will move or copy US, Canadian or Australian states or provinces, or valid UK counties (or other regions found in the NAMES.DAT file), that are found in the address lines into a field labelled REGION.
extract/postcode: This will move or copy UK postcodes, or US zip codes found in the address lines into a field labelled POSTCODE.
Only UK postcodes with an outward half that is valid according to the MAILSORT.DAT file will be extracted. If you are using this data for mailing or for updating a database, it is advisable to set this property to move rather than copy.
extract/country: This will move or copy valid countries found in the address lines (based on Country type entries found in the NAMES.DAT file) into a field labeled 'COUNTRY'.
abbreviateRegion: Set this property to True if you want the Match API to abbreviate States or Provinces when processing address lines e.g. to change "Pennsylvania" to "PA".
upperCaseTown: This applies to UK addresses only. Set this property to True to convert the post town in the address to capitals. Note that, if the Settings.Generate.ProperCase property is set to False, then this property is ignored.
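A possible normalization setup for mailing output is sketched below. It copies premise numbers (so PREMISE can be used in match keys), moves only post towns into TOWN, and moves postcodes into POSTCODE. The combination is illustrative, not a recommended default.

<address>
<extract>
<!--copy rather than move, so the address lines stay intact-->
<premise>copy</premise>
<town>move</town>
<!--only towns found in TOWNS.DAT are moved-->
<postTownsOnly>true</postTownsOnly>
<postcode>move</postcode>
</extract>
</address>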
Quality
Location: <settings><advanced><generate><quality>
<!--the following are for normalization only:-->
<quality>
<enabled>false</enabled>
<address>
<allowBlankPostcode>true</allowBlankPostcode>
</address>
<email>
<webmailFiltering>true</webmailFiltering>
</email>
</quality>
Normalization
Note that the quality settings are relevant only to the normalization processing mode:
enabled: Set this property to True to enable quality scoring. By default, quality scoring is disabled and all quality scores are 0.
address/allowBlankPostcode: If disabled (it is enabled by default), addresses without a postal code are restricted to a maximum quality score of 1.
email/webmailFiltering: If enabled (default) then email addresses that use webmail provider domains (such as Hotmail, Yahoo, and mail.com) are restricted to a maximum quality score of 7.
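For example, the fragment below is a sketch that turns on quality scoring and caps addresses without a postal code, while leaving webmail filtering at its default.

<quality>
<!--turn on quality scoring; otherwise all scores are 0-->
<enabled>true</enabled>
<address>
<!--addresses with no postal code are capped at a quality score of 1-->
<allowBlankPostcode>false</allowBlankPostcode>
</address>
</quality>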
Location: <settings><advanced><generate>
<dropExcludedWords>true</dropExcludedWords>
<considerCasing>true</considerCasing>
<variableKeysMaxLength>8</variableKeysMaxLength>
<!--the following are for normalization only:-->
<properCase>true</properCase>
<specialCaseMac>true</specialCaseMac>
dropExcludedWords: With this property set to True, during the generate step the Match API will flag any records that contain exclusion words in any of the key fields (fields such as addressee, company or the address lines). Such exclusion words include "Deceased", "Addressee" (indicating a record may be a header record) and any other Exclusion type entries in the NAMES.DAT file. Records are flagged by setting the first character of the Record.DataFlags property to "X".
considerCasing: If this property is set to True, then the Match API will consider the casing of the incoming data when it is splitting the data up for extracting keys, proper casing, and so forth. This is mainly used for the extraction of name data.
variableKeysMaxLength: This specifies the maximum length of various variable-length phonetic keys created when a record is generated. Such keys are PhoneticLastName, PhoneticFirstName, PhoneticMiddleName, PhoneticOrganizationName1, PhoneticOrganizationName2, PhoneticOrganizationName3, PhoneticStreet, and PhoneticTown. The default is 8 characters.
Normalization
The following settings are relevant only to the normalization processing mode:
properCase: If this property is set to True, the Generate step will convert the address lines in your records (labeled ADDRESS1, ADDRESS2... ADDRESSn) to their proper case. This proper casing will handle punctuation, apostrophes and abbreviations. It will also convert ADDRESSEE, JOB_TITLE, DEPARTMENT, and COMPANY to the correct case. Exceptions to the default casing rules are held in the NAMES.DAT file.
The API's default rules for casing data are as follows: letters following an apostrophe are capitalized (e.g. "Mr O'Reilly"), as are letters following "Mc" or (subject to one of the Advanced Input Options) "Mac" at the start of a name (see "Mac Name Treatment"). Double-barreled names have a capital letter after the hyphen. If the name or other word has a proper casing entry in the NAMES.DAT file, it is cased as shown there e.g. BSc, One2One, IBM, plc. If not in the NAMES.DAT file, words are all capitals if they contain no vowels, otherwise they are changed to initial capital followed by lower case letters.
specialCaseMac: Where a last name begins with Mac, when formatting salutations, the Match API follows this with a small letter or a capital letter, depending on this property. A value of True will mean that MACLEAN will be formatted as MacLean. You can add exceptions to the rule (e.g. Maccabee, Macclesfield, MacKay, Mackie) to the NAMES.DAT file. If you invariably want to use a lower case letter following Mac, set this property to False.
NB: Names beginning with Mach are always formatted with a lower case h, e.g. Machin, Machinery. Names beginning with Mc are formatted with a capital letter following the Mc if they are longer than three characters.
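The sketch below adjusts two of the generate-level settings above. The values are arbitrary, and the elements are assumed to sit directly under <generate>, as in the defaults listing.

<!--shorter variable-length phonetic keys (the default is 8)-->
<variableKeysMaxLength>6</variableKeysMaxLength>
<!--always use a lower case letter after Mac, e.g. MACLEAN is formatted as Maclean-->
<specialCaseMac>false</specialCaseMac>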
Syniti Match API Word Lookup Configuration
The word lookup tables are used by the Match API. Modifying the word lookup tables is considered advanced usage and is not normally recommended.
Note that Syniti Match API is built atop the Match API, the component that provides Syniti's core technology.
Location: <settings><advanced>
<advanced>
<datPath>C:\Program Files\matchIT Hub\datfiles\US</datPath>
</advanced>
Prior to version 3.1
datPath: Specifies the path to the folder containing the word lookup files to use.
This folder contains: NAMES.DAT, NAMES2.DAT, SURNAMES.DAT and TOWNS.DAT.
See the documentation on the names and words files for more details.
From version 3.1
datPath: Specifies the path to the folder containing custom word lookup files used to modify the base data.
This folder may contain:
- address-custom.xml
- name-custom.xml
- organization-custom.xml
- misc-custom.xml
These files contain region-specific blocks of words, e.g. address-custom.xml has:
- Country-agnostic address words.
- Countries and country codes.
- UK-specific address words, post towns, and counties.
- US-specific address words and states.
- Canadian-specific address words and provinces and territories.
- Australian-specific address words and states.
Words are grouped in categories ("street", "forename", etc). The casing of words is ignored for matching, but the case specified is how the word is output. Multiple words in any category can be grouped together by enclosing them in <group>...</group>. The words in a group will be considered identical (for example, Anthony and Tony, Ltd and Limited, Apartment and Apt).
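From version 3.1, datPath points at the folder that holds your customization files. In the sketch below, the folder path is hypothetical.

<advanced>
<!--hypothetical folder containing name-custom.xml, address-custom.xml, etc.-->
<datPath>C:\MatchCustom</datPath>
</advanced>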
address-custom.xml
<address>
<!--country-agnostic words used by all countries-->
<common>
<streets> <!--collection-->
<group>
<street>Road</street> <!--category-->
<street>Rd</street>
</group>
<street>Rue</street>
...
</streets>
...
</common>
<!--specify countries using ISO 3166-1 alpha-2 codes-->
<country value="US">
...
</country>
...
</address>
address-custom.xml contains the following categories:
Category | Collection | Description
street | streets | Road name designator, such as "Rd" or "Street"
numericStreet | numericStreets | Numeric street designator, such as "Rte" or "Highway"
building | buildings | Building name designator, such as "Farmhouse" or "Hall"
flat | flats | Secondary/sub-premise designator, such as "Suite" or "Apartment"
premise | premises | Numeric building designator, such as "Block" or "Building"
floor | floors | Floor designator, such as "Floor" or "Flr"
box | boxes | Box designator, such as "PO Box" or "Postfach"
direction | directions | Street directional, such as "North" or "SW"
country | countries | The country name and variants, such as "United States" or "Afghanistan"
code | countries | The country code, such as "US" or "USA"
region | regions | Administrative region, e.g. UK counties such as "Kent" or "Glos", US states such as "Iowa" or "NC", Canadian provinces, etc.
posttown | posttowns | UK post towns
town | towns | Towns and cities other than post towns
localCountry | localCountries | Local country, such as "Scotland" or "Wales"
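As an illustration of the structure above, the sketch below adds a US-specific street designator group, so that the two spellings are treated as identical. The entries are illustrative. Whether they already exist in the base data is not stated here; because action="modify" is implied, the assumption is that they would be merged with any existing entries.

<address>
<country value="US">
<streets>
<!--words in a group are considered identical; each is output in the case given-->
<group>
<street>Parkway</street>
<street>Pkwy</street>
</group>
</streets>
</country>
</address>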
name-custom.xml
<name>
<!--country-agnostic words used by all countries-->
<common>
<male>
<prefixes> <!--collection-->
<group>
<prefix salutation="S">Mr</prefix> <!--category-->
<prefix salutation="S">Mister</prefix>
</group>
<prefix>Master</prefix>
...
</prefixes>
...
</male>
<female>
...
</female>
<either>
...
</either>
...
</common>
<!--specify countries using ISO 3166-1 alpha-2 codes-->
<country value="US">
...
</country>
...
</name>
name-custom.xml contains the following categories:
Category | Collection | Description
prefix | prefixes | Prefix, such as "Mr" or "Captain"
firstName | firstNames | Forename, such as "Adam" or "Abigail"
surname | surnames | Surname, such as "Smith" or "Jones"
suffix | suffixes | Suffix, such as "Jr" or "Senior"
qualification | qualifications | Qualification word, such as "PhD" or "ARICS"
Each prefix entry must have a salutation type associated with it. The following list shows the salutation types, along with an example of the type of salutation that will be generated:
Type | Rule | Example
S | Dear Prefix Surname | Dear Mr Smith
C | Dear Prefix Surname | Dear Mr Smith
FS | Dear Prefix Forename Surname | Dear Mr John Smith
FF | Dear Forename | Dear John
F | Dear Prefix Forename | Dear Sir John
B | Dear Prefix | Dear Sir
T | Prefix | My Lord
Salutation type C is different from type S in that it is treated as a name even if it is found in address lines 1 or 2 with Scan Address Lines for Names set. This means that if the option is switched on and e.g. MR has salutation type C, then Mr J Smith would be identified as a name in address line 1 or 2, whereas if MR has salutation type S, then it would not be identified.
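For example, the sketch below adds a new prefix for either gender with salutation type S, which by the rule above would produce "Dear Coach Smith". The prefix "Coach" is a hypothetical entry chosen for illustration. Placing it under <either> with a <prefixes> collection is assumed to mirror the male example shown earlier.

<name>
<common>
<either>
<prefixes>
<!--hypothetical prefix; type S gives "Dear Prefix Surname"-->
<prefix salutation="S">Coach</prefix>
</prefixes>
</either>
</common>
</name>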
organization-custom.xml
<organization>
<!--country-agnostic words used by all countries-->
<common>
<types> <!--collection-->
<group>
<type>Ltd</type> <!--category-->
<type to="Ltd">Limited</type>
</group>
<type>Holdings</type>
...
</types>
...
</common>
<!--specify countries using ISO 3166-1 alpha-2 codes-->
<country value="US">
...
</country>
...
</organization>
organization-custom.xml contains the following categories:
Category | Collection | Description
word | words | Words indicative of a business name, such as "Printers" or "Antiques"
type | types | Business type, such as "Ltd" or "GmbH"
name | names | Business name, such as "General Motors" or "Fedex"
job | jobs | Job title word, such as "Manager"
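The sketch below groups two business words for US data. Together with the organization useEquivalentName setting, this is intended to let names such as "Wood Green Autos" and "Wood Green Motors" match. The choice of words is illustrative only.

<organization>
<country value="US">
<words>
<!--grouped business words are considered identical-->
<group>
<word>Motors</word>
<word>Autos</word>
</group>
</words>
</country>
</organization>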
misc-custom.xml
<misc>
<!--country-agnostic words used by all countries-->
<common>
<exclusions> <!--collection-->
<group>
<exclusion>Deceased</exclusion> <!--category-->
<exclusion>Decsd</exclusion>
</group>
<exclusion>Addressee</exclusion>
...
</exclusions>
...
</common>
<!--specify countries using ISO 3166-1 alpha-2 codes-->
<country value="US">
...
</country>
...
</misc>
misc-custom.xml contains the following categories:
Category | Collection | Description
exclusion | exclusions | Exclusion word, such as "Deceased" or "Moved"
noise | noises | Noise word (i.e. ignored when generating keys or address matching), such as "The" or "House"
special | specials | Special casing word, i.e. a word that is cased unusually but doesn't fall into any of the above categories, such as "PhotoMe"
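A sketch of adding entries for all countries follows. The exclusion "Gone Away" and the noise word "Estate" are hypothetical additions. Whether a multi-word exclusion is accepted is an assumption, based on NAMES.DAT accepting a "word or string".

<misc>
<common>
<exclusions>
<!--hypothetical: flag records marked as gone away-->
<exclusion>Gone Away</exclusion>
</exclusions>
<noises>
<!--hypothetical: ignore "Estate" when generating keys-->
<noise>Estate</noise>
</noises>
</common>
</misc>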
Attributes
The attributes "action" and "match" indicate how a customization modifies the built-in data. Both small- and large-scale customizations can be made: for example, individual entries can be added, or entire country-specific blocks disabled.
Attribute | Description
action="modify" | Can be applied to any node. Implied if not specified. Modifies the original node.
action="replace" | Can be applied to any node. Deletes the original node and its children, and replaces them with the customized data.
match="true" | Implied if not specified.
match="false" | Can only be applied to a group of two words. Prevents the grouped words from matching by deleting them from any group in which the two words appear together.
match="nothing" | Can only be applied to a word that isn't in a group. Prevents the word from matching anything.
match="delete" | Can be applied to any node. Deletes the node from the base data.
Examples
To prevent "Andy" from being considered a variant of "Andrew":
<name>
<common>
<male>
<firstNames>
<group match="false">
<name>Andrew</name>
<name>Andy</name>
</group>
...
To prevent "Andy" from being considered a variant of any male name:
<name>
<common>
<male>
<firstNames>
<name match="nothing">Andy</name>
...
To delete all qualifications that apply to either gender:
<name>
<common>
<either>
<qualifications match="delete" />
...
To replace the default list of company names with one of your own:
<organization>
<common>
<names action="replace">
<name>Syniti</name>
<name>helpIT</name>
<name>360Science</name>
...
</names>
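Larger-scale changes follow the same pattern. Based on the statement that match="delete" can be applied to any node, a sketch that removes the built-in Australian address words might look like this. Applying it directly to a country node is an assumption.

<address>
<!--delete the whole Australian block from the base address data-->
<country value="AU" match="delete" />
</address>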