H2.2 - Advanced Configuration Guide - Generate Settings

matchIT Hub Index

The following settings are used when records are parsed and keys are generated. The default value for each setting is shown.

Location: <settings><advanced><generate>

<generate>
<name>...</name>
<organization>...</organization>
<address>...</address>
<quality>...</quality>
...
</generate>

Note that the quality settings are relevant only to the normalization processing mode.
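Written out in full, the location path above implies a nesting along the following lines. This is only a sketch: the enclosing settings and advanced elements are inferred from the Location line, and the contents of each section are elided just as they are in the snippet above.

```xml
<settings>
  <advanced>
    <generate>
      <name>...</name>
      <organization>...</organization>
      <address>...</address>
      <quality>...</quality>
      ...
    </generate>
  </advanced>
</settings>
```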

Name

Location: <settings><advanced><generate><name>

<name>
<defaultGender>unknown</defaultGender>
<useEquivalentName>true</useEquivalentName>
<enhancedDoubleBarrelledLookup>false</enhancedDoubleBarrelledLookup>
<processBlankLastName>false</processBlankLastName>
<parseNameElements>true</parseNameElements>
<detectInverseNames>false</detectInverseNames>
<parseAsNormalizedName>false</parseAsNormalizedName>

<!-- the following are for normalization only: -->
<joinMarriedPrefixes>true</joinMarriedPrefixes>
<generateContact>true</generateContact>
<contactFullname>false</contactFullname>
<defaultSalutation>Dear Customer</defaultSalutation>
<extract>
<middleNames>leave</middleNames>
</extract>
</name>

defaultGender: The gender to assume when the matchIT API can't determine whether a name is male or female (e.g. Chris Smith, C Smith). If you set this property to Male or Female, the API will assume names of undetermined gender to be male or female accordingly, and will develop a salutation using Mr or Ms as the prefix.

useEquivalentName: If this is enabled, the matchIT API replaces the first name with its equivalent from the NAMES.DAT file, if there is an entry for the input first name. This enables, for example, "Tony Smith" and "Anthony Smith" to be picked up as a match. The initial of the original first name is stored in the Record.DataFlags setting to enable, for example, "Tony Smith" and "T Smith" to still be matched.

enhancedDoubleBarrelledLookup: When enabled, this setting will cause an unrecognised middle name to be considered part of a non-hyphenated double-barrelled last name (for example, where the full name is John Harrington Jones, the last name will be considered Harrington-Jones because Harrington is not a recognised first name).

processBlankLastName: With this setting enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors. For example, if a firstname was entered but not a lastname, the firstname is assumed to be the lastname, and match keys are generated rather than being left blank.

This setting is disabled by default.

parseNameElements: When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed. If the matchIT API deems any values to have been entered into an incorrect field (for example, suffixes and qualifications in the lastname field), it will reassign these values into the correct fields.

detectInverseNames: With this setting enabled, the matchIT API will attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, "Smith, John" where Smith is the lastname). Without a comma, a name is assumed to be in standard left-to-right format, with the firstnames preceding the lastname.

This setting is disabled by default.
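For data in which addressee names arrive as "Smith, John", a minimal override might look like the following. It is a sketch only, and it assumes that settings omitted from the name element keep their default values, which this guide does not confirm.

```xml
<name>
  <!-- assumed input style: "Smith, John" (lastname, comma, firstnames) -->
  <detectInverseNames>true</detectInverseNames>
</name>
```

Names without a comma would still be read in standard left-to-right order, as described above.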

parseAsNormalizedName: When enabled, addressee names are assumed to be in a delimited normalized format similar to the Record.MatchingFields.NormalizedName value that's output by Engine.Generate( ). Currently supported delimiters are spaces, commas, semicolons, and pipes ('|').

This setting is disabled by default.

Normalization

The following settings are relevant only to the normalization processing mode:

joinMarriedPrefixes: With this property set to True, multiple addressees with the same last name will be treated as married e.g. input names of "Mr. John Smith and Ms. Mary Smith" or "Mr John Smith & Mary Smith" would have a Salutation generated of "Mr and Mrs Smith" and a Contact generated of "Mr and Mrs J Smith" or "Mr and Mrs John Smith".

generateContact: With this property set to True, the matchIT API will generate a contact for the input name. The contact will be structured in the same way as you would expect to find its corresponding input name on, for example, the front of an envelope. For example, the input name of "John Smith" or "Mr Smith" would result in a generated contact of "Mr J Smith".

An accurate contact value cannot be generated when the matchIT API is unable to determine the gender of an input name. In this situation, the generated contact is equal to the input name, e.g. "J Smith" as an input name would result in a generated contact of "J Smith".

contactFullname: Set this property to True to include the full first name of any incoming name in the CONTACT field; just the initial will be used if the property is False. For example, if the property is True and the incoming name is "John Smith", then the generated contact will be "Mr John Smith"; if it is False, then the contact will be "Mr J Smith".

defaultSalutation: This property determines the default salutation, either where the API can't determine one (for example, C Smith or Chris Smith, which could be either Mr or Ms), or where the Prefix supplied doesn't have a salutation rule. If you include the word "Dear" at the start of the default salutation (i.e. actually specify "Dear Customer" and not just "Customer"), then all the salutations derived by the matchIT API will start with the word "Dear" unless the salutation for the type of title (or prefix) specifies "Title" only. For example, Mr J Smith will result in a salutation of "Dear Mr Smith" whereas The Bishop of Liverpool will result in a salutation of "My Lord".

extract/middleNames: When enabled, any middlenames are extracted to the MiddleNames field.
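To illustrate how the contact and salutation settings above combine, the fragment below shows one possible normalization setup. Treat it as a hedged sketch: "Dear Supporter" is an illustrative value that does not come from the product defaults, and settings left out of the fragment are assumed to keep their defaults.

```xml
<name>
  <generateContact>true</generateContact>
  <!-- "John Smith" would then produce a contact of "Mr John Smith" rather than "Mr J Smith" -->
  <contactFullname>true</contactFullname>
  <!-- illustrative value; it begins with "Dear", so derived salutations also begin with "Dear" -->
  <defaultSalutation>Dear Supporter</defaultSalutation>
</name>
```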

Organization

Location: <settings><advanced><generate><organization>

<organization>
<useEquivalentName>false</useEquivalentName>
<normalizationTruncation>0</normalizationTruncation>
<legalNormalization>false</legalNormalization>
<ignoreParentheses>false</ignoreParentheses>
<ignoreTrailingPostTown>false</ignoreTrailingPostTown>

<!-- the following are for normalization only: -->
<extract>
<jobTitle>leave</jobTitle>
<name>leave</name>
</extract>
<joinInitials>true</joinInitials>
</organization>

useEquivalentName: If this property is set to True, then the equivalent (according to the word lookup tables) of words indicating a business name, such as "Motors" or "Services" are included in the Record.NormalizedOrganization property and the corresponding phonetic keys. This enables, for example, "Wood Green Cars" to match "Wood Green Motors" well (because "Cars" has an equivalent of "Motors"), but ensures that neither of them match "Wood Green Carpets" well.

As a rule of thumb, if you are doing business matching on a file that is very geographically concentrated, that is, contains records mostly from the same immediate area, then set the Settings.Generate.Organization.UseEquivalentName property to True, otherwise set it to False.

If UseEquivalentName is false, all business words are ordinarily classed as noise words before the structure for NormalizedOrganization is populated, and they are then treated exactly like noise words from the word lookup. If every word is a business word, the API goes back to the first word (reading left to right) that was not a noise word before this step and changes its type from 'n' (noise word) to 'c', so that it is no longer treated as a noise word. Processing then continues, and the word that was changed to type 'c' is not stripped out.

normalizationTruncation: Disabled by default (i.e. set to 0). If this setting is enabled, and the organization consists of more than three words, then the third element of Record.MatchingFields.NormalizedOrganization will be truncated to the first N characters of each word after the first two (where N is the value of this setting).

legalNormalization: (From Hub 3.1) Disabled by default (i.e. set to false), and ignored if UseEquivalentName is false. If this property is set to True (as well as UseEquivalentName), then the business type words such as "Ltd" and "Inc" (in addition to business words such as "Motors" or "Services") are included in the Record.NormalizedOrganization property and the corresponding phonetic keys.
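Because legalNormalization is ignored unless useEquivalentName is also enabled, the two would be switched on together. The fragment below is a minimal sketch of that combination, for example for a geographically concentrated business file; it assumes that omitted organization settings keep their defaults.

```xml
<organization>
  <useEquivalentName>true</useEquivalentName>
  <!-- has no effect unless useEquivalentName is true; available from Hub 3.1 -->
  <legalNormalization>true</legalNormalization>
</organization>
```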

ignoreParentheses: With this property enabled, any words that are enclosed with parentheses within an organization name will be excluded from the phonetic organization keys. This can be useful for records such as Remnel Ltd and Remnel (UK) Ltd, to ensure records with these company names are compared if the phonetic organization keys are being used as part of composite match keys.

ignoreTrailingPostTown: This property, when enabled, will exclude from the phonetic organization keys any trailing post town (defined in towns.dat, see Towns Table) or UK county that appears at the end of a company name. For example, the phonetic organization keys for Handso Ltd and Handso Essex Ltd will be the same to help ensure such records will be compared.
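If organization names in your data often differ only by a parenthesised qualifier or a trailing town or county, both of the last two settings could be enabled together. This is a hedged sketch using the examples above, with all other organization settings assumed to stay at their defaults.

```xml
<organization>
  <!-- "Remnel Ltd" and "Remnel (UK) Ltd" should then share phonetic organization keys -->
  <ignoreParentheses>true</ignoreParentheses>
  <!-- "Handso Ltd" and "Handso Essex Ltd" should then share phonetic organization keys -->
  <ignoreTrailingPostTown>true</ignoreTrailingPostTown>
</organization>
```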

Normalization

The following settings are relevant only to the normalization processing mode:

extract/jobTitle: This will copy or extract job titles contained within the ADDRESSEE field into a field labeled JOB_TITLE.

Job Titles are recognized by having a word or string defined as a Job Title in the NAMES.DAT file e.g. Director.

extract/name: This will copy or extract any business names contained within the ADDRESSEE field into a field labeled COMPANY.

Business names are recognized by having a word or string defined as a Business word in the NAMES.DAT file e.g. Ltd. Care should be taken when using this setting, as words like "Bank" can be taken to indicate a Business when this isn't the case (e.g. it may be a last name or part of an address line).

If you want Extract Company Name processing to be applied also to the first one or two lines of the address, you must set the property Settings.Generate.Address.NumOfLinesToScan to either 1 or 2.
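The combination described above spans two sections of the generate settings. The sketch below shows one way it might be written; note that the value "copy" for extract/name is an assumption, based on the copy and move values that the address extract settings use, and is not confirmed for the organization section by this guide.

```xml
<generate>
  <organization>
    <extract>
      <!-- "copy" is assumed to be accepted here, as it is for the address extract settings -->
      <name>copy</name>
    </extract>
  </organization>
  <address>
    <!-- scan address lines 1 and 2 in addition to the ADDRESSEE field -->
    <numOfLinesToScan>2</numOfLinesToScan>
  </address>
</generate>
```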

joinInitials: Set this property to True if you want a group of initials separated by spaces or dots in a company name to be concatenated. For example, if this property is True, then "I B M" and "I.B.M." will be replaced by "IBM". Note that, if the Settings.Generate.ProperCase property is set to False, then this property will have no effect.

Address

Location: <settings><advanced><generate><address>

<address>
<verifyPostcode>true</verifyPostcode>
<defaultThoroughfareLine>1</defaultThoroughfareLine>
<numOfLinesToScan>0</numOfLinesToScan>
<premiseFirst>false</premiseFirst>
<detectCountry>basic</detectCountry>

<!-- the following are for normalization only: -->
<extract>
<premise>copy</premise>
<thoroughfare>copy</thoroughfare>
<town>move</town>
<postTownsOnly>false</postTownsOnly>
<region>move</region>
<postcode>move</postcode>
<country>move</country>
</extract>
<abbreviateRegion>false</abbreviateRegion>
<upperCaseTown>false</upperCaseTown>
</address>

verifyPostcode: If set to True, this property verifies and corrects the format of the postcode. Numerics are changed to alphas and vice versa where appropriate, using the rules concerning the alphanumeric structure of the postcode. For example, it changes "KT22 BDN" to "KT22 8DN": it will change 0, 1, 5 and 8 to O, I, S and B, or vice versa, if that makes the postcode alphanumerically correct. The matchIT API will not verify or correct the format of postcodes that are not in the postcode field.

defaultThoroughfareLine: This property is used when the matchIT API is generating a phonetic address key, for which it needs to know the thoroughfare (e.g. street) and the town in the address. If it cannot locate a thoroughfare in the address, usually because it cannot find a word to indicate one, such as "Street", then the API will assume that the thoroughfare is the contents of the address line indicated by this property (if it is greater than zero). For example, if this property is set to 2, then the API will take the contents of address line 2 as the thoroughfare if it cannot find a thoroughfare word in the address. This property should only be used if the addresses in your data are very rigidly structured.

numOfLinesToScan: This property enables personal names to be extracted from address lines. It can be set to 1 or 2. If set to 1, only the first address line will be scanned for names. If set to 2, both the first and second address lines will be scanned and have names extracted from them if found. Any personal names found can then be used for the generation of Contacts and Salutations.

If either or both of the Settings.Generate.Organization.Extract.JobTitle and Settings.Generate.Organization.Extract.Name properties are used in conjunction with this one, the matchIT API will not only scan the ADDRESSEE field for job titles and business names, but will also scan the corresponding number of address lines.

premiseFirst: When parsing an address, this Boolean property indicates whether to expect the premise or flat number to come first in address lines when the flat is not explicitly specified (e.g. "Flat 5"). This should be set to True when the nationality is set to 'USA', or False otherwise.
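For a file of US addresses, the guidance above suggests an override along these lines. It is a sketch only: how the nationality itself is set is outside the scope of this page, and the remaining address settings are assumed to keep their defaults.

```xml
<address>
  <!-- recommended above when the nationality is set to 'USA' -->
  <premiseFirst>true</premiseFirst>
</address>
```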

detectCountry: When parsing an address, this property indicates whether or not and how to check which country the address refers to. Options are:

none - no country detection is performed; the default country is always assumed.
basic (default) - just checks the Postcode and Country fields and the last line of the address, looking for either a full postcode or a full country or region name that fills the entire field.
full - searches all address lines and computes a score based on country-specific words found (this is time consuming).
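Where records from several countries are mixed and the country is not reliably held in the Postcode or Country fields, the fuller detection mode could be selected as sketched below. This is an illustration rather than a recommendation, since the option is described above as time consuming.

```xml
<address>
  <!-- one of: none, basic (default), full -->
  <detectCountry>full</detectCountry>
</address>
```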

Normalization

The following settings are relevant only to the normalization processing mode:

extract/premise: This will move or copy premise numbers found in the address lines into a field labelled PREMISE.

It is not advisable to Extract the premise if you want to output the updated address later, as the API will not know which address line the premise number came from. It can however be useful to copy premise numbers, as this enables the inclusion of the PREMISE field as part of the match keys used during the compare stage, which can increase efficiency and accuracy when working on large files, or files containing very localized data.

extract/thoroughfare: This will move or copy address data recognized as the thoroughfare of the address (based on Address type entries found in the NAMES.DAT file) into a field labelled THOROUGHFARE.

extract/town: This will move or copy address data recognized as the town or city from the address lines to a field labelled TOWN.

extract/postTownsOnly: If this is enabled, together with Settings.Generate.Address.Extract.Town, then only post towns (i.e. any towns found in the TOWNS.DAT file) will be moved or copied.
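Since postTownsOnly only takes effect together with the town extraction setting, the two would be configured as a pair. The fragment below is a minimal sketch of that pairing, assuming the other extract settings keep the defaults shown earlier.

```xml
<address>
  <extract>
    <town>move</town>
    <!-- restrict extraction to towns found in the TOWNS.DAT file -->
    <postTownsOnly>true</postTownsOnly>
  </extract>
</address>
```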

extract/region: This will move or copy US, Canadian or Australian states or provinces, or valid UK counties (or other regions found in the NAMES.DAT file), that are found in the address lines into a field labelled REGION.

extract/postcode: This will move or copy UK postcodes, or US zip codes found in the address lines into a field labelled POSTCODE.

Only UK postcodes with an outward half that is valid according to the MAILSORT.DAT file will be extracted. It is advisable to set this property to "MoveExtract" rather than "CopyExtract" if using this data for mailing or updating a database.

extract/country: This will move or copy valid countries found in the address lines (based on Country type entries found in the NAMES.DAT file) into a field labeled 'COUNTRY'.

abbreviateRegion: Set this property to True if you want the matchIT API to abbreviate States or Provinces when processing address lines e.g. to change "Pennsylvania" to "PA".

upperCaseTown: This applies to UK addresses only. Set this property to True to convert the post town in the address to capitals. Note that, if the Settings.Generate.ProperCase property is set to False, then this property is ignored.
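Because upperCaseTown is ignored when proper casing is switched off, a configuration that relies on it would keep properCase enabled as well. The sketch below shows both settings in context; it assumes that properCase sits directly under the generate element and that omitted settings keep their defaults.

```xml
<generate>
  <address>
    <!-- UK addresses only: convert the post town to capitals -->
    <upperCaseTown>true</upperCaseTown>
  </address>
  <!-- must remain true, otherwise upperCaseTown is ignored -->
  <properCase>true</properCase>
</generate>
```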

Quality

Location: <settings><advanced><generate><quality>

<!-- the following are for normalization only: -->
<quality>
<enabled>false</enabled>
<address>
<allowBlankPostcode>true</allowBlankPostcode>
</address>
<email>
<webmailFiltering>true</webmailFiltering>
</email>
</quality>

Normalization

Note that the quality settings are relevant only to the normalization processing mode:

enabled: By default, quality scoring is disabled and all quality scores are 0.

address/allowBlankPostcode: If disabled (enabled by default) then addresses without a postal code are restricted to a maximum quality score of 1.

email/webmailFiltering: If enabled (default) then email addresses that use webmail provider domains (such as Hotmail, Yahoo, and mail.com) are restricted to a maximum quality score of 7.
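As an illustration of the three quality settings working together, the fragment below turns scoring on and applies both restrictions. It is a hedged sketch; the comments simply restate the limits given above.

```xml
<quality>
  <!-- scoring is off by default, in which case all quality scores are 0 -->
  <enabled>true</enabled>
  <address>
    <!-- addresses without a postal code are limited to a maximum score of 1 -->
    <allowBlankPostcode>false</allowBlankPostcode>
  </address>
  <email>
    <!-- webmail provider domains are limited to a maximum score of 7 -->
    <webmailFiltering>true</webmailFiltering>
  </email>
</quality>
```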

Other Settings

Location: <settings><advanced><generate>

<dropExcludedWords>true</dropExcludedWords>
<considerCasing>true</considerCasing>
<variableKeysMaxLength>8</variableKeysMaxLength>

<!-- the following are for normalization only: -->
<properCase>true</properCase>
<specialCaseMac>true</specialCaseMac>

dropExcludedWords: With this property set to True, during the generate step the matchIT API will flag any records that contain exclusion words in any of the key fields (fields such as addressee, company or the address lines). Such exclusion words include "Deceased", "Addressee" (indicating a record may be a header record) and any other Exclusion type entries in the NAMES.DAT file. Records are flagged by setting the first character of the Record.DataFlags property to "X".

considerCasing: If this property is set to True, then the matchIT API will consider the casing of the incoming data when it is splitting the data up for extracting keys, proper casing, and so forth. This is mainly used for the extraction of name data.

variableKeysMaxLength: This specifies the maximum length of various variable-length phonetic keys created when a record is generated. Such keys are PhoneticLastName, PhoneticFirstName, PhoneticMiddleName, PhoneticOrganizationName1, PhoneticOrganizationName2, PhoneticOrganizationName3, PhoneticStreet, and PhoneticTown. The default is 8 characters.

Normalization

The following settings are relevant only to the normalization processing mode:

properCase: If this property is set to True, the Generate step will convert the address lines in your records (labeled ADDRESS1, ADDRESS2... ADDRESSn) to their proper case. This proper casing will handle punctuation, apostrophes and abbreviations. It will also convert ADDRESSEE, JOB_TITLE, DEPARTMENT, and COMPANY to the correct case. Exceptions to the default casing rules are held in the NAMES.DAT file.

The API's default rules for casing data are as follows: letters following an apostrophe are capitalized (e.g. "Mr O'Reilly"), as are letters following "Mc" or (subject to one of the Advanced Input Options) "Mac" at the start of a name (see "Mac Name Treatment"). Double-barreled names have a capital letter after the hyphen. If the name or other word has a proper casing entry in the NAMES.DAT file, it is cased as shown there e.g. BSc, helpIT, IBM, plc. If not in the NAMES.DAT file, words are all capitals if they contain no vowels, otherwise they are changed to initial capital followed by lower case letters.

specialCaseMac: Where a last name begins with Mac, when formatting salutations, the matchIT API follows this with a small letter or a capital letter, depending on this property. A value of True will mean that MACLEAN will be formatted as MacLean. You can add exceptions to the rule (e.g. Maccabee, Macclesfield, MacKay, Mackie) to the NAMES.DAT file. If you invariably want to use a lower case letter following Mac, set this property to False.

NB: Names beginning Mach are always formatted with a lower case H, e.g. Machin, Machinery. Names beginning Mc are formatted with a capital letter following, if they are greater than 3 characters long.
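If you always want a lower case letter after Mac, the casing settings might be overridden as sketched below. The fragment assumes that both elements sit directly under the generate element, and that exceptions continue to be read from the NAMES.DAT file as described above.

```xml
<generate>
  <properCase>true</properCase>
  <!-- false: always follow "Mac" with a lower case letter -->
  <specialCaseMac>false</specialCaseMac>
</generate>
```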
