EngineSettings Property Usage
Nationality Of Data
This property primarily influences processing of the POSTCODE field:
- it works in conjunction with the Extract Postcode property to decide whether a string in the address is a postal code that should be moved to the Postcode field.
- it works in conjunction with the VerifyPostcode property to determine whether a string in the address or Postcode field contains an alphanumeric error, e.g. 5 when it should be S.
- it works in conjunction with the Postcode.Weight matching property (e.g. IndividualLevel Postcode Weight) for matching. Different countries have different standards for how specific a postcode is (e.g. one per town, one per street, several per street), so two full UK postcodes that match will get a higher score than two 5 digit US zip codes that match.
Engine Settings: Generate…
The following settings are used when the matchIT API generates a record (i.e. for key generation, and data standardisation and casing):
Generate.Address.AbbreviateRegion
Set this property to True if you want the matchIT API to abbreviate States or Provinces when processing address lines e.g. to change “Pennsylvania” to “PA”.
Generate.Address.DefaultThoroughfareLine
This property is used when the matchIT API is generating a phonetic address key, for which it needs to know the thoroughfare (e.g. street) and the town in the address. If it cannot locate a thoroughfare in the address, usually because it cannot find a word to indicate one, such as “Street”, then the API will assume that the thoroughfare is the contents of the address line indicated by this property (if it is greater than zero). For example, if this property is set to 2, then the API will take the contents of address line 2 as the thoroughfare if it cannot find a thoroughfare word in the address. This property should only be used if the addresses in your data are very rigidly structured.
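The fallback described above can be pictured with a minimal Python sketch. It is an illustration only, not the matchIT API's implementation: the function name, the word list, and the use of a plain list of address lines are all assumptions made for the example.

```python
# Hypothetical stand-in for the thoroughfare words the API would recognise.
THOROUGHFARE_WORDS = {"street", "road", "avenue", "lane"}


def find_thoroughfare(address_lines, default_line=0):
    """Return the line treated as the thoroughfare, or None if there is none.

    `default_line` mimics DefaultThoroughfareLine: a 1-based address line
    number that is only consulted when no thoroughfare word is found.
    """
    for line in address_lines:
        words = (word.strip(",.").lower() for word in line.split())
        if any(word in THOROUGHFARE_WORDS for word in words):
            return line
    if 0 < default_line <= len(address_lines):
        return address_lines[default_line - 1]
    return None
```

With the lines "Rose Cottage", "12 The Willows" and "Leatherhead", no thoroughfare word is found, so a default of 2 selects "12 The Willows", while a default of 0 yields no thoroughfare at all.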
Generate.Address.Extract
This group of properties applies to Country, Postcode, Premise, Region, Thoroughfare and Town. It enables different types of data to be moved or copied from specific fields (in most cases the address lines) to designated fields that were explicitly added to store that type of data. These properties can be used to greatly improve the structure and consistency of the data, which can inherently improve results obtained during the Compare stage.
Each of these properties can be set to “MoveExtract”, “CopyExtract” or “Leave”. If the property is set to “MoveExtract”, the corresponding data will be moved from its original field into the designated field. When set to “CopyExtract”, the data will be copied into the designated field and will also remain in the original field. When set to “Leave”, the corresponding data will be neither copied nor moved.
Each of these properties will be ignored if the relevant Record property required to copy or move the data into does not exist.
If the Settings.Generate.ProperCase property is set to True, moved and copied data (with the exception of premise (i.e. building) numbers and postcodes) will be correctly cased.
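The difference between the three values can be sketched in Python as follows. This is a conceptual model under stated assumptions, not the API itself: the record is modelled as a plain dictionary, and the enum and function names are invented for the illustration.

```python
from enum import Enum


class ExtractMode(Enum):
    # The values mirror the setting names; the enum itself is hypothetical.
    MOVE = "MoveExtract"
    COPY = "CopyExtract"
    LEAVE = "Leave"


def apply_extract(record, source_field, target_field, value, mode):
    """Return a copy of `record` with `value` handled according to `mode`."""
    result = dict(record)
    # The setting is ignored when the designated field does not exist.
    if mode is ExtractMode.LEAVE or target_field not in result:
        return result
    if value not in result.get(source_field, ""):
        return result
    result[target_field] = value
    if mode is ExtractMode.MOVE:
        # Moving also removes the value from the field it was found in.
        remaining = result[source_field].replace(value, "", 1)
        result[source_field] = " ".join(remaining.split()).strip(" ,")
    return result
```

For a record with ADDRESS3 set to "Leatherhead KT22 8DN" and an empty POSTCODE field, the move mode leaves "Leatherhead" in ADDRESS3, whereas the copy mode leaves ADDRESS3 unchanged; both place "KT22 8DN" in POSTCODE.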
The different properties in this group are listed below with a description of data that each corresponds to, and any other information specific to that property…
Generate.Address.Extract.Country
This will move or copy valid countries found in the address lines (based on Country type entries found in the NAMES.DAT file) into a field labeled ‘COUNTRY’.
Generate.Address.Extract.Postcode
This will move or copy UK postcodes or US zip codes found in the address lines into a field labelled POSTCODE.
Only UK postcodes with an outward half that is valid according to the MAILSORT.DAT file will be extracted. It is advisable to set this property to “MoveExtract” rather than “CopyExtract” if using this data for mailing or updating a database.
Generate.Address.Extract.Premise
This will move or copy premise numbers found in the address lines into a field labelled PREMISE.
It is not advisable to move the premise if you want to output the updated address later, as the API will not know which address line the premise number came from. It can, however, be useful to copy premise numbers, as this enables the PREMISE field to be included in the match keys used during the compare stage, which can increase efficiency and accuracy when working on large files or files containing very localised data.
Generate.Address.Extract.Region
This will move or copy US, Canadian or Australian states or provinces, or valid UK counties (or other regions found in the NAMES.DAT file), that are found in the address lines into a field labelled REGION.
Generate.Address.Extract.Thoroughfare
This will move or copy address data recognised as the thoroughfare of the address (based on Address type entries found in the NAMES.DAT file) into a field labelled THOROUGHFARE.
Generate.Address.Extract.Town
This will move or copy address data recognised as the town or city from the address lines to a field labelled TOWN.
Generate.Address.Extract.PostTownsOnly
If this is enabled, together with Settings.Generate.Address.Extract.Town, then only post towns (i.e. any towns found in the TOWNS.DAT file) will be moved or copied.
Generate.Address.NumOfLinesToScan
This property enables personal names to be extracted from address lines. It can be set to 1 or 2. If set to 1, only the first address line will be scanned for names. If set to 2, both the first and second address lines will be scanned and have names extracted from them if found. Any personal names found can then be used for the generation of Contacts and Salutations.
If either or both of the Settings.Generate.Organization.Extract.Jobtitle and Settings.Generate.Organization.ExtractName properties are used in conjunction with this one, the matchIT API will not only scan the ADDRESSEE field for job titles and business names, but will also scan the corresponding number of address lines.
Generate.Address.PremiseFirst
When parsing an address, this Boolean property indicates whether to expect the premise or the flat number to come first in address lines when the flat is not explicitly identified (as it is in, for example, “Flat 5”).
Generate.Address.UpperCaseTown
This applies to UK addresses only. Set this property to True to convert the post town in the address to capitals. Note that, if the Settings.Generate.ProperCase property is set to False, then this property is ignored.
Generate.Address.VerifyPostcode
If set to True, this property verifies and corrects the format of the postcode. Numerics are changed to alphas and vice versa where appropriate. This feature makes use of the rules concerning the alphanumeric structure of the postcode: for example, it changes “KT22 BDN” to “KT22 8DN”. It will change 0, 1, 5 and 8 to O, I, S and B, or vice versa, if that makes the postcode alphanumerically correct. The matchIT API will not verify or correct the format of postcodes that are not in the postcode field.
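As a rough illustration of this kind of correction, the Python sketch below repairs only the inward (second) half of a UK postcode, which always has the shape digit, letter, letter. It is a simplified assumption-laden example rather than the API's actual rule set: the function name is hypothetical, and the outward half is deliberately left alone.

```python
# Character pairs that are commonly confused, as listed in the text above.
TO_DIGIT = {"O": "0", "I": "1", "S": "5", "B": "8"}
TO_ALPHA = {digit: alpha for alpha, digit in TO_DIGIT.items()}


def fix_inward_code(postcode):
    """Swap look-alike characters so the inward half reads digit-letter-letter.

    Returns the input unchanged when it cannot be repaired this way.
    """
    parts = postcode.upper().split()
    if len(parts) != 2 or len(parts[1]) != 3:
        return postcode
    outward, inward = parts
    chars = [
        TO_DIGIT.get(inward[0], inward[0]),  # first position must be a digit
        TO_ALPHA.get(inward[1], inward[1]),  # second position must be a letter
        TO_ALPHA.get(inward[2], inward[2]),  # third position must be a letter
    ]
    if not (chars[0].isdigit() and chars[1].isalpha() and chars[2].isalpha()):
        return postcode
    return f"{outward} {''.join(chars)}"
```

Applied to "KT22 BDN", the sketch swaps the leading B of the inward half for 8 and returns "KT22 8DN"; a postcode that is already well formed is returned as it was.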
Generate.ConsiderCasing
If this property is set to True, then the matchIT API will consider the casing of the incoming data when it is splitting the data up for extracting keys, proper casing, and so forth. This is mainly used for the extraction of name data. For instance, consider the name:
Generate.DropExcludedWords
With this property set to True, during the generate step the matchIT API will flag any records that contain exclusion words in any of the key fields (fields such as addressee, company or the address lines). Such exclusion words include “Deceased”, “Addressee” (indicating a record may be a header record) and any other Exclusion type entries in the NAMES.DAT file. Records are flagged by setting the first character of the Record.DataFlags property to “X”.
Generate.Name.ContactFullName
Set this property to True to include the full first name of any incoming name in the CONTACT field; just the initial will be used if the property is False. For example, if the property is True, and the incoming name is “John Smith”, then the generated contact will be “Mr John Smith”, if it is False, then the contact will be “Mr J Smith”.
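The effect of this setting can be sketched in a few lines of Python. The function and its parameters are hypothetical and assume the prefix, first name and last name have already been separated.

```python
def build_contact(prefix, first_name, last_name, contact_full_name):
    """Assemble a contact string, using the full first name or just its initial."""
    first = first_name if contact_full_name else first_name[:1]
    # Empty parts (for example a missing prefix) are skipped.
    return " ".join(part for part in (prefix, first, last_name) if part)
```

Calling it with "Mr", "John" and "Smith" gives "Mr John Smith" when the flag is True and "Mr J Smith" when it is False, matching the example in the text.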
Generate.Name.DefaultGender
The Default Gender property is the gender to assume when the matchIT API can’t determine whether the name is male or female e.g. Chris Smith, C Smith. If you set this property to Male or Female, the API will assume it to be male or female accordingly, and develop a salutation using Mr or Ms as the prefix.
Generate.Name.DefaultSalutation
This property determines the default salutation, either where the API can't determine one (for example, C Smith or Chris Smith, which could be either Mr or Ms), or where the Prefix supplied doesn’t have a salutation rule. If you include the word “Dear” at the start of the default salutation (i.e. actually specify "Dear Customer" and not just "Customer"), then all the salutations derived by the matchIT API will start with the word "Dear" unless the salutation for the type of title (or prefix) specifies "Title" only. For example, Mr J Smith will result in a salutation of "Dear Mr Smith" whereas The Bishop of Liverpool will result in a salutation of "My Lord".
Generate.Name.DetectInverseNames
With this property enabled, the matchIT API will attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, “Smith, John” where Smith is the lastname). Without a comma, a name is assumed to be in standard left-to-right format, with the firstnames preceding the lastname.
This setting is disabled by default.
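The comma rule can be illustrated with a short Python sketch. It is a simplification for explanation only: the function name is an assumption, and real name parsing in the API also deals with prefixes, suffixes and multiple addressees, which are ignored here.

```python
def split_name(addressee, detect_inverse_names):
    """Return (firstnames, lastname) for a simple addressee string."""
    text = " ".join(addressee.split())
    if detect_inverse_names and "," in text:
        # "Smith, John": the lastname comes before the comma.
        last, _, firsts = text.partition(",")
        return firsts.strip(), last.strip()
    # Standard left-to-right order: the final word is the lastname.
    firsts, _, last = text.rpartition(" ")
    return firsts, last
```

With detection enabled, both "Smith, John" and "John Smith" yield the firstname "John" and the lastname "Smith".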
Generate.Name.EnhancedDoubleBarrelledLookup
When enabled, this property will cause an unrecognised middle name to be considered part of a non-hyphenated double-barrelled last name (for example, where the full name is John Harrington Jones, the last name will be considered Harrington-Jones because Harrington is not a recognised first name).
Generate.Name.GenerateContact
With this property set to True, the matchIT API will generate a contact for the input name. The contact will be structured in the same way as you would expect to find its corresponding input name on, for example, the front of an envelope. For example, the input name of “John Smith” or “Mr Smith” would result in a generated contact of “Mr J Smith”.
An accurate contact value cannot be generated when the matchIT API is unable to determine the gender of an input name. In this situation, the generated contact will be the same as the input name, e.g. “J Smith” as an input name would result in a generated contact of “J Smith”.
Generate.Name.JoinMarriedPrefixes
With this property set to True, multiple addressees with the same last name will be treated as married e.g. input names of “Mr. John Smith and Ms. Mary Smith” or “Mr John Smith & Mary Smith” would have a Salutation generated of “Mr and Mrs Smith” and a Contact generated of “Mr and Mrs J Smith” or “Mr and Mrs John Smith”.
Generate.Name.ParseAsNormalizedName
When enabled, addressee names are assumed to be in a delimited normalised format similar to the Record.MatchingFields.NormalizedName value that’s output by Engine.Generate( ). Currently supported delimiters are spaces, commas, semicolons, and pipes (‘|’).
This property is disabled by default.
Generate.Name.ParseNameElements
When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed. If the matchIT API deems any values to have been entered into an incorrect field (for example, suffixes and qualifications in the lastname field), it will reassign these values into the correct fields.
This property is disabled by default, so that any such incorrect values are not reassigned.
Generate.Name.ProcessBlankLastName
With this property enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors. For example, if a firstname was entered but not a lastname, then it’ll be assumed that the firstname is in fact the lastname and match keys will be generated rather than being left blank.
This property is disabled by default.
Generate.Name.ReplaceAndWithAmpersand
On return from Engine.Generate( ) the matchIT API will, by default, convert ‘and’ to an ampersand when outputting InputFields.Name.Addressee. Disabling this property will prevent this behaviour.
Generate.Name.UseEquivalentNames
If you set the Use Equivalent Name property to True, then when generating a record the matchIT API replaces the first name with its equivalent from the NAMES.DAT file, if there is an entry for the input first name. This enables, for example, “Tony Smith” and “Anthony Smith” to be picked up as a match. The initial of the original first name is stored in the Record.DataFlags property to enable, for example, “Tony Smith” and “T Smith” to still be matched.
Generate.NormalizationDelimiter
This setting (default is a comma, ‘,’) contains the delimiter that’s used when generating the Record.MatchingFields.NormalizedName and Record.MatchingFields.NormalizedOrganization values. This could be useful when outputting values to a comma-delimited file, for example, so that an alternative character (such as a pipe ‘|’) could instead be used.
Generate.Organization.Extract.Jobtitle
This will copy or extract job titles contained within the ADDRESSEE field into a field labeled JOB_TITLE.
Job Titles are recognised by having a word or string defined as a Job Title in the NAMES.DAT file e.g. Director.
Generate.Organization.ExtractName
This will copy or extract any business names contained within the ADDRESSEE field into a field labeled COMPANY.
Business names are recognised by having a word or string defined as a Business word in the NAMES.DAT file e.g. Ltd. Care should be taken when using this property, as words like "Bank" can be taken to indicate a Business when this isn't the case (e.g. it may be a last name or part of an address line).
If you want Extract Company Name processing to be applied also to the first one or two lines of the address, you must set the property Settings.Generate.Address.NumOfLinesToScan to either 1 or 2.
Generate.Organization.IgnoreParentheses
With this property enabled, any words that are enclosed in parentheses within an organization name will be excluded from the phonetic organization keys. This can be useful for records such as Remnel Ltd and Remnel (UK) Ltd, to ensure records with these company names are compared if the phonetic organization keys are being used as part of composite match keys.
Generate.Organization.IgnoreTrailingPostTown
This property, when enabled, will exclude from the phonetic organization keys any trailing post town (defined in towns.dat, see Towns Table) or UK county that appears at the end of a company name. For example, the phonetic organization keys for Handso Ltd and Handso Essex Ltd will be the same to help ensure such records will be compared.
Generate.Organization.JoinInitials
Set this property to True if you want a group of initials separated by spaces or dots in a company name to be concatenated. For example, if this property is True, then “I B M” and “I.B.M.” will be replaced by “IBM”. Note that, if the Settings.Generate.ProperCase property is set to False, then this property will have no effect.
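One way to picture the joining of initials is the Python sketch below. It is an assumed, token-based approximation, not the API's own logic: the function name and the regular expression are invented for the example.

```python
import re

# A token counts as "initials" if it is a single letter, or letters each
# followed by a dot (for example "I", "I." or "I.B.M.").
_INITIAL_TOKEN = re.compile(r"(?:[A-Za-z]\.)+[A-Za-z]?|[A-Za-z]")


def join_initials(name):
    """Concatenate runs of initials in an organization name."""
    words, run = [], []

    def flush():
        letters = "".join(token.replace(".", "") for token in run)
        # A lone initial (e.g. "A" in "A Taste") is left untouched.
        words.extend([letters] if len(letters) > 1 else run)
        run.clear()

    for token in name.split():
        if _INITIAL_TOKEN.fullmatch(token):
            run.append(token)
        else:
            flush()
            words.append(token)
    flush()
    return " ".join(words)
```

Both "I B M" and "I.B.M." come out as "IBM", while a single-letter word such as the "A" in "A Taste Ltd" is not altered.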
Generate.Organization.NormalizationTruncation
Disabled by default (i.e. set to 0). If this setting is enabled, and the organization consists of more than four words, then the third element of Record.MatchingFields.NormalizedOrganization will be truncated to the first N characters of each word after the first two (where N is the value of this setting).
Generate.Organization.UseEquivalentName
If this property is set to True, then the equivalent (according to the NAMES.DAT file) of words indicating a business name, such as “Motors” or “Services” are included in the Record.NormalizedOrganization property and the corresponding phonetic keys. This enables, for example, “Wood Green Cars” to match “Wood Green Motors” well (because “Cars” has an equivalent of “Motors”), but ensures that neither of them match “Wood Green Carpets” well.
If you set this property to True, you should change any words in the NAMES.DAT file that you do want ignored, such as “Ltd” and “Inc”, to Noise type so that they are not included in the Record.NormalizedOrganization property. As a rule of thumb, if you are doing business matching on a file that is very geographically concentrated, that is, contains records mostly from the same immediate area, then set the Settings.Generate.Organization.UseEquivalentName property to True; otherwise set it to False.
Generate.ProperCase
If this property is set to True, the Generate step will convert the address lines in your records (labeled ADDRESS1, ADDRESS2... ADDRESSn) to their proper case. This proper casing will handle punctuation, apostrophes and abbreviations. It will also convert ADDRESSEE, JOB_TITLE, DEPARTMENT, and COMPANY to the correct case. Exceptions to the default casing rules are held in the NAMES.DAT file.
The API's default rules for casing data are as follows: letters following an apostrophe are capitalised (e.g. “Mr O’Reilly”), as are letters following “Mc” or (subject to one of the Advanced Input Options) “Mac” at the start of a name (see "Mac Name Treatment"). Double-barreled names have a capital letter after the hyphen. If the name or other word has a proper casing entry in the NAMES.DAT file, it is cased as shown there e.g. BSc, 360Science, IBM, plc. If not in the NAMES.DAT file, words are all capitals if they contain no vowels, otherwise they are changed to initial capital followed by lower case letters.
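The default casing rules listed above can be approximated by the following Python sketch. It is an illustration under assumptions, not the API's casing engine: the small dictionary stands in for the proper casing entries in NAMES.DAT, the function names are hypothetical, and the optional “Mac” treatment is left out.

```python
import re

# Hypothetical stand-in for proper casing entries held in NAMES.DAT.
CASING_EXCEPTIONS = {"mr": "Mr", "bsc": "BSc", "ibm": "IBM", "plc": "plc"}


def case_word(word):
    key = word.lower()
    if key in CASING_EXCEPTIONS:
        return CASING_EXCEPTIONS[key]
    # Words without vowels are written in capitals.
    if not any(ch in "aeiou" for ch in key):
        return word.upper()
    cased = key.capitalize()
    # Capitalise the letter after an apostrophe or a hyphen.
    cased = re.sub(r"(['’-])([a-z])",
                   lambda m: m.group(1) + m.group(2).upper(), cased)
    # Capitalise the letter after a leading "Mc".
    if cased.startswith("Mc") and len(cased) > 3:
        cased = "Mc" + cased[2].upper() + cased[3:]
    return cased


def proper_case(text):
    return " ".join(case_word(word) for word in text.split())
```

For instance, "MR O'REILLY" becomes "Mr O'Reilly", "MCDONALD" becomes "McDonald", and "SMITH-JONES" becomes "Smith-Jones", while "IBM PLC" is cased as "IBM plc" from the exceptions table.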
Generate.Quality.Enabled
By default, quality scoring is disabled and all quality scores are 0.
Generate.Quality.Address.AllowBlankPostcode
If disabled (enabled by default) then addresses without a postal code are restricted to a maximum quality score of 1.
Generate.Quality.Address.Country
With this enabled, address quality scores will receive an extra one point if the address contains a recognised country. This is disabled by default.
Generate.Quality.Address.MaxRepetition
This setting (the default is 0.7) is used when calculating the repetition level of characters within a string (in this case, all the concatenated address lines and elements). For example, the string “Heheheb” contains seven characters, six of which are involved in repeated sequences (he: hehehe); the repetition level is thus calculated as 6/7=0.857. This exceeds the max repetition level, so the string will be flagged as nonsense and achieve a quality score of 0.
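The exact calculation used by the API is not documented here, so the Python sketch below is only one plausible reading of it: a character counts as repeated if it lies inside a sequence that is immediately followed by an identical copy of itself, ignoring case. The function name is an assumption.

```python
def repetition_level(text):
    """Fraction of characters that belong to an immediately repeated sequence."""
    s = text.lower()
    n = len(s)
    if n == 0:
        return 0.0
    covered = [False] * n
    for start in range(n):
        # Try every sequence length that leaves room for an adjacent copy.
        for size in range(1, (n - start) // 2 + 1):
            if s[start:start + size] == s[start + size:start + 2 * size]:
                for i in range(start, start + 2 * size):
                    covered[i] = True
    return sum(covered) / n
```

Under this reading, "Heheheb" has six of its seven characters covered by the repeated "he" sequence, giving 6/7, which is above the default threshold of 0.7.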
Generate.Quality.Address.Premise
With this enabled, address quality scores will receive an extra one point if the address contains a premise number. This is disabled by default.
Generate.Quality.Address.Region
With this enabled (the default), address quality scores will receive an extra one point if the address contains a recognised region (e.g. county or state).
Generate.Quality.Email.MaxRepetition
(See Settings.Generate.Quality.Address.MaxRepetition, above.)
Generate.Quality.Email.WebmailFiltering
If enabled (the default), then email addresses whose domains belong to webmail providers (such as Hotmail, Yahoo and mail.com) are restricted to a maximum quality score of 7.
Generate.Quality.Name.MaxRepetition
(See Settings.Generate.Quality.Address.MaxRepetition, above.)
Generate.ReportUnrecognisedWords
This can specify a callback function that is used to notify the calling application of any unrecognised words (i.e. not found within names.dat) that are encountered when parsing records. The callback function is implemented as an object of a user-defined class that derives from matchIT.IReportUnrecognisedWords.
Generate.SpecialCaseMac
Where a last name begins with Mac, when formatting salutations, the matchIT API follows this with a small letter or a capital letter, depending on this property. A value of True will mean that MACLEAN will be formatted as MacLean. You can add exceptions to the rule (e.g. Maccabee, Macclesfield, MacKay, Mackie) to the NAMES.DAT file. If you invariably want to use a lower case letter following Mac, set this property to False.
NB: Names beginning Mach are always formatted with a lower case H, e.g. Machin, Machinery. Names beginning Mc are formatted with a capital letter following, if they are greater than 3 characters long.
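The rules for Mac, Mach and Mc can be summarised in a small Python sketch. It is an approximation for illustration: the function name is hypothetical, the exceptions dictionary stands in for entries in NAMES.DAT, and it assumes exceptions take priority whatever the setting's value.

```python
# Hypothetical stand-in for the exceptions held in NAMES.DAT.
MAC_EXCEPTIONS = {
    "maccabee": "Maccabee",
    "macclesfield": "Macclesfield",
    "mackay": "MacKay",
    "mackie": "Mackie",
}


def case_mac_name(name, special_case_mac):
    """Case a last name that may begin with Mac, Mach or Mc."""
    key = name.lower()
    if key in MAC_EXCEPTIONS:
        return MAC_EXCEPTIONS[key]
    if key.startswith("mach"):
        # Names beginning Mach always keep a lower case h.
        return key.capitalize()
    if special_case_mac and key.startswith("mac") and len(key) > 3:
        return "Mac" + key[3].upper() + key[4:]
    if key.startswith("mc") and len(key) > 3:
        return "Mc" + key[2].upper() + key[3:]
    return key.capitalize()
```

MACLEAN is formatted as "MacLean" when the flag is True and as "Maclean" when it is False, while MACHIN is always "Machin".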
Generate.VariableKeysMaxLength
This specifies the maximum length of various variable-length phonetic keys created when a record is generated. Such keys are PhoneticLastName, PhoneticFirstName, PhoneticMiddleName, PhoneticOrganizationName1, PhoneticOrganizationName2, PhoneticOrganizationName3, PhoneticStreet, and PhoneticTown. The default is eight characters.