Input Parameters:
- Configuration file – the file path of the configuration file to be used when this procedure is run.
- Datasource ID – specifies the data source to be used within the configuration file, which contains the table and column mapping specifications.
As msp_GenerateKeys, except that the matching and key field data is bulk loaded into an empty keys table. Significantly reduces the time taken for key generation, particularly when used with large databases.
The following settings are specific to the match key generation. These settings can be configured through the web UI or directly in the XML configuration file used during execution of the procedure:
Setting |
Description |
dataSources |
Specifies the database connection, table and column mappings used to define the dataset being processed. Only one table can be marked as the keys table using the isKeysTable option. The keys table that will be populated when this procedure runs will have the name specified in the name attribute of the table marked as the keys table. The recommendation is to not actually specify a keys table, so that the name of the keys table will be automatically generated (see the suffixes in the outputSettings section of a standard mSQL configuration XML). If you are using msp_BulkGenerateKeys (recommended) then one table may be added in the data source section to generate a cleaned, standardized version of your input data. To configure this table, add a table with the isOutputTable property set to true. This table can be used to proper case your data, parse names and simple address validation – please see the mAPI setting options described here. |
generalSettings->ensureUniqueRefIsClustered |
If enabled, then the uniqueRef column specified in the first table mapped in a data source is checked to ensure that it is referenced by a unique clustered index. (Note that this can be, but is not required to be, a primary key column.) |
outputSettings->reports |
Specifies whether reporting is enabled, what folder the reports will be produced into, and what report format should be used. |
matchITAPISettings->generate->dropExcludedWords |
With this property set to True, during the generate step mSQL will flag any records that contain exclusion words in any of the key fields (fields such as addressee, company or the address lines). Such exclusion words include “Deceased”, “Addressee” (indicating a record may be a header record) and any other Exclusion type entries in the NAMES.DAT file. Records are flagged by setting the first character of the mkDataFlags field in the generated keys table to “X”. Note that the default location of the names.dat file is: C:\Program Files\Common Files\matchIT API\dat |
matchITAPISettings->generate->properCase |
If the ProperCase property is set to True, the fields within the configured output table created during key generation (with the exception of premise (i.e. building) numbers and postcodes) will be correctly cased. |
matchITAPISettings->generate->considerCasing |
If this property is set to True, then mSQL will consider the casing of the incoming data when it is splitting the data up for extracting keys, proper casing, and so forth. For Example, with consider casing switched on the company: ABCD Systems Ltd ABCD would be considered to be an acronym rather than simply a word (i.e. on output without this option, ABCD would be output as Abcd). |
matchITAPISettings->generate->specialCaseMac |
Where a last name begins with Mac, when formatting salutations, mSQL follows this with a small letter or a capital letter, depending on this property. A value of True will mean that MACLEAN will be formatted as MacLean. You can add exceptions to the rule (e.g. Maccabee, Macclesfield, MacKay, Mackie) to the NAMES.DAT file. If you invariably want to use a lower case letter following Mac, set this property to False. NB: Names beginning Mach are always formatted with a lower case H, e.g. Machin, Machinery. Names beginning Mc are formatted with a capital letter following, if they are greater than 3 characters long. |
matchITAPISettings->generate->variableKeysMaxLength |
This specifies the maximum length of various variable-length phonetic keys created. Such keys are PhoneticLastName, PhoneticFirstName, PhoneticMiddleName, PhoneticOrganizationName1, PhoneticOrganizationName2, PhoneticOrganizationName3, PhoneticStreet, and PhoneticTown. The default is eight characters. |
matchITAPISettings->generate->quality->enabled |
By default, quality scoring is disabled and all quality scores are 0. Enabling this feature allows mSQL to generate quality scores for data fields such as: Names Emails Addresses Company Names These results are written to the output table configured in the Datasource being used during the GenerateKeys process. |
matchITAPISettings->generate->quality->address->allowBlankPostcode |
If disabled (enabled by default) then addresses without a postal code are restricted to a maximum quality score of 1. |
matchITAPISettings->generate->quality->email-> webmailFiltering |
If enabled (default) then email addresses that use webmail provider (such as Hotmail, Yahoo, & mail.com) domains are restricted to a maximum quality score of 7. |
LOW LEVEL ADVANCED SETTINGS |
The following settings are low level and most users will not normally need to modify the default settings. |
matchITAPISettings->generate->Name->joinMarriedPrefixes
|
With this property set to True, multiple addressees with the same last name will be treated as married e.g. input names of “Mr. John Smith and Ms. Mary Smith” or “Mr John Smith & Mary Smith” would have a Salutation generated of “Mr and Mrs Smith” and a Contact generated of “Mr and Mrs J Smith” or “Mr and Mrs John Smith”. |
matchITAPISettings->generate->Name->generateContact
|
With this property set to True, mSQL will generate a contact for the input name. The contact will be structured in same way as you would expect to find its corresponding input name on e.g. the front of an envelope. For example, the input name of “John Smith” or “Mr John Smith” would result in a generated contact of “Mr J Smith”. An accurate contact value cannot be generated when mSQL is unable to determine the gender of an input name. In this situation, the generated contact would be equal to the input name. e.g. “J Smith” as an input name would result in a generated contact of “J Smith”. |
matchITAPISettings->generate->Name->contactFullname |
Set this property to True to include the full first name of any incoming name in the CONTACT field; just the initial will be used if the property is False. For example, if the property is True, and the incoming name is “John Smith”, then the generated contact will be “Mr John Smith”, if it is False, then the contact will be “Mr J Smith”.
|
matchITAPISettings->generate->Name->defaultSalutation |
This property determines the default salutation, either where mSQL can't determine one (for example, C Smith or Chris Smith, which could be either Mr or Ms), or where the Prefix supplied doesn’t have a salutation rule. If you include the word ”Dear” as at the start of the default salutation (i.e. actually specify "Dear Customer" and not just "Customer", then all the salutations derived by mSQL will start with the word "Dear" unless the salutation for the type of title (or prefix) specifies "Title" only. For example, Mr J Smith will result in a salutation of "Dear Mr Smith" whereas The Bishop of Liverpool will result in a salutation of "My Lord".
|
matchITAPISettings->generate->Name->defaultGender |
The Default Gender property is the gender to assume when mSQL can’t determine whether the name is male or female e.g. Chris Smith, C Smith. If you set this property to Male or Female, mSQL will assume it to be male or female accordingly, and develop a salutation using Mr or Ms as the prefix. |
matchITAPISettings->generate->Name->UseEquivalentName |
If you set the Use Equivalent Name property to True, mSQL replaces the first name with its equivalent from the NAMES.DAT file, if there is an entry for the input first name. This enables, for example, “Tony Smith” and “Anthony Smith” to be picked up as a match. The initial of the original first name is stored in the Record.DataFlags property to enable, for example, “Tony Smith” and “T Smith” to still be matched. |
matchITAPISettings->generate->Name->EnhancedDoubleBarrelledLookup |
When enabled, this property will cause an unrecognized middle name to be considered part of a non-hyphenated double-barreled last name (for example, where the full name is John Harrington Jones, the last name will be considered Harrington-Jones because Harrington is not a recognized first name). |
matchITAPISettings->generate->Name->processBlankLastName |
With this property enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors. For example, if a firstname was entered but not a lastname, then it’ll be assumed that the firstname is in fact the lastname and match keys will be generated rather than being left blank. |
matchITAPISettings->generate->Name->replaceAndWithAmpersand |
By default, mSQL will convert ‘and’ to an ampersand when outputting InputFields.Name.Addressee. Disabling this property will prevent this behavior. |
matchITAPISettings->generate->Name->parseNameElements |
When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed. If mSQL deems any values to have been entered into an incorrect field (for example, suffixes and qualifications in the lastname field), it will reassign these values into the correct fields. This property is disabled by default, so that any such incorrect values are not reassigned. |
matchITAPISettings->generate->Name->detectInverseNames |
With this property enabled, mSQL will attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, “Smith, John” where Smith is the lastname). Without a comma, a name is assumed to be in standard left-to-right format, with the firstnames preceding the lastname. |
matchITAPISettings->generate->Name->parseAsNormalizedName |
When enabled, addressee names are assumed to be in a delimited normalized format similar to the NormalizedName value that’s output by during Key Generation. Currently supported delimiters are spaces, commas, semicolons, and pipes (‘|’). |
matchITAPISettings->generate->Address->Extract->premise |
This will move or copy premise numbers found in the address lines into a field labeled PREMISE in the table configured as an output table during key generation. |
matchITAPISettings->generate->Address->Extract->thoroughfare |
This will move or copy address data recognized as the thoroughfare of the address (based on Address type entries found in the NAMES.DAT file) into a field labeled THOROUGHFARE in the table configured as an output table during key generation. |
matchITAPISettings->generate->Address->Extract->town |
This will move or copy address data recognized as the town or city from the address lines to a field labeled TOWN in the table configured as an output table during key generation. |
matchITAPISettings->generate->Address->Extract->postTownsOnly |
If this is enabled, together with Extract->Town, then only post towns (i.e. any towns found in the TOWNS.DAT file) will be moved or copied. |
matchITAPISettings->generate->Address->Extract->region |
This will move or copy US, Canadian or Australian states or provinces, or valid UK counties (or other regions found in the NAMES.DAT file), that are found in the address lines into a field labeled REGION in the table configured as an output table during key generation. |
matchITAPISettings->generate->Address->Extract->postcode |
This will move or copy UK postcodes, or US zip codes found in the address lines into a field labeled POSTCODE. Only UK postcodes with an outward half that is valid according to the MAILSORT.DAT file will be extracted. |
matchITAPISettings->generate->Address->Extract->country |
This will move or copy valid countries found in the address lines (based on Country type entries found in the NAMES.DAT file) into a field labeled ‘COUNTRY’ in the table configured as an output table during key generation. |
matchITAPISettings->generate->Address->abbreviateRegion |
Set this property to True if you want mSQL to abbreviate States or Provinces when processing address lines e.g. to change “Pennsylvania” to “PA” within the table configured as an output table during key generation. |
matchITAPISettings->generate->Address->upperCaseTown |
This applies to UK addresses only. Set this property to True to convert the post town in the address to capitals within the table configured as an output table during key generation. Note that, if the ProperCase property is set to False, then this property is ignored. |
matchITAPISettings->generate->Address->verifyPostcode |
If set to True, this property verifies and corrects the format of the postcode. Numerics are changed to alphas and vice versa where appropriate. This feature makes use of the rules concerning the alphanumeric structure of the postcode. E.g. it changes “KT22 BDN” to “KT22 8DN” – it will change 0, 1, 5 and 8 to O, I, S and B, or vice versa, if that makes the postcode alphanumerically correct. mSQL will not verify or correct the format of postcodes that are not in the postcode field. The cleaned postcodes are output to the table configured as an output table within the datasource. |
matchITAPISettings->generate->Address->defaultThoroughfareLine |
This property is used when the mSQL is generating a phonetic address key, for which it needs to know the thoroughfare (e.g. street) and the town in the address. If it cannot locate a thoroughfare in the address, usually because it cannot find a word to indicate one, such as “Street”, then it will be assumed that the thoroughfare is the contents of the address line indicated by this property (if it is greater than zero). For example, if this property is set to 2, then mSQL will take the contents of address line 2 as the thoroughfare if it cannot find a thoroughfare word in the address. This property should only be used if the addresses in your data are very rigidly structured. |
matchITAPISettings->generate->Address->numOfLinesToScan |
This property enables personal names to be extracted from address lines. It can be set to 1 or 2. If set to 1, only the first address line will be scanned for names. If set to 2, both the first and second address lines will be scanned and have names extracted from them if found. Any personal names found can then be used for the generation of Contacts and Salutations. If either or both of the Organization->Extract->Jobtitle and Organization->Extract->Name properties are used in conjunction with this one, mSQL will not only scan the ADDRESSEE field for job titles and business names, but will also scan the corresponding number of address lines.
|
matchITAPISettings->generate->Address->premiseFirst |
When parsing an address, this Boolean property indicates whether to expect the premise or flat number to come first in address lines when the flat is not explicitly specified (e.g. “Flat 5”). |
matchITAPISettings->generate->Organization->Extract->jobTitle |
This will copy or extract job titles contained within the ADDRESSEE field into a field labeled JOB_TITLE within your output table. Job Titles are recognized by having a word or string defined as a Job Title in the NAMES.DAT file e.g. Director. |
matchITAPISettings->generate->Organization->Extract->name |
This will copy or extract any business names contained within the ADDRESSEE field into a field labeled COMPANY within your output table. Business names are recognized by having a word or string defined as a Business word in the NAMES.DAT file e.g. Ltd. Care should be taken when using this property, as words like "Bank" can be taken to indicate a Business when this isn't the case (e.g. it may be a last name or part of an address line). If you want Extract Company Name processing to be applied also to the first one or two lines of the address, you must Set the property Generate->Address->NumOfLinesToScan to either 1 or 2. |
matchITAPISettings->generate->Organization->joinInitials |
Set this property to True if you want a group of initials separated by spaces or dots in a company name to be concatenated. For example, if this property is True, then “I B M” and “I.B.M.” will be replaced by “IBM” within the company field of your output table. Note that, if the Generate->ProperCase property is set to False, then this property will have no effect.
|
matchITAPISettings->generate->Organization->useEquivalentName |
If this property is set to True, then the equivalent (according to the NAMES.DAT file) of words indicating a business name, such as “Motors” or “Services” are included in the NormalizedOrganization field in the generated keys table and the corresponding phonetic keys. This enables, for example, “Wood Green Cars” to match “Wood Green Motors” well (because “Cars” has an equivalent of “Motors”), but ensures that neither of them match “Wood Green Carpets” well. If you want tight legal matching turned on so that for example, ‘Wood Green Cars Limited’ will match ‘Wood Green Cars Ltd’ but ‘Wood Green Cars Group’ will not match ‘Wood Green Cars Ltd’, then in addition to setting this option to True, you will also need to modify the ‘matchITAPISettings>datPath’ property to the location of the tight dat files which can be found in a subfolder called ‘Tight’ under each region folder in C:\mSQL\config\dataFiles\... . If you set this property to True, you should change any words in the NAMES.DAT file that you do want ignored, such as “Ltd” and “Inc” to Noise type so that they are not included in the NormalizedOrganization field. As a rule of thumb, if you are doing business matching on a file that is very geographically concentrated, that is, contains records mostly from the same immediate area, then set the Generate->Organization->UseEquivalentName property to True, otherwise set it to False. |
matchITAPISettings->generate->Organization->normalizationTruncation |
Disabled by default (i.e. set to 0) If this setting is enabled, and the organization consists of more than four words, then the third element of field NormalizedOrganization within your generated keys table will be truncated to the first N characters of each word after the first two (where N is the value of this setting). |
matchITAPISettings->generate->Organization->ignoreParentheses |
With this property enabled, any words that are enclosed with parentheses within an organization name will be excluded from the generated phonetic organization keys. This can be useful for records such as Remnel Ltd and Remnel (UK) Ltd, to ensure records with these company names are compared if the phonetic organization keys are being used as part of composite match keys. |
matchITAPISettings->generate->Organization->ignoreTrailingPostTown |
This property, when enabled, will exclude from the phonetic organization keys any trailing post town (defined in the towns.dat file) or UK county that appears at the end of a company name. For example, the phonetic organization keys for Handso Ltd and Handso Essex Ltd will be the same to help ensure such records will be compared. |
Match Key Fields
The match key table can contain the following fields:
Column Name |
Description |
(unique ref) |
Unique reference for each record – either directly specified for the keys table in the table mappings, or taken from the first table in the mappings. |
mkNameKey |
Phonetic representation of the name. Optional. |
mkOrganizationKey |
Phonetic representation of the company name. Optional. |
mkAddressKey |
Phonetic representation of the address lines. Optional. |
mkPhoneticStreet |
Phonetic representation of the thoroughfare. Optional. |
mkPhoneticTown |
Phonetic representation of the town/city. Optional. |
mkPostOut |
First part of the postal code/zip. Optional. |
mkPostIn |
Second part of the postal code/zip. Optional. |
mkName1 |
Phonetic representation of the lastname. Optional. |
mkName2 |
Phonetic representation of the firstname. Optional. |
mkName3 |
Phonetic representation of the middle name or initial. Optional. |
mkOrgName1 |
Phonetic representation of the first word of the company name. Optional. |
mkOrgName2 |
Phonetic representation of the second word of the company name. Optional. |
mkOrgName3 |
Phonetic representation of the third word of the company name. Optional. |
mkTelAreaCode |
Telephone area code. Optional. |
mkTelLocalNumber |
Telephone local number. Optional. |
mkFaxAreaCode |
Fax area code. Optional. |
mkFaxLocalNumber |
Fax local number. Optional. |
mkName2Found |
Indicates whether the firstname was found in the names.dat file. Optional. |
mkNormalizedName |
Normalized version of all of the consumer name data. |
mkGender |
Generated gender based on the name data provided |
mkSuffix |
Any suffix data extracted from the name fields in the source data. |
mkNormalizedOrganization |
Normalized version of the Organization name. |
mkPremise |
Premise data extracted from the address lines. Optional. |
mkFlatNo |
Sub premise extracted from the address lines. Optional. |
mkDataFlags |
Flag field generated during key generation. Please see more information about the Dataflags here. |
mkMasterPriority |
Master priority calculated for the record based on the completeness of the data as defined by mSQL’s Master Priority Matrix. Used during the grouping of matches to help determine the master record. |
mkAddressLength |
The calculated length of the address data contained in the source data. Used during the grouping of matches to help determine the master record. |
The optional columns can be configured in the outputSettings->keyColumns node in a configuration file; most are enabled by default, but key columns not required by any match keys can be disabled to help improve performance of key generation and deduplication.
Columns not marked as optional will always be added to the keys table and cannot be disabled.