The advanced settings dialog is a popup tabbed dialog.
Matching tab
Minimum scores |
Specifies a matching threshold score for each matching level enabled. |
Maximum cluster size |
All processed data is added to clusters. When a record is added to an existing cluster, it's then compared to each record already in the cluster, provided the maximum cluster size has not been exceeded. If the cluster has reached the maximum size, then no more comparisons will be performed on that cluster and it will be logged as a large cluster. |
Output Options
Output unique refs only |
If enabled, then only unique refs are output. If disabled (default), then the output contains a copy of the input data, which can include the unique ref. |
Output component scores |
If enabled, then scores for mapped components are output for each matching level in addition to total scores. If disabled (the default), then only the total score for each matching level is output. |
Output exact match scores |
If enabled, then a total score is output for exact matches that is the sum of the sure score setting for all mapped components plus one. Otherwise the score field is blank for exact matches. Regardless of this setting the component scores for exact matches are always blank. |
Output all exact matches |
When disabled (the default), matching pairs are only output if a record exactly matches the first record of a cluster. If enabled, then all matching pairs are output. |
Output highest scores |
If enabled, highest scores are also output to the Grouped Matching Pairs and Matching Groups output types. This is the highest score achieved by any matching pair within each group. (Conversely, the base score is the lowest score achieved by the pairs within a group and is always output.) |
Output duplicates count |
If enabled, the number of duplicates in each group is output to the Matching Groups output type. This is one less than the number of records in the group. |
Output compare results |
If enabled, the matching matrices indices and acronym match flag are included in the Matching Pairs output type. |
Grouping options
Name bridging prevention |
Prevents records with different forenames being grouped together because they all match to a record that is missing forename. |
Prefix bridging prevention |
Prevents records with ‘Miss’ and ‘Mrs’ being grouped together because they match to a record with ‘Ms’. |
Company bridging prevention |
Prevents records with different company names being grouped together because they all match to a record with an acronym. (e.g. IBM matches “International Business Machines” and “Injection Blow Moulding”). |
Aggressive splitting |
If enabled, bridging records will be disassociated from all matching records. If disable (the default), bridging records will remain matched to one sub-group of non-bridging records. |
Master record identification |
If enabled (default), the master record in each group is chosen according to: Master Priorities rules, then address length, then lowest UniqueRef. If disabled, the master record in each group is simply the record with the lowest UniqueRef. |
Match Keys
Match keys determine how records are clustered. When a new record is added to an existing cluster (containing one or more existing records) the record is compared to each of those existing records. Clusters are used to group potentially matching records.
Keys |
Lists the keys that will be used to cluster records for matching. |
Key types |
Keys are grouped into ‘exact keys’ and ‘fuzzy keys’. All the records in a fuzzy key cluster are compared to one another. All the records in an exact cluster are automatically considered matching, without needing to compare. |
Key fields |
Each key is a combination of key fields, e.g. Address Key + Premise. |
Key functions |
Functions (such as UPPER, TRIM, etc) can be applied to key fields. Functions are best used with raw input data (names, address lines, postcodes, etc.) rather than with the key fields generated by the Hub engine (NameKey, AddressKey, etc.) |
Allow blank keys |
Key fields can be marked as 'optional' by enclosing them within square brackets, alternatively enabling ‘allow blank keys’ makes all key fields optional. The two methods cannot be used at the same time. |
Dynamic keys |
In overlap mode (and lookup mode) enabling this option instructs Matching to dynamically choose which keys to use (from those defined), on a record-by-record basis, depending on which input columns are populated. |
Matching Rules
Matching levels |
The Matching Rules dialog has one tab for each matching level enabled (Individual, Name only, Family, Address, Business, Company only). |
Weights |
Weights are used when compared records are scored. Weights are configured automatically when the basic configuration settings are specified (nationality, tightness), there is no need to manually configure these weights unless customizations are being made. |
Thresholds |
Scoring thresholds can be applied to provide further matching requirements when two records are compared. It is not recommended to change these settings. |
Constraints
Must match gender |
When enabled, potential matches will be disregarded if their genders differ. However, if the gender is unknown in one or both of the records, the records will potentially be classed as a match. |
Must match suffix |
When enabled, potential matches will be disregarded if their suffixes differ. However, if the suffix is unknown in one or both of the records, the records will potentially be classed as a match. |
Must match joint names |
When enabled, potential matches will be disregarded if one record has a joint name but the other doesn't. For example, normal behaviour will match "Mr and Mrs J Smith" with "Mr J Smith"; enabling must match joint names will prevent such matches. |
Address constraints |
The address matching constraints (must match location, premise, directional, etc) are now implemented via post-matching rules, so do not need to be configured here. |
Matching Matrices
Three dimensional matching matrices are used to decide the level of match records should achieve. In the name matching matrix the three dimensions represent the individual name fields: last name, first name, middle name. In the company matching matrix the three dimensions are name1, name2, and name3. The matrix maps the match type for these individual name fields (equal, both_empty, one_empty, sounds_equal, etc.) to an overall match level (sure, likely, possible, etc.).
Post-Matching Rules
Advanced Post-Matching rules are applied to matching pairs prior to grouping. The Advanced Post-Matching rules only apply to fuzzy compared matches. Each rule specifies both a condition using a SQL-like syntax, plus an action that determines what happens when a condition is satisfied.
Conditions |
Rule conditions are logical expressions that results in a Boolean (true or false). An expression can be a function – such as “matches(city)” – or a logical operation such as “AddressScore >= 30”, “City == ‘RALEIGH’”. Conditions can consist of a single logical expression or of multiple expressions (combined using “and”, or “or”). |
Actions |
Rule actions are either "Keep" or "Delete". If any successful rule specifies a Keep action, then the match is kept. If any successful rule specifies a Delete action, then the match is deleted, but only if the match isn’t being kept. |
Master Priorities
Master priority rules are used to determine which record in a matching group should be marked as the master record (i.e. the best record).
Word Lookup
The Names and Words tables (NAMES.DAT & NAMES2.DAT) control:
- the matching equivalent of words e.g. Tony = Anthony
- the gender of forenames e.g. John = Male, Susan = Female, Chris = Either
- casing rules e.g. PO Box, IBM, 360Science
- expansion/contraction of abbreviations and correction of typing errors e.g. Svcs = Services, Finacial = Financial
- attributing type to these and other words e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.
Generate tab
Generate name options
Default gender |
The Default Gender property is the gender to assume when the matchIT API can't determine whether the name is male or female e.g. Chris Smith, C Smith. |
Use equivalent name |
If enabled, the input first name is replaced with its equivalent from the NAMES.DAT file. |
Enhanced double barrelled lookup |
When enabled, this setting will cause an unrecognised middle name to be considered part of a non-hyphenated double-barrelled last name. |
Process blank last name |
With this setting enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors. |
Parse name elements |
When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed. |
Detect inverse names |
Attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, "Smith, John" where Smith is the lastname). |
Parse as normalized name |
Addressee names are assumed to be in a delimited normalized format. |
Generate company options
Use equivalent name |
If enabled, then the equivalent (according to the NAMES.DAT file) of words indicating a business name, such as "Motors" or "Services" are included in the normalized organization name and the corresponding phonetic keys. |
Normalization truncation |
If enabled (non-zero), and the organization consists of more than three words, then the third element of the normalized organization name will be truncated to the first N characters of each word after the first two (where N is the value of this setting). |
Ignore parentheses |
With this property enabled, any words that are enclosed with parentheses within an organization name will be excluded from the phonetic organization keys. |
Ignore trailing post town |
Exclude any trailing post town from the phonetic organization keys. |
Generate address options
Verify postcode |
If enabled, verifies and corrects the format of the postcode. |
Default street line |
This property is used when the generating a phonetic address key, to indicate the position of the street and the town in the address. |
Lines to scan |
This property enables personal names to be extracted from address lines. It can be set to 1 or 2. |
Premise first |
Indicates whether to expect the premise or flat number to come first in address lines |
Generate options
Drop excluded words |
When enabled, flag any records that contain exclusion words in any of the key fields. |
Consider casing |
When enabled, consider the casing of the incoming data when splitting the data up for extracting keys, proper casing, and so forth. |
Variable keys max length |
Specifies the maximum length of various variable-length phonetic keys. |
Compare tab
Phonetic compare options
Algorithm |
The phonetic algorithm used when scoring. There are five choices available: soundIT, Loose_SoundIT, Dynamic_SoundIT, Soundex, None. |
(Algorithm) for first name |
Optionally specify a different algorithm to compare first names (‘none’ means use the same as the main Algorithm setting. |
Loose threshold |
When the Dynamic_SoundIT algorithm is in use, this property controls the threshold at which soundIT is switched to Loose soundIT. |
Fuzzy compare options
Algorithm |
The fuzzy algorithm used when scoring. There are two choices available: matchIT_Fuzzy, Damerau_Levenshtein. |
Maximum edit distance |
The maximum number of differences between the two strings. (Applicable to Damerau_Levenshtein only) |
Minimum score |
The minimum fuzzy score. (Applicable to Damerau_Levenshtein only) |
Address compare options
Match box number and postcode |
When enabled, two compared addresses score Sure if they contain matching postal box numbers and postcodes. |
User premise range |
When enabled, this will allow addresses to contain premise ranges. |
Loose fuzzy premise match |
When enabled, additional fuzzy premise matching is performed. |
Match delivery points |
When enabled, this will prevent two addresses from matching when both contain two postal codes but different delivery point codes and the addresses score below the minimum threshold. |
Match DP threshold |
See match delivery points, above. |
Default DPSs |
See match delivery points, above. |
Ignore premise suffix |
When enabled, this will allow two premises to match regardless of whether one or both has an apartment- or flat-type suffix (for example, 12 and 12a). |
Name compare options
Prevent mrs matching miss |
When enabled, then two compared names will not match if one has a title of Mrs and the other a title of Miss. |
Fuzzy match non-normalized names |
When enabled (the default), this will cause additional matching checks to be performed on names using the non-normalized name matching fields. |
Blank name company matching |
When two records contain no addressee names, this setting will allow the names to achieve a score depending on what's available in the job title and company name fields. · 0 - Off · 1 - On if either name blank · 2 - On when both names are blank |
Initial match equivalent |
Controls how an initial matches a name that's equivalent to the given firstname. For example, when comparing Rebecca Smith and B Smith, then the B could be considered a match for Becky, which is a common abbreviation (or equivalent) of Rebecca. |
Cross match initial to name |
When enabled (the default), and the first letter of a firstname matches the middle initial (for example, "Richard Smith" and "John R Smith") then the names will be considered a possible match. |
Fuzzy match initials |
Controls how similar-sounding initials (M/N, S/F, and G/J) can be matched. When set to 'full' (the default), then one name's initial is permitted to match the first letter of the other name's firstname (for example, "M Smith" versus "Neil Smith"). When set to 'initialsOnly', then only initials are permitted ("M Smith" versus "N Smith"). A setting of 'noMatch' disables such matches. |
Initial match forename |
Controls the result achieved when an initial matches the first letter of a firstname. This defaults to 'equal', so that B Smith versus Bob Smith will achieve the same result as Bob Smith versus Bob Smith (i.e. 'equal' for the firstnames). Reducing this setting to 'approx' or 'contains' will reduce the resultant name score in order to distinguish such matches. |
Fuzzy match forename |
Used to prevent different recognized firstnames from fuzzy matching. For example, ordinarily Ron and Roy will fuzzy match. |
Resources tab
Threads |
By default - or if 0 is specified for the number of threads - each engine will use all available processor cores. |
Debug log |
Should a process that uses matchIT Hub unexpectedly crash, the engine can be configured to create a debug log of all data loaded and all operations performed on the data. |
Memory usage
Input buffer |
All data added is initially stored in the input buffer. The processing threads remove this data from the input buffer for processing. |
Output buffer |
Similarly, all results are written to the output buffer. The application must remove results from the buffer to prevent it from becoming full. |
Block size |
Every item of data - once it's been removed from the input buffer and acquired by a processing thread - is stored internally in blocks along with other items of data. |
Cache limit |
When a block is full, it's added to the fast cache. When the cache becomes full (if the cacheLimit is not 0) then archiving will begin. |
Threshold |
When the memory usage of the running process exceeds the threshold, blocks will be moved from memory to the temporary disk paging file. |
Compression level |
The compression level can be 0 for disabled, or 1 (fastest compression) to 9 (slowest/best compression). |
Encryption |
The encryption key size can be 128, 192, or 256. Encryption of memory-resident data should not normally be required, but can be enabled if necessary. |
Disk usage
Location |
Specifies the directory in which a temporary disk paging file will be created, should the process's memory usage exceed the threshold (refer to Memory Settings, above). |
Limit |
A nonzero limit can be specified, in which case the process will be aborted should the disk file's size exceed the limit. |
Compression level |
The compression level can be 0 for disabled, or 1 (fastest compression) to 9 (slowest/best compression). |
Encryption |
The encryption key size can be 128, 192, or 256. A key size of 256, for maximum security, is highly recommended. |