Advanced Configuration

Advanced settings are edited in a popup tabbed dialog.

Matching Tab

Minimum scores	Specifies a matching threshold score for each enabled matching level.
Maximum cluster size	All processed data is added to clusters. When a record is added to an existing cluster, it is compared to each record already in the cluster, provided the cluster has not reached the maximum size. Once a cluster reaches the maximum size, no further comparisons are performed on that cluster and it is logged as a large cluster.
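The cluster-size cap can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Hub engine's implementation: the `Cluster` class, the `add_to_cluster` function and the `compare` callback are hypothetical, and the sketch assumes a record is still added to a cluster after the cluster reaches the maximum size (it is simply no longer compared).

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    key: str
    records: list = field(default_factory=list)
    is_large: bool = False


def add_to_cluster(cluster, record, max_size, compare, large_clusters):
    """Add a record to a cluster, comparing it with existing members only while
    the cluster is below max_size. Returns the (existing, new) pairs compared."""
    compared = []
    if len(cluster.records) < max_size:
        for existing in cluster.records:
            compare(existing, record)
            compared.append((existing, record))
    elif not cluster.is_large:
        # The cluster has reached the maximum size: stop comparing and log it once.
        cluster.is_large = True
        large_clusters.append(cluster.key)
    # Assumption: the record is still added, since all processed data goes into clusters.
    cluster.records.append(record)
    return compared
```

With a maximum size of 3, the fourth and later records added to a cluster trigger no comparisons, and the cluster key is logged once as a large cluster.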

Output Options

Output unique refs only	If enabled, only unique refs are output. If disabled (the default), the output contains a copy of the input data, which can include the unique ref.
Output component scores	If enabled, then scores for mapped components are output for each matching level in addition to total scores. If disabled (the default), then only the total score for each matching level is output.
Output exact match scores	If enabled, a total score is output for exact matches; this score is the sum of the sure score settings for all mapped components, plus one. Otherwise the score field is blank for exact matches. Regardless of this setting, the component scores for exact matches are always blank.
Output all exact matches	When disabled (the default), matching pairs are only output if a record exactly matches the first record of a cluster. If enabled, then all matching pairs are output.
Output highest scores	If enabled, highest scores are also output to the Grouped Matching Pairs and Matching Groups output types. This is the highest score achieved by any matching pair within each group. (Conversely, the base score is the lowest score achieved by the pairs within a group and is always output.)
Output duplicates count	If enabled, the number of duplicates in each group is output to the Matching Groups output type. This is one less than the number of records in the group.
Output compare results	If enabled, the matching matrix indices and the acronym match flag are included in the Matching Pairs output type.
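The score values described by these output options can be sketched in Python. The component names and sure score values used here are hypothetical, and the group summary assumes every matching pair in the group carries a numeric total score (exact matches with a blank score are not handled).

```python
def exact_match_total(sure_scores):
    """Total reported for an exact match when 'Output exact match scores' is enabled:
    the sum of the sure score settings for all mapped components, plus one."""
    return sum(sure_scores.values()) + 1


def group_scores(pair_scores, record_count):
    """Summary values for one matching group, given the total scores of its pairs."""
    return {
        "base_score": min(pair_scores),        # lowest pair score, always output
        "highest_score": max(pair_scores),     # output when 'Output highest scores' is on
        "duplicates_count": record_count - 1,  # output when 'Output duplicates count' is on
    }
```

For example, with hypothetical sure scores of 60 (Name) and 50 (Address), an exact match would be reported with a total of 111.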

Grouping Options

Name bridging prevention	Prevents records with different forenames from being grouped together because they all match a record that is missing a forename.
Prefix bridging prevention	Prevents records with ‘Miss’ and ‘Mrs’ from being grouped together because they both match a record with ‘Ms’.
Company bridging prevention	Prevents records with different company names from being grouped together because they all match a record with an acronym (e.g. IBM matches “International Business Machines” and “Injection Blow Moulding”).
Aggressive splitting	If enabled, bridging records are disassociated from all matching records. If disabled (the default), bridging records remain matched to one sub-group of non-bridging records.
Master record identification	If enabled (the default), the master record in each group is chosen according to the Master Priorities rules, then address length, then lowest UniqueRef. If disabled, the master record in each group is simply the record with the lowest UniqueRef.
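The tie-breaking order for master record identification can be sketched as a single sort key. In this minimal sketch the `priority` field is a hypothetical stand-in for the outcome of the Master Priorities rules (higher wins), and the sketch assumes a longer address is preferred, which the text above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Record:
    unique_ref: int
    address: str
    priority: int = 0  # hypothetical: result of the Master Priorities rules, higher wins


def choose_master(group, master_identification=True):
    """Pick the master record of a matching group."""
    if not master_identification:
        # Option disabled: the master is simply the record with the lowest UniqueRef.
        return min(group, key=lambda r: r.unique_ref)
    # Highest priority first, then longest address (assumed), then lowest UniqueRef.
    return min(group, key=lambda r: (-r.priority, -len(r.address), r.unique_ref))
```

Encoding the three criteria as one tuple key keeps the precedence explicit: a later criterion is only consulted when all earlier ones tie.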

Match Keys

Match keys determine how records are clustered. Clusters are used to group potentially matching records: when a new record is added to an existing cluster (containing one or more existing records), the new record is compared to each of those existing records.

Keys	Lists the keys that will be used to cluster records for matching.
Key types	Keys are grouped into ‘exact keys’ and ‘fuzzy keys’. All the records in a fuzzy key cluster are compared to one another. All the records in an exact key cluster are automatically considered matching, without needing to be compared.
Key fields	Each key is a combination of key fields, e.g. Address Key + Premise.
Key functions	Functions (such as UPPER, TRIM, etc.) can be applied to key fields. Functions are best used with raw input data (names, address lines, postcodes, etc.) rather than with the key fields generated by the Hub engine (NameKey, AddressKey, etc.).
Allow blank keys	Key fields can be marked as 'optional' by enclosing them in square brackets; alternatively, enabling ‘allow blank keys’ makes all key fields optional. The two methods cannot be used at the same time.
Dynamic keys	In overlap mode (and lookup mode), enabling this option instructs Matching to choose dynamically, on a record-by-record basis, which of the defined keys to use, depending on which input columns are populated.
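How key fields, key functions and optional fields combine into cluster keys can be sketched in Python. Everything here is hypothetical and illustrative: the `KeyField`, `parse_key`, `build_key` and `cluster_records` names, the `|` separator, and the assumptions that a blank required field (or an entirely blank key) means the record gets no key for that key definition, and that `str.strip`/`str.upper` behave like TRIM/UPPER.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Tuple

TRIM, UPPER = str.strip, str.upper  # stand-ins for the TRIM and UPPER key functions


@dataclass
class KeyField:
    name: str
    optional: bool = False  # True when the field is written as [Field]
    functions: Tuple[Callable[[str], str], ...] = ()  # applied left to right


def parse_key(spec):
    """Parse a key such as 'AddressKey + [Premise]'; square brackets mark optional fields."""
    fields = []
    for part in spec.split("+"):
        part = part.strip()
        optional = part.startswith("[") and part.endswith("]")
        name = part[1:-1].strip() if optional else part
        fields.append(KeyField(name, optional))
    return fields


def build_key(record, key_fields, allow_blank_keys=False):
    """Return the key value for one record, or None if no key can be built."""
    if allow_blank_keys and any(kf.optional for kf in key_fields):
        raise ValueError("use either [optional] fields or allow_blank_keys, not both")
    parts = []
    for kf in key_fields:
        value = record.get(kf.name) or ""
        for fn in kf.functions:
            value = fn(value)
        if not value and not (kf.optional or allow_blank_keys):
            return None  # a required key field is blank
        parts.append(value)
    return "|".join(parts) if any(parts) else None


def cluster_records(records, key_fields, allow_blank_keys=False):
    """Group UniqueRefs by key value; each group is one cluster."""
    clusters = defaultdict(list)
    for rec in records:
        key = build_key(rec, key_fields, allow_blank_keys)
        if key is not None:
            clusters[key].append(rec["UniqueRef"])
    return dict(clusters)
```

Under these assumptions, records 1 and 2 in the test data share the key `HIGHST|10`, record 3 still gets a key because Premise is optional, and record 4 is not clustered because the required AddressKey is blank.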

Matching Rules

Matching levels	The Matching Rules dialog has one tab for each matching level enabled (Individual, Name only, Family, Address, Business, Company only).
Weights	Weights are used to score compared records. They are configured automatically when the basic configuration settings (nationality, tightness) are specified, so there is no need to configure them manually unless customizations are being made.
Thresholds	Scoring thresholds can be applied to impose further matching requirements when two records are compared. Changing these settings is not recommended.

Constraints

Must match gender	When enabled, potential matches will be disregarded if their genders differ. However, if the gender is unknown in one or both of the records, the records will potentially be classed as a match.
Must match suffix	When enabled, potential matches will be disregarded if their suffixes differ. However, if the suffix is unknown in one or both of the records, the records will potentially be classed as a match.
Must match joint names	When enabled, potential matches will be disregarded if one record has a joint name but the other doesn't. For example, by default "Mr and Mrs J Smith" matches "Mr J Smith"; enabling Must match joint names prevents such matches.
Address constraints	The address matching constraints (must match location, premise, directional, etc.) are now implemented via post-matching rules, so they do not need to be configured here.
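The gender, suffix and joint-name constraints can be sketched as a filter applied to a potential match. The record field names (`gender`, `suffix`, `joint`) and the set of values treated as unknown (including the `"U"` code) are hypothetical; the key behaviour taken from the text is that an unknown value on either side never blocks a match.

```python
UNKNOWN = {None, "", "U"}  # hypothetical values treated as "unknown"


def _differs(x, y):
    """True only when both values are known and they are different."""
    return x not in UNKNOWN and y not in UNKNOWN and x != y


def passes_constraints(a, b, must_match_gender=False, must_match_suffix=False,
                       must_match_joint_names=False):
    """Return False if an enabled constraint rules out the potential match."""
    if must_match_gender and _differs(a.get("gender"), b.get("gender")):
        return False
    if must_match_suffix and _differs(a.get("suffix"), b.get("suffix")):
        return False
    # Joint names: one record has a joint name (e.g. "Mr and Mrs J Smith"), the other not.
    if must_match_joint_names and bool(a.get("joint")) != bool(b.get("joint")):
        return False
    return True
```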

Matching Matrices

Three-dimensional matching matrices are used to determine the match level that two compared records achieve. In the name matching matrix, the three dimensions represent the individual name fields: last name, first name, middle name. In the company matching matrix, the three dimensions are name1, name2, and name3. The matrix maps the match type for each of these individual name fields (equal, both_empty, one_empty, sounds_equal, etc.) to an overall match level (sure, likely, possible, etc.).
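A name matching matrix lookup can be sketched as a dictionary keyed by the three per-field match types. The matrix cells below are a hypothetical excerpt (not the shipped settings), the `different` and `no_match` labels are assumptions, and `crude_sound_key` is a deliberately crude stand-in for whatever sounds-like comparison the product uses; it is not Soundex.

```python
NAME_FIELDS = ("last", "first", "middle")

# Hypothetical excerpt of a name matching matrix: (last, first, middle) -> match level.
NAME_MATRIX = {
    ("equal", "equal", "equal"): "sure",
    ("equal", "equal", "both_empty"): "sure",
    ("equal", "equal", "one_empty"): "likely",
    ("equal", "sounds_equal", "both_empty"): "likely",
    ("sounds_equal", "equal", "both_empty"): "likely",
    ("equal", "one_empty", "both_empty"): "possible",
}


def crude_sound_key(s):
    """Crude phonetic key: first letter plus remaining consonants (not Soundex)."""
    return s[0] + "".join(c for c in s[1:] if c not in "AEIOUYHW")


def match_type(x, y):
    """Classify how one name field compares across two records."""
    x, y = (x or "").strip().upper(), (y or "").strip().upper()
    if not x and not y:
        return "both_empty"
    if not x or not y:
        return "one_empty"
    if x == y:
        return "equal"
    if crude_sound_key(x) == crude_sound_key(y):
        return "sounds_equal"
    return "different"


def name_match_level(a, b):
    """Look up the overall match level for the matrix cell the two names fall into."""
    cell = tuple(match_type(a.get(f), b.get(f)) for f in NAME_FIELDS)
    return NAME_MATRIX.get(cell, "no_match")
```

Here "John Smith" against "Jon Smith" falls into the cell (equal, sounds_equal, both_empty), which this hypothetical matrix maps to likely.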

Post-Matching Rules

Advanced Post-Matching rules are applied to matching pairs prior to grouping. They apply only to matches produced by fuzzy comparison. Each rule specifies a condition, written in a SQL-like syntax, and an action that determines what happens when the condition is satisfied.

Conditions	Rule conditions are logical expressions that result in a Boolean (true or false). An expression can be a function, such as “matches(city)”, or a logical operation such as “AddressScore >= 30” or “City == ‘RALEIGH’”. A condition can consist of a single logical expression or of multiple expressions combined using “and” or “or”.
Actions	Rule actions are either "Keep" or "Delete". If any rule whose condition is satisfied specifies a Keep action, the match is kept. If any satisfied rule specifies a Delete action, the match is deleted, unless a satisfied rule also specifies Keep.
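The Keep/Delete precedence can be sketched in Python. Conditions are written here as Python callables rather than in the product's SQL-like syntax, and the pair fields (`exact`, `AddressScore`, `city_a`, `city_b`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[dict], bool]
    action: str  # "Keep" or "Delete"


def survives_post_matching(pair, rules):
    """Return True if the matching pair is kept after the post-matching rules run."""
    if pair.get("exact"):
        return True  # rules only apply to fuzzy-compared matches
    fired = [rule.action for rule in rules if rule.condition(pair)]
    if "Keep" in fired:
        return True  # a satisfied Keep rule always wins
    return "Delete" not in fired


rules = [
    Rule(lambda p: p["AddressScore"] >= 30, "Keep"),
    Rule(lambda p: p["city_a"] != p["city_b"], "Delete"),
]
```

With these two rules, a pair in different cities is deleted only when its address score is below 30; pairs that satisfy neither condition are left as they are.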

Master Priorities

Master priority rules are used to determine which record in a matching group should be marked as the master record (i.e. the best record).

Word Lookup

The Names and Words tables (NAMES.DAT & NAMES2.DAT) control:

the matching equivalents of words, e.g. Tony = Anthony
the gender of forenames, e.g. John = Male, Susan = Female, Chris = Either
casing rules, e.g. PO Box, IBM, 360Science
expansion/contraction of abbreviations and correction of typing errors, e.g. Svcs = Services, Finacial = Financial
the type attributed to these and other words, e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.
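How such lookup tables might be applied can be sketched with in-memory dictionaries holding the examples listed above. The real NAMES.DAT / NAMES2.DAT file format is not described here, so the dictionaries, the function names and the "Unknown" default are all hypothetical.

```python
# Hypothetical in-memory excerpts built from the examples above.
EQUIVALENTS = {"TONY": "ANTHONY", "SVCS": "SERVICES", "FINACIAL": "FINANCIAL"}
GENDERS = {"JOHN": "Male", "SUSAN": "Female", "CHRIS": "Either"}
CASING = {"PO": "PO", "IBM": "IBM", "360SCIENCE": "360Science"}
WORD_TYPES = {"MR": "Prefix", "LTD": "Business", "FL": "State", "THE": "Noise"}


def match_tokens(text):
    """Upper-case tokens for matching: noise words dropped, equivalents substituted."""
    return [EQUIVALENTS.get(w, w) for w in text.upper().split()
            if WORD_TYPES.get(w) != "Noise"]


def gender_of(forename):
    """Gender from the lookup table; 'Unknown' is a hypothetical default."""
    return GENDERS.get(forename.upper(), "Unknown")


def apply_casing(text):
    """Apply casing rules word by word, defaulting to capitalized words."""
    return " ".join(CASING.get(w.upper(), w.capitalize()) for w in text.split())
```

For instance, "The Finacial Svcs Ltd" reduces to the matching tokens FINANCIAL, SERVICES, LTD, and "ibm po box" is recased as "IBM PO Box".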