Constraints and weights for each matching level, used when compared records are scored.
Default values are shown, configured for matching US data. (See below for details on weights when matching UK and other international data.) Note that weights are configured automatically when the nationality is specified; there is no need to configure them manually unless you are customizing them.
Location: <settings><advanced><matchingRules>
<matchingRules>
<individualLevel>...</individualLevel>
<familyLevel>...</familyLevel>
<addressLevel>...</addressLevel>
<businessLevel>...</businessLevel>
<customLevel>...</customLevel>
</matchingRules>
Individual Level
Location: <settings><advanced><matchingRules><individualLevel>
<individualLevel>
<constraints>
<mustMatchGender>true</mustMatchGender>
<mustMatchSuffix>false</mustMatchSuffix>
<mustMatchLocation>true</mustMatchLocation>
<mustMatchPremise>false</mustMatchPremise>
<noOneEmptyPremise>false</noOneEmptyPremise>
<allowFuzzyPremiseMatch>false</allowFuzzyPremiseMatch>
<mustMatchDirectional>false</mustMatchDirectional>
<mustMatchNumericStreetName>false</mustMatchNumericStreetName>
<mustMatchJointNames>false</mustMatchJointNames>
<mustMatchBuilding>false</mustMatchBuilding>
<noOneEmptyBuilding>false</noOneEmptyBuilding>
</constraints>
<weights>
<name sure="60" likely="40" possible="25" oneEmpty="5" bothEmpty="24" />
<organization sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<address sure="40" likely="30" possible="20" oneEmpty="5" bothEmpty="5" />
<postcode sure="30" likely="20" possible="15" oneEmpty="5" bothEmpty="5" />
<telephone sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<email sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<dateOfBirth sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<customField1 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
...
<customField9 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
</weights>
<thresholds>
<name>0</name>
<organization>0</organization>
<address>0</address>
<postcode>0</postcode>
<telephone>0</telephone>
<email>0</email>
<dateOfBirth>0</dateOfBirth>
<customField1>0</customField1>
...
<customField9>0</customField9>
</thresholds>
<nameMatchingMatrix>...</nameMatchingMatrix>
<organizationMatchingMatrix>...</organizationMatchingMatrix>
</individualLevel>
Constraints
mustMatchGender: When this property is set to True, potential matches will be disregarded if their genders differ. If, however, the gender is unknown in one or both of the records, the records will potentially be classed as a match.
mustMatchSuffix: When this property is set to True, potential matches will be disregarded if their suffixes differ. If, however, the suffix is unknown in one or both of the records, the records will potentially be classed as a match.
mustMatchLocation: When this property is set to True, potential matches will be disregarded if their address locations differ. In detail, at least one of the following must hold: the postcodes in the two records (if present) achieve at least a probable match and the address score is at least a Possible match; or the address score is at least a Likely match, irrespective of the postcodes; or the postcodes achieve a Sure match, irrespective of the address. This prevents false matches where there is some match on address but the addresses are clearly not the same, for example "10 High Street, Bookham" and "10 High Street, Alford". Switch this constraint off if you want to match people or companies in different locations, for example when matching on items of data that are independent of location, such as date of birth or bank account.
mustMatchPremise: When this property is set to True, potential matches will be disregarded if their premise numbers differ. If, however, the premise number is unknown (e.g. one record or both records may contain a premise name), the records will potentially be classed as a match.
noOneEmptyPremise: When this property is set to True, potential matches will be disregarded if one of the addresses is missing a premise number.
allowFuzzyPremiseMatch: When both this and mustMatchPremise are set to True, potential matches will be disregarded only if the premises are neither exact matches (for example, 71 and 71) nor fuzzy matches (for example, 71 and 71A, 45 and 54, or 71 and 7). Note that this property has no effect if mustMatchPremise is set to False because, in that case, fuzzy premises are always allowed.
mustMatchDirectional: When this property is set to True, potential matches will be disregarded if both addresses (i.e. typically US) have a pre- or post-directional (e.g. N, North, E, etc.) but they don't match. For example, with this constraint enabled, "N Washington Ave" and "S Washington Ave" will not be matched.
mustMatchNumericStreetName: When this property is set to True, potential matches will be disregarded if both addresses (i.e. typically US) have a numeric street name but they don't match. For example, with this constraint enabled, "5th Ave" and "15th Ave" will not be matched.
mustMatchJointNames: When this property is set to True, potential matches will be disregarded if one record has a joint name but the other doesn't. For example, normal behavior will match "Mr and Mrs J Smith" with "Mr J Smith"; setting this property to True will prevent such matches.
mustMatchBuilding: When this property is set to True, potential matches will be disregarded if their building names differ. If, however, one or both addresses do not contain a building name, the records will potentially be classed as a match.
noOneEmptyBuilding: When this property is set to True, potential matches will be disregarded if one of the addresses is missing a building name.
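Several of the constraints above share two patterns: the mustMatch style (reject only when both values are known and differ) and the noOneEmpty style (reject when a value is missing from one record). The Python sketch below illustrates those two patterns only. The function names are hypothetical, plain equality stands in for whatever comparison the product actually performs, and treating "both missing" as acceptable for the noOneEmpty style is an assumption based on the property name.

```python
from typing import Optional


def passes_must_match(a: Optional[str], b: Optional[str]) -> bool:
    """mustMatch-style check (hypothetical helper).

    Rejects the pair only when both values are known and differ.
    An unknown value (None) on either side lets the pair through.
    Plain equality is an assumption; the real comparison may be looser.
    """
    if a is None or b is None:
        return True
    return a == b


def passes_no_one_empty(a: Optional[str], b: Optional[str]) -> bool:
    """noOneEmpty-style check (hypothetical helper).

    Rejects the pair when exactly one of the two values is missing.
    Assumption: a pair where both values are missing is not rejected.
    """
    return (a is None) == (b is None)


# Gender differs and is known in both records: rejected under mustMatchGender.
print(passes_must_match("M", "F"))
# Gender unknown in one record: still a potential match.
print(passes_must_match("M", None))
# One address lacks a premise number: rejected under noOneEmptyPremise.
print(passes_no_one_empty("71", None))
```

The two checks are independent, which mirrors the settings: for example, mustMatchPremise and noOneEmptyPremise can be enabled separately.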
Weights
A total score is calculated when two records are compared; this is the sum of the scores generated for each component within the two records (name, organization, address, postcode, and so on).
In essence, identical or equivalent components are scored using the Sure weight, very similar components the Likely weight, and less similar components the Possible weight; unmatched components are scored as 0. The OneEmpty and BothEmpty weights are used when one or both records contain no data for the component.
Scores are generated for names and organizations via the Matching Matrices. Scores for all other components are generated by proprietary scoring algorithms.
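As a rough illustration of the summation described above, the following Python sketch adds up per-component weights using the default US individual-level values. The outcome labels and the `total_score` helper are hypothetical; in the product, the outcome for each component is decided by the matching matrices and the proprietary scoring algorithms, which this sketch does not attempt to reproduce.

```python
# Default US individual-level weights, copied from the settings shown above.
# Components whose default weights are all 0 are omitted for brevity.
WEIGHTS = {
    "name":     {"sure": 60, "likely": 40, "possible": 25, "oneEmpty": 5, "bothEmpty": 24},
    "address":  {"sure": 40, "likely": 30, "possible": 20, "oneEmpty": 5, "bothEmpty": 5},
    "postcode": {"sure": 30, "likely": 20, "possible": 15, "oneEmpty": 5, "bothEmpty": 5},
}


def total_score(outcomes, weights=WEIGHTS):
    """Sum the weight selected by each component's outcome.

    `outcomes` maps a component name to one of "sure", "likely",
    "possible", "oneEmpty", "bothEmpty" or "unmatched" (hypothetical
    labels). An unmatched component contributes 0.
    """
    total = 0
    for component, outcome in outcomes.items():
        if outcome == "unmatched":
            continue
        total += weights[component][outcome]
    return total


# Likely name match, Sure address match, postcode missing from one record.
example = {"name": "likely", "address": "sure", "postcode": "oneEmpty"}
print(total_score(example))  # 40 + 40 + 5 = 85
```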
Thresholds
Scoring thresholds can be applied to provide further matching requirements when two records are compared.
Essentially, the total score begins at 0. As the score for each component is added to the total score, the total score must match or exceed the component's specified threshold. If the total score is lower than a threshold, then the two records are rejected as a match and are scored as 0.
As an example, suppose we're finding duplicates within data containing individuals' names, addresses, and postcodes. With an initial total score of 0, the score for the first component (name) is added to the total. The total score must match or exceed the name threshold. Using a threshold of 25 ensures that the names within all matching pairs score at least Possible (using default weights). The next component (organization) is then added to the total score, and the threshold checked; a value of 0 (the default) effectively disables the threshold. The next component (address) is then added to the total score; a threshold of 55 requires that the address score at least 30 when the name scored the minimum of 25 (a higher name score reduces what the address must contribute). Assuming the total score is 55 or more, we then proceed to the next component and its threshold. Each subsequent component's score is added to the total score, and its threshold checked each time.
It is important to note that the thresholds are applied in the order shown above (i.e. name, then organization, then address, and so on down to customField9) even if their order appears different in the settings XML.
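The running-total check can be sketched in Python as follows. This is a minimal illustration of the rule as described in this section, not the product's implementation: the `score_with_thresholds` helper and the component scores passed to it are hypothetical, while the thresholds of 25 (name) and 55 (address) come from the example above.

```python
# Fixed evaluation order described above, regardless of the order in the XML.
ORDER = (
    ["name", "organization", "address", "postcode",
     "telephone", "email", "dateOfBirth"]
    + [f"customField{i}" for i in range(1, 10)]
)


def score_with_thresholds(component_scores, thresholds):
    """Add component scores in the fixed order, checking each threshold.

    After a component's score is added, the running total must match or
    exceed that component's threshold; otherwise the pair is rejected
    and scored as 0. Missing scores and thresholds default to 0.
    """
    total = 0
    for component in ORDER:
        total += component_scores.get(component, 0)
        if total < thresholds.get(component, 0):
            return 0
    return total


thresholds = {"name": 25, "address": 55}

# Name 25, address 30: the running total reaches 55 at the address check.
print(score_with_thresholds({"name": 25, "address": 30, "postcode": 15}, thresholds))  # 70
# Name 25, address 20: the running total is 45 at the address check, so rejected.
print(score_with_thresholds({"name": 25, "address": 20, "postcode": 30}, thresholds))  # 0
```

Because the check is made against the running total, a later component cannot rescue a pair that has already fallen below an earlier threshold, as the second call shows.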
Family Level
Location: <settings><advanced><matchingRules><familyLevel>
<familyLevel>
<constraints>
<mustMatchGender>false</mustMatchGender>
<mustMatchSuffix>false</mustMatchSuffix>
<mustMatchLocation>true</mustMatchLocation>
<mustMatchPremise>false</mustMatchPremise>
<noOneEmptyPremise>false</noOneEmptyPremise>
<allowFuzzyPremiseMatch>false</allowFuzzyPremiseMatch>
<mustMatchDirectional>false</mustMatchDirectional>
<mustMatchNumericStreetName>false</mustMatchNumericStreetName>
<mustMatchJointNames>false</mustMatchJointNames>
<mustMatchBuilding>false</mustMatchBuilding>
<noOneEmptyBuilding>false</noOneEmptyBuilding>
</constraints>
<weights>
<name sure="60" likely="40" possible="25" oneEmpty="5" bothEmpty="24" />
<organization sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<address sure="40" likely="30" possible="20" oneEmpty="5" bothEmpty="5" />
<postcode sure="30" likely="20" possible="15" oneEmpty="5" bothEmpty="5" />
<telephone sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<email sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<dateOfBirth sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<customField1 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
...
<customField9 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
</weights>
<thresholds>
...
</thresholds>
<nameMatchingMatrix>...</nameMatchingMatrix>
<organizationMatchingMatrix>...</organizationMatchingMatrix>
</familyLevel>
Refer to the Individual Level for further details on the constraints, weights, and thresholds.
Address Level
Location: <settings><advanced><matchingRules><addressLevel>
<addressLevel>
<constraints>
<mustMatchGender>false</mustMatchGender>
<mustMatchSuffix>false</mustMatchSuffix>
<mustMatchLocation>true</mustMatchLocation>
<mustMatchPremise>true</mustMatchPremise>
<noOneEmptyPremise>false</noOneEmptyPremise>
<allowFuzzyPremiseMatch>false</allowFuzzyPremiseMatch>
<mustMatchDirectional>true</mustMatchDirectional>
<mustMatchNumericStreetName>true</mustMatchNumericStreetName>
<mustMatchJointNames>false</mustMatchJointNames>
<mustMatchBuilding>true</mustMatchBuilding>
<noOneEmptyBuilding>true</noOneEmptyBuilding>
</constraints>
<weights>
<name sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<organization sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<address sure="40" likely="30" possible="20" oneEmpty="5" bothEmpty="5" />
<postcode sure="30" likely="20" possible="15" oneEmpty="5" bothEmpty="5" />
<telephone sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<email sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<dateOfBirth sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<customField1 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
...
<customField9 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
</weights>
<thresholds>
...
</thresholds>
<nameMatchingMatrix>...</nameMatchingMatrix>
<organizationMatchingMatrix>...</organizationMatchingMatrix>
</addressLevel>
Refer to the Individual Level for further details on the constraints, weights, and thresholds.
Business Level
Location: <settings><advanced><matchingRules><businessLevel>
<businessLevel>
<constraints>
<mustMatchGender>false</mustMatchGender>
<mustMatchSuffix>false</mustMatchSuffix>
<mustMatchLocation>true</mustMatchLocation>
<mustMatchPremise>false</mustMatchPremise>
<noOneEmptyPremise>false</noOneEmptyPremise>
<allowFuzzyPremiseMatch>false</allowFuzzyPremiseMatch>
<mustMatchDirectional>false</mustMatchDirectional>
<mustMatchNumericStreetName>false</mustMatchNumericStreetName>
<mustMatchJointNames>false</mustMatchJointNames>
<mustMatchBuilding>false</mustMatchBuilding>
<noOneEmptyBuilding>false</noOneEmptyBuilding>
</constraints>
<weights>
<name sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<organization sure="60" likely="40" possible="25" oneEmpty="15" bothEmpty="25" />
<address sure="40" likely="30" possible="20" oneEmpty="5" bothEmpty="5" />
<postcode sure="30" likely="20" possible="15" oneEmpty="5" bothEmpty="5" />
<telephone sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<email sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<dateOfBirth sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<customField1 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
...
<customField9 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
</weights>
<thresholds>
...
</thresholds>
<nameMatchingMatrix>...</nameMatchingMatrix>
<organizationMatchingMatrix>...</organizationMatchingMatrix>
</businessLevel>
Refer to the Individual Level for further details on the constraints, weights, and thresholds.
Custom Level
Location: <settings><advanced><matchingRules><customLevel>
<customLevel>
<constraints>
<mustMatchGender>false</mustMatchGender>
<mustMatchSuffix>false</mustMatchSuffix>
<mustMatchLocation>false</mustMatchLocation>
<mustMatchPremise>false</mustMatchPremise>
<noOneEmptyPremise>false</noOneEmptyPremise>
<allowFuzzyPremiseMatch>false</allowFuzzyPremiseMatch>
<mustMatchDirectional>false</mustMatchDirectional>
<mustMatchNumericStreetName>false</mustMatchNumericStreetName>
<mustMatchJointNames>false</mustMatchJointNames>
<mustMatchBuilding>false</mustMatchBuilding>
<noOneEmptyBuilding>false</noOneEmptyBuilding>
</constraints>
<weights>
<name sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<organization sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<address sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<postcode sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<telephone sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<email sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<dateOfBirth sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
<customField1 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
...
<customField9 sure="0" likely="0" possible="0" oneEmpty="0" bothEmpty="0" />
</weights>
<thresholds>
...
</thresholds>
<nameMatchingMatrix>...</nameMatchingMatrix>
<organizationMatchingMatrix>...</organizationMatchingMatrix>
</customLevel>
Refer to the Individual Level for further details on the constraints, weights, and thresholds.
Weights for UK and international data
The following weights are used in place of the above when the nationality is not 'USA':
<weights>
<name sure="60" likely="40" possible="25" oneEmpty="15" bothEmpty="25" />
<address sure="30" likely="22" possible="15" oneEmpty="5" bothEmpty="5" />
<postcode sure="30" likely="20" possible="15" oneEmpty="10" bothEmpty="10" />
...
</weights>
Note that all other weights are unchanged.
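To make the difference concrete, the Python sketch below applies the non-US values as overrides on top of the US defaults and compares the resulting totals. It assumes the individual-level US defaults as the starting point; the `weights_for` helper and the "UK" nationality string are hypothetical, and the product applies these values automatically when the nationality is specified.

```python
import copy

# US individual-level defaults for the three components that change.
US_WEIGHTS = {
    "name":     {"sure": 60, "likely": 40, "possible": 25, "oneEmpty": 5, "bothEmpty": 24},
    "address":  {"sure": 40, "likely": 30, "possible": 20, "oneEmpty": 5, "bothEmpty": 5},
    "postcode": {"sure": 30, "likely": 20, "possible": 15, "oneEmpty": 5, "bothEmpty": 5},
}

# Values listed above for UK and other international data.
NON_US_OVERRIDES = {
    "name":     {"sure": 60, "likely": 40, "possible": 25, "oneEmpty": 15, "bothEmpty": 25},
    "address":  {"sure": 30, "likely": 22, "possible": 15, "oneEmpty": 5, "bothEmpty": 5},
    "postcode": {"sure": 30, "likely": 20, "possible": 15, "oneEmpty": 10, "bothEmpty": 10},
}


def weights_for(nationality):
    """Return a copy of the weights for the given nationality.

    Hypothetical helper: any nationality other than "USA" receives the
    override values; all other weights would be left unchanged.
    """
    weights = copy.deepcopy(US_WEIGHTS)
    if nationality != "USA":
        weights.update(copy.deepcopy(NON_US_OVERRIDES))
    return weights


for nationality in ("USA", "UK"):
    w = weights_for(nationality)
    all_sure = sum(w[c]["sure"] for c in ("name", "address", "postcode"))
    print(nationality, all_sure)  # USA 130, then UK 120
```

The lower Sure total for non-US data comes entirely from the address weight (30 instead of 40), which is worth keeping in mind when reusing thresholds tuned against the US defaults.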