Previous Article | matchIT SQL Index | Next Article |
1.1.1 msp_FindOverlap
Input Parameters:
- Configuration file – the file path of the configuration file to be used when this procedure is run.
- Main datasource ID – specifies the data source to be used within the configuration file, which contains the table and column mapping specifications. This datasource will be considered the Main dataSource.
- Overlap datasource ID – specifies the data source to be used within the configuration file, which contains the table and column mapping specifications. This datasource will be considered the Overlap datasource.
The FindOverlap procedure is essentially the same as FindMatches (and allows for the same settings); however the outputted results table contains matching pairs where a record in the update table has matched against a record in the main table.
As the matching process runs, the results are written out to tables within your SQL Server database. The FindOverlap process produces 2 output tables as follows (the names of which can be configured through the Web UI or XML).
Finds all matching record pairs in the specified table. There are quite a few configurable settings for this procedure:
Setting |
Description |
matchKeys->fuzzyKeys |
The match keys that will be used are specified in the XML within the fuzzy keys tags under the match keys section. Fields can be concatenated together to create a match key e.g. <key key1="mkPostOut" key2="mkName1" /> Which means that all records with the same PostOut value and same phonetic surname will be compared.
|
generalSettings->minimumIndividualScore |
The minimum threshold score required for a match to be considered an Individual level match. |
generalSettings->minimumFamilyScore |
The minimum threshold score required for a match to be considered a family level match. |
generalSettings->minimumHouseholdScore |
The minimum threshold score required for a match to be considered a household level match. |
generalSettings->minimumBusinessScore |
The minimum threshold score required for a match to be considered a business level match. |
generalSettings->minimumCustomScore |
You may decide to create your own custom match level. This setting configures the minimum threshold score required for a match to be considered a custom level match. |
generalSettings->preClustering |
Internal to matching algorithm, leave switched on for best performance. |
generalSettings->stripPuncWhenExactMatching |
matchIT SQL will ignore punctuation during exact matches when activated. This setting should only be used when using non match key columns. |
generalSettings->excludeExactMatches |
If this is enabled (the default) and msp_FindExactMatches/Overlap is run prior to msp_FindMatches/Overlap, then the final matches table produced will not include any exact matches. The exact matches already found are excluded to boost the performance of the fuzzy matching step and it is recommended that you use FindExactMatches if you are processing large datasets. Note that the data from the exact_matches table should be appended to the matches table before the relevant grouping and output stored procedures are run, unless Merge Exact matches is enabled |
generalSettings->flagMatchesAtHigherLevels |
If this setting is enabled, then individual level matches will always be marked as family and household level matches. Family level matches will be marked as household level matches. |
outputSettings->matchesTable |
This setting within the configuration allows you to specify the name of the matches table that will be produced. Additionally, for each matching level, you can choose what level of scoring information you would like written into this table. By default, only the total scores for each matching level are enabled, but for example you could add the address component scores by setting the address property to ‘1’. |
outputSettings->largeClustersTable |
This setting within the configuration allows you to specify the name of the table containing the large clusters that will be produced. |
ADVANCED SETTINGS |
The following settings are advanced and most users will not normally need to modify the default settings. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchGender |
When this property is set to True, potential matches will be disregarded if their genders differ. If however the gender is unknown in one or both of the records, the records will potentially be classed as a match. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchSuffix |
When this property is set to True, potential matches will be disregarded if their suffixes differ. If however the suffix is unknown in one or both of the records, the records will potentially be classed as a match. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchLocation |
When this property is set to True, potential matches will be disregarded if their address locations differ. In detail, this means that the postcodes in the two records (if present) must achieve at least a probable match with the address score at least a Possible match, or the address score must be at least a Likely match irrespective of the postcodes, or the postcodes must achieve a Sure match irrespective of the address. This is to prevent false matches where there is some match on address, but where the addresses are clearly not the same, for example "10 High Street, Bookham", and "10 High Street, Alford". Switch this constraint off if you want to match people or companies in different locations; you may want to match on items of data that are independent of location, such as date of birth or bank account. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchPremise |
When this property is set to True, potential matches will be disregarded if their premise numbers differ. If however the premise number is unknown (e.g. one record or both records may contain a premise name), the records will potentially be classed as a match. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->noOneEmptyPremise |
When this property is set to True, potential matches will be disregarded if one of the addresses is missing a premise number. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->allowFuzzyPremiseMatch |
When both this and MustMatchPremise are set to True, then potential matches will be disregarded if the premises are not exact matches (for example, 71 and 71) or if they’re not fuzzy matches (for example 71 and 71A, 45 and 54, or 71 and 7). Note that this property has no effect if MustMatchPremise is set to False because, in that case, fuzzy premises are always allowed. Also note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchDirectional |
When this property is set to True, potential matches will be disregarded if both addresses (i.e. typically US) have a pre- or post-directional (e.g. N, North, E, etc.) but they don’t match. For example, with this constraint enabled, “N Washington Ave” and “S Washington Ave” will not be matched. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchNumericStreetName |
When this property is set to True, potential matches will be disregarded if both addresses (i.e. typically US) have a numeric street name but they don’t match. For example, with this constraint enabled, “5th Ave” and “15th Ave” will not be matched. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchJointNames |
When this property is set to True, potential matches will be disregarded if one record has a joint name but the other doesn’t. For example, normal behavior will match “Mr & Mrs J Smith” with “Mr J Smith”; setting this property to True will prevent such matches. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->mustMatchBuilding |
When this property is set to True, potential matches will be disregarded if their building names differ. If however one or both addresses do not contain a building name, the records will potentially be classed as a match. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->constraints->noOneEmptyBuilding |
When this property is set to True, potential matches will be disregarded if one of the addresses is missing a building name. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->name |
Defines the scores produced when names are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->organization |
Defines the scores produced when organizations are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->address |
Defines the scores produced when address fields are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->postcode |
Defines the scores produced when postcode/zip fields are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->telephone |
Defines the scores produced when telephones are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->email |
Defines the scores produced when emails are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->dateOfBirth |
Defines the scores produced when date of birth fields are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->weights->customField1 |
Defines the scores produced when fields defined as customField are compared. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->nameMatchingMatrix |
The location of the name matching matrix. Note that this setting is also available for Family, Household, Business and Custom match levels. |
matchITAPISettings->matchingRules->individualLevel->organizationMatchingMatrix |
The location of the Organization Matching Matrix. Note that this setting is also available for Family, Household, Business and Custom match levels. |
Previous Article | matchIT SQL Index | Next Article |