msp_GroupMatches
Input Parameters:
- Configuration file – the file path of the configuration file to be used when this procedure is run.
- Datasource ID – specifies the data source to be used within the configuration file, which contains the database connection string.
- Level – Matching Level at which to group the records (Individual, Family, Household, Business, Custom).
After running msp_FindMatches, this will group all matching record pairs into sets of matching records. If you have previously run FindExactMatches, then we recommend that you copy the matching pairs data from your exact_matches table into you matches table prior to running this grouping step, unless Merge Exact Matches is enabled.
Setting |
Description |
generalSettings->preventBridgedMatches |
When enabled mSQL will attempt to stop match groups containing bridged records such as: J Smith John Smith Julian Smith In the scenario above, both John Smith and Julian Smith match with J Smith, but not with each other. |
generalSettings->masterRecordIdentification |
When this setting is active, mSQL will use the MasterPriority matrix to determine which record in a matching group should be marked as the master record (i.e. the best record). When this setting is off, the record with the lowest unique_reference will be chosen as the master record. |
dataSources->ConnectionString |
Connection string used to connect to the database during processing. |
outputSettings->groupedMatchesTable |
Name of the group matches output table that will be produced during the processing of this procedure. |
outputSettings->matchesTable |
Name of the matches table containing the matching pairs that will be used as the input source for this procedure. |
generalSettings->excludeExactMatches |
The merge attribute of this setting indicates whether to merge all exact matches, only those for the grouping level being used, or none. The records are then merged prior to the grouping. |
Master Record Identification
As part of the grouping process, mSQL will intelligently choose the master record for a matching group based on the data contained in the records in the matching set. The record with the best data is designated as the master record for the matching group.
mSQL uses a master priority table to determine how to score the quality of the data held within the matching records. The default location of this file is as follows:
- C:\mSQL\config\masterPriorities\default.xml
The table itself dictates scoring rules for each type of matching field. For example, by default if the phonetic last name field is empty (mkName1), then the master priority score will have 99 points subtracted from it. The overall score is the sum of the scores for each field listed in the matrix.
Note that you can also add Customfields (if you have mapped any in your datasources) into the matrix for example consider the following rule:
- <rule field="CustomField1" test="value" pos="8" operation="equal" value="X" score="-33" /> - this rule means that when the fields mapped as CustomField1 is scored, if the value of the character in position 8 contains an X, then the total master priority score will have 33 subtracted from it.
Note that when testing the length or value of a field, the following operations are permitted:
- equal
- notEqual
- greater
- notGreater
- less
- notLess
In the event that you have 2 records both scoring the same master priority score, and both having the highest score in the matching group, then the record with the biggest address length (i.e. number of characters in the address fields) will be marked as the master record.
The record in the matching group with the highest master priority score will be designated as the Master Record.
Matches_grouped table structure
During processing the stored procedure will output the results to the matches_grouped table (this name can be configured – see above). The structure of the output table is as follows:
Column |
Description |
ID |
Record ID for each matching group. |
Record1 |
Reference ID of the first record in the matching pair. |
Record2 |
Reference ID of the record that is deemed to match Record1 From the first datasource. |
Score |
The Score column is copied from the relevant level’s total score (grouping can only take place on one matching level; to group using multiple levels requires multiple runs of GroupMatches/GroupOverlap). |
MatchRef |
Indicates the unique reference of the master record in the group. If Master Record Identification is enabled, then this will indicate the ‘best’ record in the group; if not, then the MatchRef will simply be set to the lowest unique reference of all the records in the group (note that the column is a char column, not an integer column, so a unique ref of 100 will be deemed ‘lower’ than 20 because a left-to-right character-based comparison is used). |
BaseScore |
Indicates the lowest score of all the matches in the group, or, if blank, this will indicate a merged exact match |
This table lists the records from the matches table after they have been placed into groups of matches. For example, if record A matches B and B matches C, then all three records will be placed into the same group.