What match keys should I use?
It can depend on the level of matching you want (one per person, company, address, etc.) and the nature of your data and your requirements. You can use any fields as match keys, but as a general rule, you should select more than one key (which will usually include at least one compound key) and there shouldn’t be a common element to all the keys. For example, you should make sure that the phonetic surname key (mkname1) is not contained in every match key that you use, to help detect duplication no matter what data may be missing or inconsistent.
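As a minimal sketch of the "no common element" rule, the check below treats each match key as a set of component fields and reports any component that appears in every key. The key definitions and every field name other than mkname1 are hypothetical illustrations, not the matchIT API's actual key configuration.

```python
def common_components(match_keys):
    """Return the components that appear in every match key (ideally none)."""
    component_sets = [set(components) for components in match_keys.values()]
    if not component_sets:
        return set()
    return set.intersection(*component_sets)


# Hypothetical key definitions: each key is built from one or more fields.
# Only mkname1 (the phonetic surname key) is named in the text above.
bad_keys = {
    "key1": ["mkname1", "postcode"],
    "key2": ["mkname1", "address_line_1"],
}
good_keys = {
    "key1": ["mkname1", "postcode"],
    "key2": ["address_line_1", "postcode_outward"],
}

print(common_components(bad_keys))   # {'mkname1'}
print(common_components(good_keys))  # set()
```

With the first set of keys, a record with a missing or mis-keyed surname could never be brought together with its duplicate, because every key depends on mkname1.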
The recommended keys vary by volume because the greater the volume of records in your database, the larger the groups of records with the same match key and the longer it takes to compare candidate matches. The point at which you should use the keys recommended for high database volumes depends on the nature of your data and your requirements. For example, matching records keyed at data entry against a million geographically widespread records already on the database may give perfectly good response times using the “medium volume” keys, whereas a batch dedupe of half a million records all from one local area may take too long unless you use the “high volume” keys.
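To illustrate why larger key groups cost more, the sketch below counts the comparisons implied by a list of key values, assuming every pair of records sharing a key value is compared (n(n-1)/2 pairs for a group of n). That is a simplifying assumption for illustration, not a description of how the matchIT API schedules comparisons, and the key values are made up.

```python
from collections import Counter


def candidate_pairs(key_values):
    """Count pairwise comparisons if all records sharing a key value are compared."""
    group_sizes = Counter(key_values)
    return sum(n * (n - 1) // 2 for n in group_sizes.values())


# Hypothetical key values for the same six records under a broad and a narrow key.
broad_key = ["SM0", "SM0", "SM0", "SM0", "JN2", "JN2"]
narrow_key = ["SM0-AB1", "SM0-AB1", "SM0-CD2", "SM0-EF3", "JN2-AB1", "JN2-GH4"]

print(candidate_pairs(broad_key))   # 7
print(candidate_pairs(narrow_key))  # 1
```

Because the count grows with the square of the group size, data concentrated in one local area (many records sharing the same key value) is far more expensive to compare than the same number of records spread over many small groups.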
If you have additional fields such as Email or Telephone, we suggest you start out with the recommended keys, then add the extra fields as additional keys and put a weight on them. You can use keys containing fields which have not been standardised by the matchIT API, but these are not usually as reliable for finding matches as those which have been standardised.
If your file contains lots of exact duplicates, you may want to run exact matching first. If your file contains very complex or badly structured data, you may need to use different keys; please contact your supplier if you would like advice on this. When using keys which differ from the recommendations, it is a good idea to test what extra matches the recommended keys would pick up, to ensure that the keys you want to use are effective.
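As a rough illustration of an exact-matching pass run before fuzzy matching, the sketch below keeps the first occurrence of each record whose fields are all identical. It is a generic example rather than the matchIT API's own exact-matching feature, and the record layout and field names are hypothetical.

```python
def drop_exact_duplicates(records):
    """Keep the first occurrence of each record whose fields are all identical."""
    seen = set()
    unique = []
    for record in records:
        # A hashable fingerprint of the whole record, independent of field order.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Hypothetical records: the first two are identical in every field.
records = [
    {"name": "J Smith", "postcode": "AB1 2CD"},
    {"postcode": "AB1 2CD", "name": "J Smith"},
    {"name": "Jon Smith", "postcode": "AB1 2CD"},
]

print(len(drop_exact_duplicates(records)))  # 2
```

Removing exact duplicates first shrinks the groups that the match keys produce, leaving only the near duplicates for the slower fuzzy comparison.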
For more detail, please see:
Suggested match keys - US
Suggested match keys - UK
What matching weights should I use for my data?
Again, this is dependent on the data, but the default weights that the matchIT API allocates are adequate for most purposes:
- Residential (one record per person or family)
| Description | Name | Address | Postcode |
| --- | --- | --- | --- |
| Sure | 60 | 30 (UK), 40 (US) | 30 |
| Likely | 40 | 22 (UK), 30 (US) | 20 |
| Possible | 20 | 15 (UK), 20 (US) | 15 |
| One Empty | 15 | 5 | 10 |
| Both Empty | 25 | 5 | 10 |
The different Address weights for UK (or high level postal codes) and US (or low level postal codes) stem from the fact that the Sure weight on Postcode is reserved for postcodes that are (as far as can be determined) at street level or lower. For example, the Sure weight is only allocated to postcodes that contain both an outward part (postout) and an inward part (postin) and are equal, whereas lower scores are applied when the postcodes match only fuzzily and/or are incomplete.
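The Sure condition for postcodes can be sketched as follows. This is only an illustration of the rule described above: it assumes UK-style postcodes with the two parts separated by a space, and the function names are hypothetical rather than part of the matchIT API.

```python
def split_postcode(postcode):
    """Split a UK-style postcode into (outward, inward) parts.

    Assumption: the two parts are separated by whitespace, e.g. "SW1A 1AA".
    A postcode with only one part is treated as having no inward part.
    """
    parts = postcode.upper().split()
    outward = parts[0] if parts else ""
    inward = parts[1] if len(parts) > 1 else ""
    return outward, inward


def qualifies_for_sure(postcode_a, postcode_b):
    """True only if both postcodes are complete (both parts present) and equal."""
    out_a, in_a = split_postcode(postcode_a)
    out_b, in_b = split_postcode(postcode_b)
    complete = bool(out_a and in_a and out_b and in_b)
    return complete and (out_a, in_a) == (out_b, in_b)


print(qualifies_for_sure("SW1A 1AA", "sw1a 1aa"))  # True
print(qualifies_for_sure("SW1A", "SW1A"))          # False: no inward part
print(qualifies_for_sure("SW1A 1AA", "SW1A 2AA"))  # False: inward parts differ
```

Pairs that fail this test would fall to one of the lower levels in the table (Likely, Possible, One Empty or Both Empty); how those levels are chosen is not specified here, so the sketch does not attempt it.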
- Residential (one record per address – household matching)
As for one record per person or family, but with zero weights on Name.
- Business (one record per person or contact)
This is the same as for Residential (one record per person or family), as by default the matchIT API ignores company names when matching business contacts. This is because it is very common for companies to change their name, for individuals to work for more than one company in the same group, and so on. Of course, you can choose to put a weight on the company name as well, if you wish.
- Business (one record per company)
As for one record per person or family, but with the same weights on Organization Name instead of Name.
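The relationship between the four matching levels can be sketched by deriving each weight set from the residential defaults. The dictionary layout and function names below are hypothetical, the UK Address values are used, and the sketch assumes that a pair's overall score is the sum of the per-field weights, which is an illustrative assumption rather than a statement of how the matchIT API combines them.

```python
# Default residential weights from the table above (UK Address values).
RESIDENTIAL_UK = {
    "Name": {"Sure": 60, "Likely": 40, "Possible": 20, "One Empty": 15, "Both Empty": 25},
    "Address": {"Sure": 30, "Likely": 22, "Possible": 15, "One Empty": 5, "Both Empty": 5},
    "Postcode": {"Sure": 30, "Likely": 20, "Possible": 15, "One Empty": 10, "Both Empty": 10},
}


def household_profile(base):
    """One record per address: same weights, but zero on Name."""
    profile = {field: dict(levels) for field, levels in base.items()}
    profile["Name"] = {level: 0 for level in base["Name"]}
    return profile


def company_profile(base):
    """One record per company: the Name weights apply to Organization Name instead."""
    profile = {field: dict(levels) for field, levels in base.items() if field != "Name"}
    profile["Organization Name"] = dict(base["Name"])
    return profile


def total_score(profile, levels):
    """Assumption: the overall score is the sum of the per-field weights."""
    return sum(profile[field][level] for field, level in levels.items())


levels = {"Name": "Likely", "Address": "Sure", "Postcode": "Sure"}
print(total_score(RESIDENTIAL_UK, levels))                     # 100
print(total_score(household_profile(RESIDENTIAL_UK), levels))  # 60
```

Business (one record per person or contact) would reuse RESIDENTIAL_UK unchanged, since company names carry no weight by default.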