Fact #5: MatchKeys and Matchcodes - Everyone hates them!
Your 'MatchKeys' are black-and-white. Your data is not!
Conventional matching processes use a library of algorithms like Soundex, Metaphone, and Levenshtein, and require significant data wrangling to extract, transform, standardize and normalize data prior to matching. The algorithms must then be folded into substring matchkeys to find potential fuzzy and phonetic matches. It’s a long iterative process of trial and error, playing with various algorithms and matchcodes just to figure out how to get ‘adequate’ results.
If you think your ‘Matchcode’ is going to save you, you’d be mistaken!
Nearly every database provider, homegrown process, ETL platform, MDM and Data Quality vendor on the market uses the same process and methods for contact data matching. Substring matchkeys.
It’s an approach advanced in the 1980s to make the best of algorithms that are inherently flawed. It doesn’t fully account for the realities of customer data. It isn’t efficient, and it doesn’t work at scale.
For non-technical readers - What is a Matchkey?
Conventional matchkeys serve two purposes - speed and injecting “fuzziness” into search
The first reason is fuzziness. Conventional algorithms are too limited in their ability to find matches on their own, so they rely on matchkeys and a matchcode. Matchkeys add fuzziness to the search, allowing the identification of records that vary somewhat from one another.
The second reason is simply “time”. Data matching is a computationally intensive task. Even with a small database of 10,000 records, comparing every record against every other record would require roughly 50 million comparisons (10,000 × 9,999 / 2). By using matchkeys and a process called ‘blocking’, ‘grouping’ or ‘clustering’, the number of comparisons is greatly reduced, because only records that share the same matchkey are compared with each other.
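To put numbers on that, here is a minimal Python sketch of the arithmetic. The function names and the assumption of 100 evenly sized blocks are ours, chosen purely for illustration; real matchkeys rarely split a database this evenly.

```python
from collections import Counter
from math import comb


def full_comparisons(record_count):
    """Number of unique record pairs when every record is compared to every other."""
    return comb(record_count, 2)


def blocked_comparisons(matchkeys):
    """Number of pairs when only records sharing the same matchkey are compared."""
    block_sizes = Counter(matchkeys)
    return sum(comb(size, 2) for size in block_sizes.values())


# 10,000 records, no blocking.
print(full_comparisons(10_000))  # 49995000

# Hypothetical case: the same 10,000 records fall evenly into 100 blocks.
example_keys = [i % 100 for i in range(10_000)]
print(blocked_comparisons(example_keys))  # 495000
```

In this idealized case blocking cuts the workload by a factor of about 100, but any true match whose two records land in different blocks is never compared at all.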
A Matchcode is the rule that defines which data fields (and data transformations) are used to create the matchkey.
The Matchkey (defined by the matchcode) is a single alphanumeric string of data extracted from specifically defined fields in each contact record, and then used to compare records.
It’s a value that says: if we compare elements of fields x, y and z and those values are the same, the two records are a ‘possible’ match. A matchkey may be exact or fuzzy, as determined by the matching rule (aka the matchcode).
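For technically minded readers, the relationship between the two can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the field names, the substring lengths and the sample records are hypothetical, not a rule taken from any particular vendor.

```python
def make_matchkey(record, matchcode):
    """Build a matchkey by applying a matchcode (a list of field/length rules)."""
    parts = []
    for field, length in matchcode:
        # Standardize: keep letters and digits only, then uppercase.
        cleaned = "".join(ch for ch in record.get(field, "") if ch.isalnum()).upper()
        # Take the leading substring defined by the rule.
        parts.append(cleaned[:length])
    return "".join(parts)


# Hypothetical matchcode: 3 chars of last name + 1 char of first name + 5 chars of ZIP.
MATCHCODE = [("last_name", 3), ("first_name", 1), ("zip", 5)]

rec_a = {"first_name": "Jon", "last_name": "Smith", "zip": "90210"}
rec_b = {"first_name": "John", "last_name": "Smithe", "zip": "90210-1234"}
rec_c = {"first_name": "Jon", "last_name": "Smyth", "zip": "90210"}

print(make_matchkey(rec_a, MATCHCODE))  # SMIJ90210
print(make_matchkey(rec_b, MATCHCODE))  # SMIJ90210
print(make_matchkey(rec_c, MATCHCODE))  # SMYJ90210
```

The first two records share a key and become a ‘possible’ match. The third differs by a single letter inside the substring, gets a different key, and is never compared. That is the black-and-white behavior described above.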
In its most basic form, as we illustrated earlier, a match key might look like this…