Find Overlap is for matching two separate data sources. So you would be setting up two generate keys, pointing to your two separate tables.
First, let's go over the difference between an overlap vs. an internal dedupe (the findmatches/groupmatches tasks) as this is a common point of confusion.
When you’re matching internally (findmatches), you have these 4 records for example
1 |
W R Dayton Jr |
Palmer Air Charters |
Suite 106 |
7350 Airport Rd |
Wilmington |
28401-4273 |
2 |
Bill Deighton |
PAC |
735 Airport Rd #106 |
Wilmington |
NC |
28401 |
3 |
Mr. W.Datyon |
Palmr Air |
7350 Airport Road |
Ste 106 |
Wilmington |
28410 |
4 |
William Deighton |
Palmer Air Charters, Inc. |
7350 Airport Rd |
Wilmington |
NC |
|
These 4 records may not link directly, but if we know 1 matches 2, and 2 matches 3, and 3 matches 4, then we can determine that all four go together. And when we link them we give a base score – which is the weakest link in the set. When you’re doing an internal dedupe its all about gaining that single customer view.
Now that is good for an internal dedupe.
Although when overlapping (findoverlap) it’s a different approach, as opposed to linking records together and choosing the weakest link as the baseline, you want to instead choose the best match
So instead let's say we had two tables
Table1 lets say is your existing customer table, and has 3 separate records (this may indicate that it should be deduped, but that may be a separate project)
W R Dayton Jr |
Palmer Air Charters |
Suite 106 |
7350 Airport Rd |
|
Bill Deighton |
PAC |
735 Airport Rd #106 |
Wilmington |
NC |
Mr. W.Datyon |
Palmr Air |
7350 Airport Road |
Ste 106 |
Wilmington |
And we have table 2 – lets say this is a feed of new sales
William Deighton | Palmer Air Charters, Inc. | 7350 Airport Rd | Wilmington | |
John smith | LeDroit Park | 416 FLORIDA AVE NW | WASHINGTON, DC 20001 | |
Smith, John | Park Office at ledroit | 416 FLORIDA AVENUE NORTHWEST | WASHINGTON, DC | 20001 |
Now, we can only link that incoming record to one of our three existing customers.
So instead what we’ll do is findoverlap , it will compare the 3 customer records against our incoming table. In this case – we may have found 2 or 3 separate matches, but in this case – it’s about picking the best match, that way we don’t make a new customer record when we shouldn’t.
But we still have the issue that we’re attempting to link 3 new sales, but only caught 1. So John Smith placed 2 orders, but really should only have 1 account. So, in this case, you should take the records that were not matched in your overlap, and possibly run them through a dedupe internally as well. That way we only create one new customer account, instead of 2.
You shouldn’t rely on just overlap, or just findmatches necessarily. Ideally a combination of both will get you the best end result. It's just a matter of figuring out the order that works best for you.
We do recommend before running an overlap, that at least one of the tables would ideally be deduped internally. Otherwise, you can end up with many to many relationships