To find matching records in a Main File, the following steps are used:
- Import the data. The simplest method is to use the "Setup Wizard": at the end of the wizard, tick the checkbox for "Create Match Keys" and choose the appropriate matching level under the "Select matching level" radio buttons. If you are not using the Setup Wizard, select Restore Standard Parameters from the dialog shown when you select Import, and choose the appropriate matching configuration before you import the data (see Importing without using the Setup Wizard).
- Use the Find Matches option, which is available at the end of the Setup Wizard, from the Matching menu, from the toolbar, or from the dialog at the end of Import.
- Choose the Match Keys to use (unless you chose Find Matches at the end of the Setup Wizard, in which case default Match Keys are used).
Matching Parameters
When importing data through the Setup Wizard, mDesktop asks what level of matching is to be used. The choices offered depend upon whether the data is business or personal.
The parameters can also be modified for specific requirements. To change these, either select Save/Restore Setup (see "Multiple Parameter Sets") or select "Options" from the Jobs/Setup main menu and change the "Matching Setup" (see the Online help for more information).
The Setup Wizard offers the choice between contact and business level matching only if there is a field named "Company" in your Main File.
Residential Data
Individual Level
This will find matches at an individual level e.g. John Smith and Mary Smith living at the same address will not be matched, nor will John Smith and James Smith. However, John Smith and Mr J E Smith will be matched and (by default) John Smith and E J Smith will be regarded as a possible match, as perhaps E J Smith is known by his middle name.
Family Level
This finds matches on surname at the same address e.g. John and Mary Smith at the same address will be matched, as will John Smith and James Smith, and all but one record will be flagged.
Household Level
This matches records with the same address, regardless of surname e.g. John Smith and Lucy Jones living at the same address will be matched.
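The difference between the three residential levels can be sketched in a few lines of Python. This is only an illustration of the rules described above: mDesktop scores fuzzy matches rather than comparing exact strings, and the `Person` type, its field names and the `is_match` function are assumptions made for this example, not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    forename: str  # hypothetical field names, chosen for this sketch only
    surname: str
    address: str


def forenames_compatible(a: str, b: str) -> bool:
    """Treat an initial as compatible with any forename that starts with it."""
    a, b = a.strip().upper(), b.strip().upper()
    if len(a) == 1 or len(b) == 1:
        return a[:1] == b[:1]
    return a == b


def is_match(a: Person, b: Person, level: str) -> bool:
    """Exact-comparison stand-in for the three residential matching levels."""
    same_address = a.address.upper() == b.address.upper()
    same_surname = a.surname.upper() == b.surname.upper()
    if level == "household":
        return same_address
    if level == "family":
        return same_address and same_surname
    if level == "individual":
        return (same_address and same_surname
                and forenames_compatible(a.forename, b.forename))
    raise ValueError(f"unknown level: {level}")


john = Person("John", "Smith", "1 High St")
j = Person("J", "Smith", "1 High St")
james = Person("James", "Smith", "1 High St")
mary = Person("Mary", "Smith", "1 High St")
lucy = Person("Lucy", "Jones", "1 High St")
```

With these records, John and J match at individual level, John and James or John and Mary match only from family level upwards, and John and Lucy match only at household level.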
Business Data
Contact Level
At contact level, deduplication is performed down to one record per person at a location. This is effectively the same as individual level matching for residential data, because by default mDesktop ignores the company name for contact level matching. This tends to work better because company names change frequently and different companies within the same group often have employees in common. For example, John Smith and Mr J E Smith will be matched, even if one is at British Steel plc and the other at Corus, as long as the addresses and zip codes match well enough. However, John Smith and Fred Brown at the same address/zip code will not be matched.
Business Level
This level is used to produce one record per company or business. Therefore, two different employees working for the same company will be matched, as long as the addresses and zip codes match well enough. Once you have selected to continue with Business Level matching, you will be prompted for Loose (the default), Tight or Legal business matching. These options are illustrated in the window prompt.
Address Level
This matches records with the same address, regardless of surname or business name. All contacts and businesses at the same address will be matched when using this level of deduplication.
Match Keys define which fields mDesktop uses when looking for candidate matching pairs in your Main File – it is important that you understand what this means, so please refer to the definition of Match Keys in Introduction to Matching if you are unsure.
At the top of the screen is a scrollable list of Match Keys called the Fields/Key list. This contains both suggested Match Keys and all fields in the Main File, with the keys that you are most likely to need displayed at the top of the list.
Use the Basic or Advanced option to display these keys in plain English or as field names from the Main File. In the plain English descriptions, keys shown in proper case have been standardized by mDesktop, whereas keys shown entirely in lower case have not. It is better to use the standardized fields, but you can also construct your own keys from non-standardized fields, in addition to using mDesktop's generated keys.
The simplest way to select Match Keys is to check the "Use default keys?" box on the Match Keys screen. This will select a standard set of keys which are suitable for most jobs – as supplied on initial installation, they are suitable for matching at individual, family, contact or business level. The keys are suitable for data that is either virtually all US or is all non-US. For household matching, the default keys will work okay but are not ideal, as two of the supplied match keys feature phonetic surname and only one does not. Therefore, for household matching we recommend that you select from the first few keys in the List of Fields/Keys Available as follows:
- two keys that do not feature phonetic surname e.g. for US data:
Zip code
Address Key
or (for non-US data):
Address Key
First 5 characters of ZIP code + Phonetic Key of Street.
- one key that does feature surname to pick up extra household matches where the surname is the same but for some reason the household match is too fuzzy to be picked up by either of the first two keys e.g. for US data:
Phonetic Surname Key + First 5 characters of ZIP code
or (for non-US data):
Phonetic Surname Key + Phonetic Town/City Key.
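As a rough illustration of how a compound Match Key narrows the search for candidate pairs, the sketch below groups records by "Phonetic Surname Key + First 5 characters of ZIP code" and only pairs up records that share a key. The `crude_phonetic` function is a deliberately simple stand-in, not mDesktop's phonetic algorithm, and the record layout (`surname`, `zip`) and function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def crude_phonetic(name: str) -> str:
    """Stand-in phonetic key: first letter plus later consonants, no repeats."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for c in letters[1:]:
        if c not in "AEIOUY" and c != key[-1]:
            key.append(c)
    return "".join(key)


def surname_zip_key(rec: dict) -> str:
    """Compound key; returns "" (a blank key) if either part is missing."""
    surname = crude_phonetic(rec.get("surname", ""))
    zip5 = rec.get("zip", "")[:5]
    return f"{surname}+{zip5}" if surname and zip5 else ""


def candidate_pairs(records: list, key_func) -> list:
    """Pair up only those records that share the same non-blank key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        key = key_func(rec)
        if key:  # records with blank keys are not compared
            blocks[key].append(idx)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return sorted(pairs)


records = [
    {"surname": "Smith", "zip": "10001-1234"},
    {"surname": "Smyth", "zip": "10001"},
    {"surname": "Jones", "zip": "10001"},
    {"surname": "Smith", "zip": "90210"},
    {"surname": "", "zip": "10001"},
]
pairs = candidate_pairs(records, surname_zip_key)
```

Here only records 0 and 1 become a candidate pair. This is why a household job also needs keys that do not involve the surname: under this key, Jones at the same ZIP code is never compared with the Smiths.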
The set of default keys provided with mDesktop may be overwritten by checking the box "Save as default keys?", but don't do this unless you have a thorough understanding of how match keys work.
Selecting Specific Keys
To select your own keys:
- Click "Clear Keys" to clear all currently chosen keys, if required.
- Highlight the field you wish to use in the Field/Key list and click "Select Key" or double click and this field will be added to the key list.
- To add another field into this key, highlight the second field to be used and click "Select Key" again – in this way, you can build compound keys such as Zip code + Phonetic Surname Key.
- Click "New Key" to add another and repeat from step 2.
If you make a mistake, just click on the key that is wrong (in the "Keys Chosen For Matching" box) and click "Remove Key".
"Verify Keys" will check your chosen keys for you and suggest any problems which may occur with using those keys.
New Analysis
If you are working with a Main File that has already been analyzed for matches, you may not wish to lose any previous matching results. If so, uncheck the box "Is this a new analysis?"
Matching information is recorded in the Perform (short for "performance") database. The Matching or Overlap Summary shows information from this database, which you see automatically at the end of the analysis, or you can view it from the Matching menu. The information displayed is as follows:
Database in use: the database you are using.
Number of records: the total number of records in your database.
Number of matching pairs: the number of pairs of potentially duplicate records that mDesktop has found.
Number of potential deletions: the number of records mDesktop will flag, provided you accept the results and flag at the default threshold score. This often differs from the number of matching pairs found, because if more than two records match each other, say 1, 2 and 3, then record 1 matches record 2 and record 1 matches record 3, but record 2 also matches record 3. This gives 3 matching pairs, but only 2 records will be flagged. The gap widens quickly for larger match sets, because the number of pairs grows quadratically: a set of n records gives n(n-1)/2 matching pairs but only n-1 potential deletions, so 10 records in a match set means 45 matching pairs, but only 9 potential deletions.
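The relationship between matching pairs and potential deletions for a single match set can be checked with a short calculation. This sketch assumes every record in the set matches every other record; the function names are illustrative only.

```python
def matching_pairs(set_size: int) -> int:
    """Number of distinct pairs in a set where every record matches every other."""
    return set_size * (set_size - 1) // 2


def potential_deletions(set_size: int) -> int:
    """All but one record in the set would be flagged."""
    return max(set_size - 1, 0)


for n in (2, 3, 10):
    print(n, matching_pairs(n), potential_deletions(n))
```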
Matches found by match key
Run number: When mDesktop does its Find Matches routine it makes several passes (or runs) over the database, on each of the chosen Match Keys.
Records compared: the number of records mDesktop has read for each match key. This may be less than the number of records in the Main File if there are records with blank keys, or a Start Range was specified.
Matches found: the number of matching pairs found by each key.
Match Keys used: the Match Keys used since the New Analysis box was last set On.
Matches found by score range
Score range: This shows ranges of match scores depending on the Minimum Score To Report and the maximum possible score.
Number of matches in range: This is the number of pairs of matches found in each score range.
After mDesktop has performed the Find Matches step, the following screen is displayed:
From here, the matches found can be viewed, verified or flagged. These options can also be accessed from the Matching menu. View Matches is used to simply produce a report listing the matches found whereas Verify Matches allows matches to be flagged interactively, declared false matches, and/or allows records to be modified. Flag Matches removes all duplicates scoring above a threshold match score, without further user intervention.
View Matches Window
Selecting View Matches displays another window, as shown:
The most useful options are those in the top half of this screen:
- There is one run for each Match Key selected. The results from all runs (since the New Analysis box was checked) can be viewed, or one run number can be selected by unchecking the "All runs?" box. The default run number is that of the last matching run performed.
- If the Report Grouping option is set to Pairs, you can view the matching pairs found by a specific key by unchecking the All Runs option and entering the run number that corresponds to that key. To view all matching pairs, check All Runs. If the Report Grouping option is changed to Sets, the All Runs option is checked automatically, as it is not possible to view sets from a specific key.
- Matching records can be grouped in pairs (e.g. John Smith, Mr J Smith and J Smith Esq at the same address will be shown as three pairs of matches), or in sets where all records matching each other are displayed together i.e. the three Mr Smiths above would be shown as one set of three records.
- Potential duplicates are given a score based on how well they match: the higher the score, the closer the match. The range of matching scores for which the results should be displayed is also controllable and (if viewing in Pairs) records can be sorted by score. Set the score sample size to e.g. 10 if you wish to see just the first 10 pairs for each match score. Selecting this automatically sorts the records by score and displays them in pairs.
- The report format will default to Business or Residential, dependent upon whether there is a company field in the Main File. (See "Matching") You can choose your own report layout from here and also Verify Matches (see next section). The default reports used for business/residential matching are BIZPAIRS or BIZSETS.FRX and RESPAIRS or RESSETS.FRX respectively.
- Destination can be either:
  - Preview for a print preview,
  - Printer to print the report,
  - File output to a text file or
  - PDF to save a PDF version of the report.
- If you want to print the report to a printer other than the default printer, or to print more than one copy or a page range, select a Destination of Printer rather than printing from the Print Preview.
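The difference between the Pairs and Sets groupings can be illustrated by collapsing a list of matching pairs into sets. The sketch below uses a small union-find structure and treats any records linked by a chain of pairs as one set; that is an assumption made for the example, and mDesktop's own grouping rules may differ. The record numbers are invented.

```python
def group_into_sets(pairs):
    """Collapse matching pairs into sets of connected records (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    sets = {}
    for rec in list(parent):
        sets.setdefault(find(rec), []).append(rec)
    return sorted(sorted(members) for members in sets.values())


# Records 1, 2 and 3 stand for John Smith, Mr J Smith and J Smith Esq.
pairs = [(1, 2), (1, 3), (2, 3), (7, 8)]
match_sets = group_into_sets(pairs)
print(match_sets)  # [[1, 2, 3], [7, 8]]
```

The three pairs involving records 1, 2 and 3 become a single set of three records, which is how the three Mr Smiths would appear when Report Grouping is set to Sets.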
Advanced Options
Verify Matches provides an easy way of checking the quality of the matches found, even if you want to flag matches globally. Selecting Verify Matches from the Matching menu or other dialog shows all matches in order of score, lowest score first. Verify Matches is also available as a "Report Format" from the View Matches screen, where you can specify which runs to view, the score range and score sample size.
Verify Matches can display matching records in either a pairs view (the default) or a sets view. The default view is controlled by the name of a file in the mDesktop directory (C:\Program Files (x86)\mDesktop): leave the file named "NotVerifySetsFromWizard.txt" to examine records in the pairs view, or rename it to "VerifySetsFromWizard.txt" to examine records in the sets view.
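If you switch views often, the rename can be scripted. The following is a minimal sketch, assuming the two file names given above and an installation directory that you pass in yourself; the `set_verify_view` function is hypothetical and not part of mDesktop, and writing to a directory under Program Files may require administrator rights.

```python
from pathlib import Path

PAIRS_NAME = "NotVerifySetsFromWizard.txt"  # file name that selects the pairs view
SETS_NAME = "VerifySetsFromWizard.txt"      # file name that selects the sets view


def set_verify_view(install_dir: Path, view: str) -> Path:
    """Rename the marker file so that it selects the requested view."""
    if view not in ("pairs", "sets"):
        raise ValueError("view must be 'pairs' or 'sets'")
    wanted = install_dir / (SETS_NAME if view == "sets" else PAIRS_NAME)
    other = install_dir / (PAIRS_NAME if view == "sets" else SETS_NAME)
    # Only rename when the file currently carries the other name.
    if other.exists() and not wanted.exists():
        other.rename(wanted)
    return wanted
```

For example, `set_verify_view(Path(r"C:\Program Files (x86)\mDesktop"), "sets")` renames the file for the sets view, and calling it again with `"pairs"` renames it back.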
For global deletion, see "Flag Matches". The default screen used for pairs of matches is shown below:
Verify Matches in Pairs
This screen will display potential duplicates, in pairs. The different colored highlighting is used to help distinguish where the differing fields are in the records. Fields in red are different, yellow shows information which is contained in the same field in the other record and white displays those fields which are identical.
The buttons in the top right part of the screen control movement through the table. Each does one of the following:
- Goes back to the first pair displayed (lowest matching score selected).
- Goes back to the previous pair displayed.
- Goes forward to the next pair (which may or may not have the same score as the current pair displayed).
- Jumps to the next pair with a different match score. If you are viewing the records in order of score (the default), this will be the next highest score. (A little message "No more scores!" appears in the top right when the end of the pairs is reached.)
Keyboard shortcut keys for next pair and previous pair are Page Up and Page Down.
Below this pair the matching score is shown and (on the right), various buttons for dealing with a matched pair:
- Flag/Recall Left and Flag/Recall Right: to flag a record in Pairs format, select one of these buttons to flag the record on the left or right-hand side respectively. The button then changes: to recall a record, select the appropriate button with a green tick. The keyboard shortcut keys are Ctrl+left arrow and Ctrl+right arrow.
- False Match: if the pair shown is not a true match, you can select this button to remove the match, meaning that this pair will no longer be regarded as a match. The keyboard shortcut key is Ctrl+Delete.
Typically, you either flag false matches in a score band if most of them are true matches, or flag the odd true match if most of them are false. You do not need to both flag false matches and flag true matches within a score band, as global deletion will flag all matches above a match score as long as they have not been declared false. If you are using the Matrix Report or Group Matches feature, any matches you flag interactively will not be included. To allow this, you can increase the match score of any pair in the grey area, so that you move them into the area that is being automatically flagged or grouped.
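The interaction between a global flag threshold and matches declared false can be sketched as follows. This is an illustration only: the tuple layout, the choice to keep the lowest record number in each pair, and treating a score equal to the threshold as qualifying are all assumptions made for this example rather than documented mDesktop behaviour.

```python
def records_to_flag(pairs, threshold, false_matches=frozenset()):
    """Return the record numbers a global flag would mark as duplicates.

    pairs: iterable of (left_id, right_id, score) tuples.
    false_matches: set of frozenset({left_id, right_id}) declared false.
    """
    flagged = set()
    for left, right, score in pairs:
        if score < threshold:
            continue  # below the threshold: left for interactive review
        if frozenset((left, right)) in false_matches:
            continue  # declared a false match, so never flagged globally
        flagged.add(max(left, right))  # keep the lower record number
    return flagged


pairs = [(1, 2, 95), (1, 3, 90), (2, 3, 88), (4, 5, 70), (6, 7, 92)]
declared_false = {frozenset((6, 7))}
flagged = records_to_flag(pairs, threshold=85, false_matches=declared_false)
print(sorted(flagged))  # [2, 3]
```

Records 2 and 3 are flagged, the pair 4 and 5 falls below the threshold, and the pair 6 and 7 is skipped because it was declared false, so there is no need to flag the true matches in that score band one by one.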
This button allows you to enter the name or names of any additional fields that you want to see in the Verify window. Just scroll down and type the name of the field at the end of the list. You have to know the exact spelling of the field name or it will not be displayed, but you don't have to specify the field type or width. You can also change the order in which the fields are displayed, by dragging the square button to the left of the name. Then select OK and say "Yes" to the question "Make structure changes permanent". NB: this question refers to the structure of a temporary work file, not the structure of the Main File.
This button asks mDesktop to remember the current pair so you can return to this pair later if you are deleting a lot of matches interactively.
This button will return you to the marked pair.
Click "Done" when you have finished. If you have not marked your position in the file, mDesktop will ask if you wish to. This means you can return to the same point at a later date.
Verify Matches in Sets
This screen will display potential duplicates, in sets. The different colored highlighting is used to help distinguish where the differing fields are in the records. Fields in red are different, yellow shows information which is contained in the same field in the other record and white displays those fields which are identical.
The tick boxes between matching records (in the top right part of the screen) control what information is transferred to the Master record.
The associated buttons do the following:
- Specifies what fields to transfer to the Master record.
- Transfers selected fields to the Master record.
- Goes forward to the next page of matching sets.
- Expands all matching sets shown in the current page of matching sets.
Below the matching pair display there are various buttons for dealing with that particular matched pair:
- To flag the duplicate record in the pairs view section, select this button. The button then changes to "Restore Record", which allows you to unflag the record.
- False Match (keyboard shortcut Ctrl+Delete): if the pair shown is not a true match, you can select this button to remove the match, meaning that this pair will no longer be regarded as a match. The button then changes to "Restore Match", which allows you to reactivate the displayed matching pair.
- This button uses the Intelligent Data Merge settings specified by the user (see Intelligent Data Merge) to create a meta-record of the two records shown in the pairs section.
Typically, you either flag false matches in a score band if most of them are true matches, or flag the odd true match if most of them are false. You do not need to both flag false matches and flag true matches within a score band, as a global deletion will flag all matches above a match score as long as they have not been declared false. If you are using the Matrix Report or Group Matches feature, any matches you flag interactively will not be included. To allow this, you can increase the match score of any pair in the grey area, so that you move them into the area that is being automatically flagged or grouped.
- Quickly finds records that contain a piece of information that you specify. The records that contain the information will then be displayed in the Verify Matches in Sets window.
- Customizes the "Verify Matches in Sets" window by allowing you to apply filters and sort orders so that only the records of interest are displayed.
- Allows you to enter the name or names of any additional fields that you want to see in the "Matches in Group" section. Just scroll down and type the name of the field at the end of the list. You have to know the exact spelling of the field name or it will not be displayed, but you don't have to specify the field type or width. You can also change the order in which the fields are displayed, by dragging the square button to the left of the name. Then select OK and say "Yes" to the question "Make structure changes permanent". NB: this question refers to the structure of a temporary work file, not the structure of the Main File.
- Jumps to the first matching set of a specified match score.
Click "Done" when you have finished.