Multi–Level Matching is performed using a job script (see Job Scripts for more information), which incorporates a number of bespoke Visual FoxPro programs, as well as customized settings and modified match keys.
The script is supplied as a template, and by default is set up to use 'Example1.TXT' and a corresponding DBF file. When the script needs to be used for processing live data, the user will need to modify the script in such a way that it points to the source data and corresponding DBF file(s). Once the user has pointed to their files, the script will be ready to run.
The job script will perform an Import and dedupe, it will then modify the structure of the working DBF and will perform the deletion step a total of three times. Upon finishing, the working DBF will contain separate fields displaying individual level, business level and address level matches. These field will enable the user to see which records match which, at what level, and how many other records any one record may match.
When the script has finished, the user can either export the data and matching results manually, using mDesktop's standard Output menu, or can set up another script specifically for Output. This output script can be set up to run as soon as the Multi–Level Matching script finishes.
Each row displayed in the Job Editor, when browsing the Multi–Level Matching job script, is listed and described below. The Job Editor dialog can be accessed by selecting the job script from within the 'Job Maintenance' dialog, then clicking on 'Modify'.
Most rows in the script feature a program, and as these programs have been designed not to require any further modification from the user, the descriptions below merely give an overview of what each program does. For further information, please contact 360Science.
- Row 1 ("1 RESTORE SETTINGS.PRG")
Restores all parameters, weights and matching matrices required to find matches on all levels: NAME, COMPANY and ADDRESS. The parameters will force mDesktop to generate name matching keys using individual names. Matching does not take place at this stage.
- Row 2
Imports the source data file into a pre–defined DBF structure and generates match keys. At this step, the options (accessed via the 'Options' cell) must specify the correct input format.
- Row 3 ("2 RENAME NAME FIELDS.PRG")
Renames the name matching keys (generated during the previous step using individual names) from NAME to IND_NAME, NAME1 to IND_NAME1 and from NAME2 to IND_NAME2. New blank Name fields are generated. The parameter that specifies which type of names are used to populate the name key fields are changed from 'Individual' to 'Business' in preparation for the next step.
- Row 4
Regenerates name matching keys following the parameter change in the previous step. Keys are generated using business names.
- Row 5 ("3 ADD IND WEIGHT.PRG")
Modifies the weights table, adding a weight on individual name as well as business, address and zip code.
- Row 6
Finds matches using specific match keys, which incorporate phonetic company name, phonetic individual name, postcode elements and phonetic address keys.
- Row 7 ("4 COPY MATCHING RESULTS.PRG")
Finds matches at address level using an additional program, which re–evaluates address–matching using different matching options from the previous matching run.
With all levels of matching now performed, the program in turn deletes and recalls matches at each level, each time populating the MATCH_REF and SET_DUPS fields with matching information that corresponds to that level. After each Delete step, the MATCH_REF and SET_DUPS fields are renamed accordingly; after deleting individual level matches, these fields are renamed 'INDIV_REF' and 'INDIV_DUPS', after business level deletion; 'COY_REF' and 'COY_DUPS' and after address level deletion; 'ADDR_REF' and 'ADDR_DUPS'.
On completion of this program, the Multi–Level Matching script finishes. The processed table now contains references grouping matches together at the level they matched at.
Before Running the Multi–Level Matching Script
Before the Multi–Level Matching Script is run, it must be set up to point to the file that is to be processed. Row 2 of the script, which by default points to Example1.txt in the 'Source File' column, should instead point to the correct source data file.
All rows that point to Example1.dbf in the 'Main File' column, should point to the template DBF file corresponding to the source data file. This file will have the source data file imported into it and will then undergo processing, so it's structure must therefore match the source data file.
The easiest way to set up a template DBF file is to bring the source data file through the Single File Wizard. After mapping all fields as required, tick 'Create Match Keys' in the Processing Options dialog, and then click on 'More' and 'Save and Exit'.
If the Multi–Level Matching script is to be used more than once on a set of data, it is recommended that the script be cloned first. This way, after processing the data, the script will not need to be changed, and potentially changed back the next time that set of data needs processing. Scripts can be cloned by right clicking on the grid in the Job Editor dialog and choosing 'Clone Job'. The clone should be given a name that relates to the data that it will be processing.
Running the Multi–Level Matching Script
Once the script has been set up correctly, it can be initialized by clicking on 'Start Job' in the Job Maintenance dialog. The script should only ever be started from the beginning, as opposed to resuming it from a specific step using 'Resume Job', which will undoubtedly cause it to fail.
To run the script from the command line, please refer to 'Running a Job Script automatically when you start matchIT'.
When the script has finished running, the processed DBF file will contain new fields grouping the individual, business and address level matches together. These are structured in the same way as MATCH_REF and SET_DUPS.
If a record matches one or more other records at a specific level, the corresponding …DUPS field will display a figure representing the number of records that the record matched against. If it does not match any records at that level, the field will be empty. E.g. If record A matches records B, C and D at address level, it will contain a number 3 in its ADDR_DUPS field. If it only matches B and C at business level it will contain a number 2 in the COY_DUPS field. If it does not match any of these records at individual level, it will not contain anything in its INDIV_DUPS field.
The …REF fields indicate which records match which. In the above example, records A, B, C and D would have identical ADDR_REF values. A, B and C would have identical COY_REF values (record A would have its COY_REF field populated with its UNIQUE_REF value) and all of these records would have different INDIV_REF values (all identical to their UNIQUE_REF values).
The image below illustrates this example:
Further Processing
Upon completion of the Multi–Level Matching script, the processed data can either be exported manually using mDesktop’s standard Output functions or it can be exported using another job script, which will need to incorporate the program 'OUTPUT.PRG'.
To manually export the data when the script finishes; return to mDesktop’s main menu. Make sure that the processed table is open in mDesktop, then select Output>Output To File.
To automatically export the processed data using a job script, the additional job script will need to be set up before running the Multi–Level Matching script. The user will need to point to 'OUTPUT.PRG' from within the script and need to edit it to ensure that the correct output format, fields and destination(s) are specified. After setting the Output script up, open the Multi–Level Matching script in the Job Editor and tick the 'Run Another Job After This One' box, in the bottom right–hand corner of the screen. When prompted, select the Output script. Now, when the Multi–Level Matching script finishes the Output script will automatically run.
Adding A Custom Weight
If custom weights need to be added, the "3 ADD IND WEIGHT.PRG" program will need editing. The code required to add a custom weight is already used in the program to add a weight on Individual Name. Therefore, this existing code can be copied and pasted, and then edited to look at the field in which the weight is to be added.
An example of the code required, if a weight on Email were to be added is as follows:
SELECT WEIGHTS
GO TOP
REPLACE EMAIL WITH 60
SKIP 1
REPLACE EMAIL NAME WITH 40
SKIP 1
REPLACE EMAIL NAME WITH 25
SKIP 1
REPLACE EMAIL NAME WITH 15
SKIP 1
REPLACE EMAIL NAME WITH 25
SKIP 1
REPLACE EMAIL NAME WITH 0
SKIP 1
REPLACE EMAIL NAME WITH 1
SKIP 1
This code should be inserted between the following existing lines of code:
#ENDIF
DO UseTable WITH "PARAMS", "P", "", 1
Adjusting the Minimum Score to Report
To adjust the minimum score to report, from within the Job Editor screen double-click on the 'Options' cell of row 6 (where the 'Match Keys' cell is set to 'Change'), then when prompted; set the 'Setup Option' drop-down to 'Change' and click on 'Continue'. The main options for this row will now be displayed.
From within the 'Matching' tab of the options, set the 'Minimum Score to Report' setting to the desired score. Please note; this is set to 50 by default, to ensure that the Address level matches are found as well as the business and individual level matches.