The Intelligent Data Merge allows a user to construct a list of simple rules for the merging of matched records. These rules can then be applied to the merging of fields as part of other operations performed on the matching records. The idea behind the process is that it allows the user to extract the most important information from each pair of fields and combine that information into one field of superior quality. These enhanced fields are then inserted into the appropriate record (i.e. the destination record or record to be kept), thus improving the overall quality of the record.
Intelligent Data Merge configuration should be reviewed (via the Jobs/Setup menu, Matching Setup) if you change matching parameters, to avoid incorrect merging of data. If in doubt, you should not use this option. Thorough checking of results is advisable when using Intelligent Data Merge, especially on any new type of data.
The process complements the functionality of the Transfer Data, Flag Matches and Verify Matches operations by giving the user more control over what data should be retained.
Transfer Data
The Transfer Data operation (from the Matching or Merge/Purge menus) allows a user to transfer data from specified fields in one record to those fields in the matched record. The IDM process can override the options for the Transfer Data operation. To use IDM in the Transfer Data option, simply check the appropriate box on the main Transfer Data option screen.
Flag Matches
The deletion process employs a Deletion Priorities table to select which record should be flagged. The IDM process will not override this; rather, it complements it by running after the choice of which record to flag has been made, so that valuable information in the flagged record is retained and merged into the kept record. To use IDM when deleting matches, simply check the appropriate box on the screen that appears at the beginning of the deletion process.
Verify Matches
Within the Verify Matches operation, the IDM process operates as a tool which the user can control interactively.
Selecting the IDM option will produce a "meta" record showing the result of the process on the selected fields. The process should take no more than a second or two, after which the results window will appear. The window shows the results of the intelligent merge for the two records. Only the fields which exist in both records and have been specified in the IDM Options set-up will appear. Clicking either of the 'replace' buttons will insert these fields into the corresponding record, and additional changes can then be made. Selecting Discard will return to the Verify view without making changes.
The main options screen for the IDM process can be found under the menu Setup > Matching Setup as Intelligent Merge Options.
The IDM options screen requires that a Main File be selected before setting up any options. It allows the user to select which fields should be involved in the IDM process and what options should be applied to those fields.
Available Fields
This window will only display fields that exist in the currently selected Main File(s). Certain fields are omitted from this list because they are internal matchIT generated fields or are unsuitable for inclusion in the process. To transfer a field from this list to the list of selected fields either click on the green check mark or double click the fieldname itself.
Fields Selected
The options screen allows the user to compile a list of fields for inclusion in the IDM process. To remove a fieldname from this list click on the red cross or alternatively, to remove all the fieldnames from the list, click on the 'Clear' button. This list can be saved to a separate file for the purpose of allowing alternative setups for different jobs. To do this click on the 'Save As…' button. A dialog box will then appear asking for a filename. The saved file will have an extension of '.idm' to identify it as an Intelligent Data Merge configuration file. To load a configuration file click on the 'Load…' button. A dialog box will appear again asking for a filename. Only files with an '.idm' extension will appear. Select an appropriate file and the list of selected fields will be replaced with the contents of this file.
Default Priority
This option controls the standard merging of all fields. Clicking once on a fieldname in the list of selected fields will make these options available. If an option does not become active then it means that the particular option is not available for the selected field. Additionally, the priorities available for selection may also differ depending on the selected field:
Keep Longest/Keep Shortest – These options allow the user to keep the field which is longer or shorter respectively.
Keep Highest/Keep Lowest – These options allow the user to keep the highest or lowest value field. If both fields are numeric, then simply the highest or lowest value will be kept. If the fields are not both numeric then a string comparison will be done and the higher or lower string will be kept.
Keep Uppercased/Keep Lowercased/Keep Propercased – These options analyze each word in the fields and will merge fields together depending on the case of each word. For instance, the process will be able to pick out only the uppercased words in both fields and combine them into one field.
Keep Value – This option allows the user to enter a preferred value – if one of the fields being compared has this value, that value will be kept.
Keep for Other Value – This option allows the user to enter a value or condition referring to the contents of another field. The data for the selected field will be kept from the record whose “other field” satisfies the condition. The Value entered for this option must take one of the following forms:
- highest(field_name)
- lowest(field_name)
- longest(field_name)
- shortest(field_name)
- field_name = value
As this feature is not expected to be used by inexperienced users and is expected to be a standard setting which is rarely changed, there are no checks for the correctness of the syntax entered here.
The other field referred to in "Keep for Other Value" should not itself be selected as a field to merge in its own right, as that would compromise the check performed by "Keep for Other Value".
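As an illustration only, the rule forms accepted by Keep for Other Value could be interpreted along the following lines. This is a minimal Python sketch and not matchIT's implementation: the function name keep_for_other_value, the dictionary representation of records, the field names in the usage note and the tie-breaking behaviour (ties keep the first record) are all assumptions. Unlike the product, the sketch rejects a rule it cannot parse.

```python
import re

# Hypothetical patterns for the documented rule forms.
_FUNC_RULE = re.compile(
    r"^\s*(highest|lowest|longest|shortest)\s*\(\s*(\w+)\s*\)\s*$", re.IGNORECASE
)
_EQ_RULE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")


def keep_for_other_value(rule, target, kept, flagged):
    """Return the value of `target` from the record whose other field wins.

    `kept` and `flagged` are dicts mapping field names to strings.
    Ties keep the value from `kept` (an assumption).
    """
    match = _FUNC_RULE.match(rule)
    if match:
        func, other = match.group(1).lower(), match.group(2)
        a, b = kept[other], flagged[other]
        if func in ("longest", "shortest"):
            key_a, key_b = len(a), len(b)
        else:
            # Numeric comparison when both values are numeric, else string comparison.
            try:
                key_a, key_b = float(a), float(b)
            except ValueError:
                key_a, key_b = a, b
        if func in ("highest", "longest"):
            winner = flagged if key_b > key_a else kept
        else:
            winner = flagged if key_b < key_a else kept
        return winner[target]

    match = _EQ_RULE.match(rule)
    if match:
        other, wanted = match.group(1), match.group(2)
        if kept[other] != wanted and flagged[other] == wanted:
            return flagged[target]
        return kept[target]

    raise ValueError(f"Unrecognised rule: {rule!r}")
```

For example, with a hypothetical last_updated field, the rule highest(last_updated) applied to an email field keeps the email address from whichever record was updated most recently.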
Keep if Length – This option allows the user to enter a preferred field length – if one of the fields being compared has this length, the value from that record will be kept.
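The simpler default priorities described above (Keep Longest/Keep Shortest, Keep Highest/Keep Lowest, Keep Value and Keep if Length) can be pictured with a short Python sketch. It is illustrative only: the function merge_pair and its priority keywords are hypothetical, and the fallback to the first field when neither field qualifies, or when the two fields tie, is an assumption rather than documented behaviour.

```python
def _as_number(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def merge_pair(a, b, priority, value=None):
    """Choose between two field values according to a default priority.

    Ties, and cases where neither field qualifies, keep `a` (an assumption).
    """
    if priority == "longest":
        return b if len(b) > len(a) else a
    if priority == "shortest":
        return b if len(b) < len(a) else a
    if priority in ("highest", "lowest"):
        num_a, num_b = _as_number(a), _as_number(b)
        if num_a is not None and num_b is not None:
            key_a, key_b = num_a, num_b  # both numeric: compare as numbers
        else:
            key_a, key_b = a, b          # otherwise: compare as strings
        if priority == "highest":
            return b if key_b > key_a else a
        return b if key_b < key_a else a
    if priority == "value":
        # Keep the preferred value if either field holds it.
        return b if (b == value and a != value) else a
    if priority == "length":
        # Keep the field whose length equals the preferred length.
        return b if (len(b) == value and len(a) != value) else a
    raise ValueError(f"Unknown priority: {priority!r}")
```

Note that under Keep Highest the fields '9' and '10' are compared as numbers, so '10' is kept, whereas a plain string comparison would have kept '9'.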
Casing
The IDM process also allows the user to case the resultant field. Again, this option may not be available for certain fields. If the box is not active after selecting a field from the list then casing is not available for that field:
Uppercase – All words in field will be uppercased.
Lowercase – All words in field will be lowercased.
Proper case – All words will be cased according to matchIT's intelligent proper casing algorithm. Although this is an enhanced casing procedure which can detect and cope with many casing conventions, complicated casing structures may cause inaccurate results.
Name Completion
This option will construct the best possible name from two 'Addressee' fields. It should be used with caution and careful quality assurance, because it can produce results such as 'Mr Bob Robert Campbell'. It will attempt to map initials to full names, retain prefix and suffix information, cancel out phonetic duplicates and allow for shortened names, e.g. 'Bob' and 'Robert'. As an example, 'Mr Bob James Campbell' and 'Robert J Campbell' would ideally result in a merged field containing 'Mr Robert James Campbell'. This assumes that the default priority is set to 'Keep Longest'. The default priority still influences certain choices: had it been set to 'Keep Shortest' in this example, the result would actually be 'Mr Bob James Campbell'. Additionally, Name Completion allows for the retention of the most significant prefix. Should we adjust our example fields to contain 'Mr Bob James Campbell' and 'Dr Robert J Campbell', the resultant field would contain 'Dr Robert James Campbell'.
Assume surname exists – This option can significantly improve the performance of Name Completion by immediately choosing between the last elements in each field excluding suffixes. This means that misspelled or similar sounding surnames would be mapped to each other and one chosen over the other. The default priority again chooses which. If we adjust the example fields again to contain 'Dr Bob Campbell' and 'Robert James Campbell' then the result, assuming default priority is set to 'Keep Longest', would be 'Dr Robert James Campbell'. However, this option may not be deemed appropriate for certain data where a surname is not guaranteed to appear in every 'Addressee' field. As this option would produce poor results in such circumstances, it can be switched off by unchecking this option.
Name Completion will only be used if the appropriate box is ticked on the options screen. Should this not be ticked, the 'Addressee' field will be dealt with as a standard field and merged solely depending on the default priority. Name Completion will not be used should Family or Household Matching have been selected, as it is not appropriate.
Certain fields, such as Address and Telephone, are dealt with in a different way to others in the merging process. The overall way in which these fields are treated cannot be altered.
Address Lines
Address Lines are not dealt with separately but as a single Address entity. As such, the process will currently retain the address which contains the greatest number of address lines. It will ignore any address lines with fewer than three characters when calculating this value. When it then inserts the resultant address back into the appropriate record, the user will be informed should there not be sufficient address line fields and can then decide to cancel address line merging or allow truncation of address lines.
As the criterion for this merging operation is predetermined, the user is unable to select a default priority, but casing of the resultant address is still allowed.
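The address selection rule can be sketched in Python as follows. This is only an approximation under stated assumptions: the function names are hypothetical, an address is represented as a list of line strings, leading and trailing spaces are not counted towards a line's length, and a tie keeps the first address.

```python
def significant_lines(lines, min_chars=3):
    """Return the address lines that count towards the comparison.

    Lines with fewer than `min_chars` characters are ignored.
    """
    return [line for line in lines if len(line.strip()) >= min_chars]


def merge_address(addr_a, addr_b):
    """Keep whichever whole address has more significant lines.

    The address is treated as a single entity, so lines are never mixed
    between the two records. A tie keeps `addr_a` (an assumption).
    """
    if len(significant_lines(addr_b)) > len(significant_lines(addr_a)):
        return addr_b
    return addr_a
```

With this sketch, an address of ['Flat 2', '12 High Street', 'A', 'Leeds'] has three significant lines, because the one-character line 'A' is ignored.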
Telephone/Fax Fields
When merging telephone or fax fields it is important to take into account what information is important. When the process is applied to either of these fields it first removes superfluous data, such as parentheses, before applying the default priority. After choosing between the two fields it then reverts to using the original field containing parentheses, etc. This ensures that the superfluous data does not cloud the comparison of the two fields.
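The clean, compare, then revert sequence can be sketched in Python as below. The sketch is illustrative and rests on assumptions: the function names are hypothetical, only Keep Longest and Keep Shortest are shown, "superfluous data" is taken to mean parentheses, spaces, hyphens and full stops, and a tie keeps the first field.

```python
import re

# Assumed set of superfluous characters: parentheses, whitespace, hyphens, dots.
_SUPERFLUOUS = re.compile(r"[()\s.\-]")


def strip_superfluous(number):
    """Return the number with superfluous characters removed."""
    return _SUPERFLUOUS.sub("", number)


def merge_phone(a, b, priority="longest"):
    """Compare cleaned numbers, but return the chosen field in its original form."""
    clean_a, clean_b = strip_superfluous(a), strip_superfluous(b)
    if priority == "longest":
        pick_b = len(clean_b) > len(clean_a)
    elif priority == "shortest":
        pick_b = len(clean_b) < len(clean_a)
    else:
        raise ValueError(f"Unsupported priority in this sketch: {priority!r}")
    return b if pick_b else a
```

For instance, '(01234) 567-890' has more characters than '441234567890' as typed, but fewer once the parentheses, space and hyphen are removed, so Keep Longest selects '441234567890'. Under Keep Shortest the first number is selected and is returned with its original punctuation intact.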