Component Overview
Stored Procedures
These provide the high-level functionality; they’re responsible for such things as accessing the database, finding records to compare, and importing results.
These provide the same level of high-level functionality as the stored procedures but are delivered through Microsoft’s SSIS Software. SSIS packages provide a rapid way of implementing your matching process and can be scheduled from SQL Management Studio’s SQL Server Agent.
Worker Processes
These provide a safe interface to the mAPI, an unmanaged COM component for generating search keys and comparing records. Should a problem be encountered within the unmanaged code, the service can automatically spawn a new worker and thus provide a highly stable and robust batch processing environment.
From a user perspective you will not need to worry about the workers, but we mention them here for the purposes of completeness.
mSQL Service
This provides a link between the stored procedures and worker processes. When a stored procedure runs, it connects to the service which, in turn, creates a new worker. The stored procedure then connects to the worker, which will then provide the stored procedure with all mAPI-related functionality.
From a user perspective you will not need to worry about the service, but we mention it here for the purposes of completeness.
Error Logging
When an error is generated through the mSQL stored procedures or SSIS tasks, an error log will normally be created in the specified temporary folder (note that where there are actual problems with the XML configuration file itself, i.e. invalid XML then this error log cannot be produced). Normally if an error occurs, you should be able to review this error log and quickly see what the problem is. In the rare scenario that you are unable to determine the cause of the error yourself, then feel free to contact our support team and attach the error log file to a ticket submission above, we will then be able to actively investigate the problem.
mSQL Monitor
This isn’t an essential component of the mQL system. It’s a simple program that’s useful for monitoring the status of the service and any active worker processes. It can be used to detect deadlocked and crashed workers and provides a convenient mechanism for terminating them so that the stored procedure can resume processing.
SSIS Tutorial
Below is a list of articles that focus on setting up a process using our SSIS tasks, anything you can do in SSIS you can also do through our stored procedures. So if you find SSIS too limiting because you can't set variables, we'd still suggest setting up your initial process in SSIS then see the last article below to read about taking the SSIS process and converting it to stored procedures.
Installation:
- Installing mSQL (With Screenshots)
- Adding the SSIS Components to the Toolbox (SQL 2005 and 2008 only)
- Explanation of the SSIS Tasks
Address Validation (for those with the module):
- mSQL 2.2.1 - SSIS Tutorial - Setting up US Address Validation
- mSQL 2.2.2 - SSIS Tutorial - Addressing - understanding the outputs
- mSQL 2.2.3 - SSIS Tutorial - Addressing - understanding the result table
Basics:
- mSQL 2.3.1 - SSIS Tutorial - Setting up Generate Keys - Setting a template
- mSQL 2.3.2 - SSIS Tutorial - Setting up Key Generation
- mSQL 2.4.1 - SSIS Tutorial - setting up a basic matching process
- mSQL 2.4.2 - SSIS Tutorial - incorporating exact matching
- mSQL 2.4.3 - SSIS Tutorial - Understanding Output Tables From A Matching Process
- mSQL 2.4.4 - SSIS Tutorial - findoverlap vs findmatches
- mSQL 2.5.1 - SSIS Tutorial - setting up a findoverlap
- mSQL 2.5.2 - SSIS Tutorial - Incorporating Exact Overlap
- mSQL 2.5.3 - SSIS Tutorial - mSQL Overlap Output Tables
- mSQL 2.5.4 - SSIS Tutorial - Find Matches/findoverlap more in depth, weights and constraints
Advanced
- mSQL 2.6.1 - SSIS Tutorial - Understanding Matching - Why did this score 85 on Individual Level?
- mSQL 2.6.2 - SSIS Tutorial - Understanding Matching - Why did this score 85 on Business Level
- mSQL 2.6.3 - SSIS Tutorial - Tight Business Matching
- mSQL 2.7.1 - SSIS Tutorial - What are all these key fields?
- mSQL 2.7.2 - SSIS Tutorial - Ouptut Tables, Filtering and Quality Scoring
- mSQL 2.8.1 - SSIS Tutorial - moving a matching process from SSIS to Stored Procedures
SSIS Tasks
The SSIS Tasks are installed with mSQL by default. During installation mSQL detects which version(s) of SQL server you have installed, and, providing SQL Server Integration Services are installed, registers the appropriate assemblies to the Global Assembly Cache, as well as copying them to the DTS folder of the installation. Note that all of the following SSIS tasks use the same template file as their base for creating XML configuration files for their processes, normally the US one located in the 'Config files --> SSIS --> Templates' directory. This template contains the paths to the matching matrices and master priorities files that are used in the processes, which can be manually amended if needed to point to different files. As well as the Tasks, a demo SSIS package is installed to the demo folder in mSQL Config Files directory, under an SSIS subfolder. There are versions for SQL 2005, SQL 2008, SQL 2012 and SQL 2014, all of which are configured to work with the demo database provided with mSQL. Once the demo database is installed and the setup step below for the tasks has been completed, the packages should be able to run out of the box – the only things that may need amending are references to connection strings, which can be done quite easily in BIDS/SSDT. If the connection strings do need to be amended, you may also need to check that the mappings in the Generate Keys components are persisted.
Setup
Before the tasks can be used in BIDS or Visual Studio, they need to be added to the toolbox in the control flow pane (Note this is not applicable to BIDS/VS 2010+). You can do this by right-clicking on the toolbox and selecting ‘Choose Items…’ The ‘Choose Toolbox Items’ dialog will appear after a while, in which you need to select the ‘SSIS Control Flow Items’ tab. In this tab you should see the mSQL tasks currently unselected (they are identifiable by the fact that they are named using the prefix ‘MISQL.’). Simply select the tasks by checking the checkbox next to each one, and click ‘OK’.
Usage
Below is a brief description of each task and what it does. For a visual representation of some typical SSIS processes and some of the tasks themselves, please see The SSIS Tutorial for mSQL.
MISQL.GenerateCorrectedAddresses – This task should be the first task in a sequence if it is to be used at all. Its function is to generate corrected addresses from a given source and output them in a specified output format. This task basically wraps the same core functionality as the procedure msp_GenerateCorrectedAddresses. The specified output table in this task is persisted in a variable at the SSIS package level at design time so that it can be picked up and used in the next task. Note that this is currently only available to licensed users only.
MISQL.GenerateNCOAAddresses – This task will usually follow the GenerateCorrectedAddresses task, but can also be run standalone or even following the GenerateKeys task to allow for maximum flexibility. Its purpose is to identify individuals, families, and companies that have changed their address (i.e. moved), to help keep a database up to date. Please see this article for further information on the msp_GenerateNCOAAddresses stored procedure, which provides the same functionality as this task. Note that this service is currently only available for US addresses, to licensed users only.
MISQL.GenerateKeys – This task should be the first task in a sequence unless any of the previous tasks are also being used. This task generates the match keys required to be used in the matching tasks from the specified source data. Mappings are made from source columns to their relevant mAPI field. This task encapsulates the same core functionality as msp_BulkGenerateKeys. By setting up a GenerateKeys task, you are effectively setting up a ‘Data Source’ with an ID, which is used as a reference in following matching and grouping tasks.
MISQL.FindMatches – This task should follow on from a GenerateKeys task. It is used to set up and execute fuzzy matching based on specified match keys and minimum score thresholds, and is pointed at a data source set up by a GenerateKeys task. The core functionality used by this task is the same as the procedure msp_FindMatches.
MISQL.GroupMatches – This task should follow on from a FindMatches task. It is used to group the results produced by a FindMatches task. Different types of tables can be selected to be output and the names of the tables can be specified. The core functionality of this task encapsulates the procedures msp_GroupMatches, msp_OutputMatchingPairs, msp_OutputMatchingGroups, msp_OutputDuplicates and msp_OutputDedupedTable.
MISQL.FindExactMatches – This task should again follow on from a GenerateKeys task. It's basically the same as the FindMatches task, only instead of fuzzy matching, exact matching is applied using the keys specified. The core functionality used is the same as msp_FindExactMatches.
MISQL.GroupExactMatches – This task should follow on from a FindExactMatches task. It is used to group the results produced by a FindExactMatches task. The core functionality is the same as the procedure msp_GroupExactMatches.
MISQL.FindOverlap – This task should follow on from two GenerateKeys tasks. It is used to set up and execute fuzzy overlapping between the two data sources set up in the preceding GenerateKeys tasks. The match keys and score thresholds to be used can be configured. The core functionality used by this task is the same as the procedure msp_FindOverlap.
MISQL.GroupOverlap – This task should follow on from a FindOverlap task. It is used to group the results produced by a FindOverlap task. Different types of tables can be selected to be output and the names of the tables can be specified. The core functionality of this task encapsulates the procedures msp_GroupOverlap, msp_OutputOverlapMatchingPairs, msp_OutputOverlapMatchingGroups, msp_OutputOverlapDuplicates and msp_OutputOverlapDedupedTable.
MISQL.FindExactOverlap – This task is basically the same as the FindOverlap task, only instead of fuzzy matching, exact matching is applied using the keys specified. The core functionality used is the same as msp_FindExactOverlap.
MISQL.GroupExactOverlap – This task should follow on from a FindExactOverlap task. It is used to group the results produced by a FindExactOverlap task. The core functionality is the same as the procedure msp_GroupExactOverlap.