Introduction
Starting with the DSP® 6.6.1 release, dspMonitor now includes:
- Data quality thresholds - The ability to specify different scoring thresholds which are then used to drive the status coloring of the reports, Green above the Warning threshold, Amber between the Warning and Critical thresholds and Red when the score is below the Critical threshold.
- Opportunity / defect modeling - Error reports can now be assigned an Opportunity view so that the total number of records can be compared to the failed record set and thus a quality score calculated against the defined thresholds.
- Improved reporting and dashboarding - New dashboards have been added providing increased visibility into the overall, group, object and application level data quality scores as well as the ability to drill down to the individual reports.
Previously in dspMonitor the solution was only able to process and visualise reports and calculate the number of records in the report. The reports could be named and grouped so as to relate to a specific data object or application but the categorisation of the reports was only possible via these limited mechanisms.
From the 6.6.1 release, you can now relate reports and their scores to applications in the enterprise system landscape using the assignment of reports to tables and columns extracted via the Collect application component. This article focuses on how this new functionality can be applied, so that data quality scoring can be assigned to the enterprise applications.
Aligning Collect Targets to Report Repositories
The assigning of Reports in a Report Repository to a Enterprise Application is performed by first assigning the respective Collect Target Data Source(s) to the specific Report Repository. More than one Collect Target can be related to a single Report Repository by ‘Including’ it in the configuration.
The following steps
- Navigate to dspMonitor - Configuration - Report Repositories.
- Click the Collect Targets icon on the respective Report Repository entry.
Screenshot 1: Configuration - Report Repositories - Include any Collect Targets that have been registered in the Data Source Registry as either an sdb or dg Data Source.
Note: You can include or remove Data Sources from the Report Repository, using the buttons in the toolbar for the highlighted Data Source.
Screenshot 2: Configuration - Report Repositories - Collect Targets
Once the Collect Target(s) have been Included you can return to the list of Reports registered in the Report Repository to assign tables populated in the Collect Target to the respective reports.
Assigning Target Tables to Reports
Collect is used to extract the enterprise application data and populate the various staging databases used for Migration, Quality or Governance (depending on the deployed DSP solutions). dspMonitor can assess the data quality of an application by defining a relationship between the reports and the specific staged application tables (and optionally the columns) to which the data quality score can be applied.
The following example describes how to accomplish data quality scoring for enterprise application data and the resulting data quality scores, dashboards and charts.
Example
The following example shows five registered reports, all based on various data quality checks being performed on the Salesforce Account object.
This example relies on the following steps have been performed:
- Extracted the data from the Account object in Salesforce into the dgSalesforce database using a combination of Collect and the IG Universal Connect component.
- Built out the data quality error views and the corresponding opportunity view.
- Aligned the Collect Target (dgSalesforce) to the Salesforce Report Repository.
To assign Target Tables to reports:
- Navigate to the relevant Report Repository (i.e. dgSalesforce).
- Click the Target Tables button for the specific report to associate to one or more Target Tables.
Screenshot 3: Configuration - Report Repositories - Repository Reports - Choose the Collect Target from the available list of included Collect Targets.
- Specify the Table Name by entering the first few letters and selecting from the filtered list. It is assumed that Collect has been used to populate the staging database being assessed / related to the specified report. If another technology has been used to populate the staging database (e.g. an ETL tool or iPaaS tool) then the database and tables will need to be registered in Collect for the association and subsequent scoring to be possible. For reports that contain multiple tables please refer to the section Cross Table Reports.
Screenshot 4: Configuration - Report Repositories - Repository Reports - Target Tables - If you want to associate (and score) a specific column within the Target Table, click the Columns button.
- In the resulting dependent pane, start to type the first few letters of the column name in the Column Name combo box and select the relevant entry.
Screenshot 5: Configuration - Report Repositories - Repository Reports - Target Tables - Table Columns
Note: The column(s) selected should be the columns being evaluated, not the key of the table. - Repeat steps 4 through 6 if additional tables and columns are to be scored against the specific report.
- Repeat steps 2 through 6 for the other reports to score against their relevant table and columns based on the report content.
After performing the above steps the user can either, wait for the next scheduled processing of the reports or manually process the report group to evaluate the data quality scores and update the metrics associated to the application, tables and columns.
The following screenshot shows the Application (Collect Target) and its respective data quality score metrics. The Your Application Tables page lists the tables with the scoring specific to the individual table calculated from the associated reports. You can then drill down into the specified column scores or view the associated reports at the table level.
Screenshot 6: Your Applications
Click the Columns button to display each of the registered columns showing the data quality scoring metrics and, on the dependent page, the associated reports.
Screenshot 7: You Applications - Your Applications Table Columns
Cross Table Reports
Where an Error Report (view) is pulling data from multiple tables it could lead to confusion as to which tables / columns should be associated with the report. An example is included below to provide clarity on how the tables and columns should be associated to the report.
Example:
A business rule has been defined that states an Account is not fit for purpose if it has no registered Contacts against it i.e. nobody to contact at the account.
The view would be written to pull all active accounts and join (using a left outer join) to the contacts table, joining on Account.ID = Contact.AccountID
In this example the SQL for the Error Report View could be written as below:
SELECT dbo.Account.Id,
dbo.Account.Name,
COUNT(dbo.Contact.Name) AS NoOfContacts
FROM dbo.Account LEFT OUTER JOIN
dbo.Contact ON dbo.Account.Id = dbo.Contact.AccountId
GROUP BY dbo.Account.Id, dbo.Account.Name
HAVING (COUNT(dbo.Contact.Name) = 0)
The Opportunity View would be to select the relevant fields from the Account table and could be as simple as:
SELECT * from dbo.Account
In this example, you would associate the Error Report to the table ‘Account’ and to the column ‘Account.ID’ as the issue is with the Account missing a Contact (i.e. a completeness check on the Account table).
The best practice is to align the multi-table report with the Collect Table/Fields that most closely match the record that is in error. In this case, it is the Account that is considered invalid because of data missing in another table (Contact). Since it is the Account itself that is not fit for purpose, we align the report with the Account table.
Dashboards and Charts
After modifying your existing reports or registering new reports to include the opportunity views, and aligning them with the application (tables and columns) and having processed your reports, the new dashboard will be populated.
The Dashboard initially shows the summarized data quality scores for all processed reports (Overall) and by Quality Dimension in the top panel with the lower panel showing the score, no of reports, no of tables, no of columns, no of errors and a link to the history chart that make up the score above.
By hovering the pointer over the bottom part of the chart (i.e. the word OVERALL or one of the Dimensions (COMPLETENESS, CONSISTENCY, etc) the user is presented with the option to drill down on the dashboard based on the Object, Groups or Applications.
Screenshot 8: Monitor Dashboard - Overall focus
Selecting the Applications option will display all the applications associated to the Overall list of reports, or the reports related to the specific Data Quality Dimension. The user has the option to refocus the dashboard by again hovering over the Application name on the newly displayed chart line which will then limit the dashboard to just the selected Application reports.
Screenshot 9: Monitor Dashboard - Applications focus
By clicking on the Tables or Columns tiles in the bottom row of the dashboard the user will be taken to the detailed report screen for the specific Application showing all of the associated reports, as shown below.
Screenshot 10: Your Application Table Reports (filtered by specific Application)