This article summarizes data extraction methods used in the end-to-end integration solution. For a more detailed explanation, refer to the Online Help.
There is appreciable effort required in configuring, testing and tuning data extraction methods to achieve optimal performance. Therefore, there are few hard-and-fast rules as to when a particular method should be used. The selection of a method is often determined on a table-by-table basis and may change over time. This article identifies some of the variables to be considered for improving the overall performance of data extraction and replication, the pros and cons of each extraction method, and examples of their usage. Please note that while this article has an SAP focus, many of the techniques are applicable in non-SAP environments.
System Considerations
Before extracting data, review the Software Requirements and Hardware Sizing Guide.
Considerations include:
- Network hardware and bandwidth (LAN and WAN)
- Proximity to the data center (including source and target instances)
- Number of clients and usage patterns
- System configuration
- Database design
Data Considerations
With a system in place, there are several scope and data-related variables to consider:
- The type and quantity of data – Are there clustered or pooled tables in scope for extraction? How many records are in the table? What is the table width?
- Frequency of change – Is the data relatively static (e.g., a system configuration table) or changing rapidly (e.g., open items)?
- Rate of refresh – How close to real time does the data need to be to achieve the project’s goals?
- Security permissions and available connection types – Permissions must be granted to connect to and extract data from other systems. The level of access that can be granted to these systems affects the design of a solution and the methods for data extraction.
Connection Types
The methods for extraction considered in this section rely on two connection types:
- Open Database Connectivity (ODBC) – The most common and universally used connection type and an industry standard, ODBC can connect to any database management system for which an ODBC driver is installed or available.
- Remote Function Call (RFC) – The protocol used to call functions within SAP and the standard interface for communication between SAP Systems.
Extraction Methods & Tools
The following are methods and tools to use for data extraction:
- SAP RFC & BOA RFC – Allows for the extraction of SAP pooled and clustered tables.
- Assemble – A tool within DSP that creates and executes packages to transfer data between systems. The tool uses an ODBC connection with a nonindividual-specific account with read-only access.
- SSIS (SQL Server Integration Services) – A component of MS SQL Server, SSIS replaces Data Transformation Services (DTS). SSIS is typically faster than Assemble because a single program is both reading the source and loading the target.
- DBMoto (within Collect) – Contains functionality that uses an ODBC connection to download data in any of three different ways: Refresh, Change Data Capture (or Mirroring) and Synchronization.
The following table summarizes the pros, cons and usage of each extraction method:
Connection Type |
Extraction Method |
Pros |
Cons |
Examples |
RFC |
|
|
|
Material Long Text |
ODBC |
Assemble |
|
|
|
SSIS |
|
|
Largest master data or transactional tables where Assemble extraction is not sufficient. Historical examples include:
|
|
DBMoto (Collect) |
|
If trigger-driven (no native change logs), changes to the source system are required (triggers and trigger tables for the tables that are going to be replicated with DBMoto) |
Largest master data or transactional tables where Assemble extraction is not sufficient. Historical examples include:
|
Improve Data Extraction Efficiency
Use the following tips to improve efficiency when using one of the outlined extraction methods:
- Dedicated hardware – The fewer applications and users sharing the same hardware, the better and more consistent the performance.
- High speed connectivity – Bandwidth is often a bottleneck.
- Extract only relevant tables – Clearly define the tables that are in scope for extraction.
- Extract only relevant records/and or columns from tables – Excluding fields that are not being used effectively makes a narrower table. In some cases, the client may request that certain sensitive fields be excluded from extraction altogether. Adding relevancy (e.g., pulling only active customers) also reduces the amount of data being extracted.
- Create a development environment with comparable performance to production environment – The testing and configuration process is critical. Testing in an environment that closely mimics that of development provides a much more accurate picture of performance and behavior.
- Baseline and monitor extraction performance – Variables that impact efficiency change over time and it is important to monitor performance. Such variables include record volumes, changes in hardware or software, changes in business processes or protocols, and load on the network.
Decision Tree
Refer to the following decision tree to determine which extraction method is best to use based on the type of data being extracted: