Collect Extraction Performance Optimization Techniques

Answered

11 comments

  • Best answer
    PSA Support

    Also, when working with large tables like this, you may want to look at using Syniti Data Replicator (previously called DBMoto). It's a change data capture tool, so you would have one (long) initial pull, but after that it would only pull the changed records and maintain a near-real-time copy of the data.

  • Joseph Flesche

    When I hear about large tables like this, I always ask: do you need all of those records? (And all of the fields?)

    If not, reducing the number of records and columns extracted will greatly help.

    I do this with tables that are large for us, such as MSEG, AUSP, JEST, AUFK, ILOA, etc. It saves a lot of time.
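    As a sketch of the row-and-column trimming above, assuming an ODBC/SQL extract of MSEG — the column list uses real MSEG fields, but the plant list and fiscal-year cutoff are made-up examples of a project's scope:

    ```sql
    -- Illustrative only: pull just the in-scope columns and rows from MSEG.
    -- The plant values and year cutoff are hypothetical scope limits.
    SELECT MBLNR, MJAHR, ZEILE, BWART, MATNR, WERKS, LGORT, MENGE, MEINS
    FROM MSEG
    WHERE WERKS IN ('1000', '2000')   -- in-scope plants only
      AND MJAHR >= '2020';            -- recent fiscal years only
    ```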

  • Venkata Ravi Kumar Mullamuri

    Hi Joseph,

    Thanks for the update.

    We actually need all the fields, and yes, I tried restricting the data using some filters; it still takes 14 to 15 hours to extract. Given the cutover timelines, a 15-hour extraction for a single table is challenging.

    So I wanted to check whether there are any other techniques to improve extraction performance.

    Thanks & Regards

    Ravi

  • Andrew Lund

    Ravi - what version of ADM are you running? In the latest version, there is an option to break an extract into multiple extracts, each with different filter criteria. I also assume you are extracting via RFC rather than ODBC.

  • Venkata Ravi Kumar Mullamuri

    Hi Andrew,

    We are on 7.4.4.40, and I am extracting using CranPort/manual CranPort (ODBC/SQL), not through RFC, since these are transparent tables.

    Could you please let me know how to break down the extraction? Also, does this help when we extract multiple tables at once?

  • Andrew Lund

    Hello Ravi

    In Collect you should be able to add the same table for extraction multiple times, each with a different Table Rename and a different extract WHERE clause. For example, if your table includes the Company Code column, you could profile the data and distribute the company codes across multiple extracts.

    Does this help?
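    Andrew's split-by-filter approach could look like the following WHERE clauses, one per Collect extract. This is a sketch assuming a table with a BUKRS (Company Code) column, such as BKPF; the code groupings are hypothetical and would come from profiling the actual distribution:

    ```sql
    -- Extract 1 (Table Rename e.g. BKPF_A): first group of company codes
    SELECT * FROM BKPF WHERE BUKRS IN ('1000', '1100', '1200');

    -- Extract 2 (Table Rename e.g. BKPF_B): second group
    SELECT * FROM BKPF WHERE BUKRS IN ('2000', '2100');

    -- Extract 3 (Table Rename e.g. BKPF_C): catch-all, so no rows are missed
    SELECT * FROM BKPF WHERE BUKRS NOT IN ('1000', '1100', '1200', '2000', '2100');
    ```

    The catch-all extract matters: without it, any company code that appears after profiling would silently be dropped.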

  • Venkata Ravi Kumar Mullamuri

    Andrew Lund - Thanks for your input. I tried this, but it is not feasible in my scenario because I have to extract 7 large tables at the same time.

    PSA Support - Thanks for your input. Is this replication quick when dealing with tables with millions of records, in your experience?

  • Eric Weldon

    Yes, the replication happens in seconds (it pulls only the changed records every 5 minutes). The initial "large" pull would happen at some off-hours time.

  • Eric Weldon

    Also, Collect allows you to pull multiple tables at the same time.

  • Tyler Warden

    You can set the replication interval in the change data capture engine as short as ~20 seconds, so if you need changes replicated in near real time, you can easily get sub-minute latency after the initial pull.

  • Kurt Vandergriend

    Have you limited the columns being pulled to only those that are in scope? This can dramatically improve performance. Simply change the SELECT statement in the CranPort package.
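    As a hedged before/after illustration of that SELECT change — AUFK is used as the example table (real AUFK fields), but which columns are "in scope" is hypothetical and depends on the project's field mapping:

    ```sql
    -- Before: the default package pulls every column.
    SELECT * FROM AUFK;

    -- After: only the columns the project actually maps.
    SELECT AUFNR, AUART, WERKS, KTEXT, ERDAT FROM AUFK;
    ```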

