I am trying to copy a DynamoDB table with over 10 million records to Couchbase using Python. I have tried a ThreadPool with batch-wise insertion, but it is slow and often fails with ambiguous errors partway through. Are there any libraries available for faster document insertion?
Hi Anjali -
If the DynamoDB table can be dumped as JSON, then maybe the cbimport utility?
The Java SDK does inserts in about 70% of the time of the Python SDK.
Using a longer timeout or less concurrency will avoid the timeout exceptions. When doing only inserts, an ambiguous timeout exception can be treated as unambiguous (i.e. the document definitely was not inserted), provided that on a subsequent run over the failed documents, DocumentExists exceptions are caught and ignored.
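That retry pass can be sketched generically. Here `insert_fn` and `exists_exc` are stand-ins for `collection.insert` and `couchbase.exceptions.DocumentExistsException` from the Couchbase Python SDK (check the exception name against your SDK version):

```python
def retry_failed_inserts(insert_fn, failed_docs, exists_exc):
    """Re-run inserts that previously failed with an ambiguous timeout.

    If a retried insert raises `exists_exc`, the first attempt actually
    succeeded, so the error is safe to ignore. Returns the (key, doc)
    pairs that still failed after this pass.
    """
    still_failed = []
    for key, doc in failed_docs:
        try:
            insert_fn(key, doc)
        except exists_exc:
            # The ambiguous timeout was actually a success: ignore.
            pass
        except Exception:
            # Still failing; keep it for another pass.
            still_failed.append((key, doc))
    return still_failed
```

Running this over the failed batch until it returns an empty list turns the ambiguous timeouts into a definite outcome.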
If this is a one-time load operation and there are indexes in Couchbase, it could be beneficial to delay creating those indexes until after the data has been inserted, to avoid indexing occurring at the same time as the inserts.
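Couchbase also supports creating an index definition without building it, then triggering the build after the load finishes. A sketch in N1QL (the bucket, index, and field names here are placeholders):

```sql
-- Define the index but defer the build until later
CREATE INDEX idx_type ON `mybucket`(`type`) WITH {"defer_build": true};

-- After the data load completes, build the deferred index
BUILD INDEX ON `mybucket`(idx_type);
```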
- Mike
Thanks for your reply, Mreich. I am now able to avoid duplicate documents and resolve the timeout ambiguities. I tried inserting the data batch-wise using a single thread pool with the maximum number of workers, but migrating one DynamoDB table to Couchbase still takes a long time. Are there any libraries in Python for faster insertion?
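One option instead of a thread pool is the SDK's asyncio API (the `acouchbase` package in the Couchbase Python SDK), which pipelines operations with less per-request overhead. A minimal sketch of bounded-concurrency inserts, where `insert_coro` stands in for the collection's awaitable `insert` method:

```python
import asyncio


async def insert_all(insert_coro, docs, concurrency=100):
    """Insert (key, doc) pairs concurrently, at most `concurrency` in flight.

    Capping in-flight operations keeps the cluster from being flooded,
    which is a common cause of ambiguous timeout errors.
    """
    sem = asyncio.Semaphore(concurrency)

    async def one(key, doc):
        async with sem:
            await insert_coro(key, doc)

    await asyncio.gather(*(one(k, d) for k, d in docs))
```

In real use you would pass the async collection's `insert` method, and collect the keys that raise exceptions so they can be retried in a follow-up pass.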
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.