I have a requirement to export bucket data into multiple JSON files, with 100 documents in each file.
I have tried cbtransfer, but it copies everything into a single, server-specific CSV file that contains duplicate records.
Can someone suggest how I can achieve this?
You could, if you have a Query node, use the REST API with an ordered SELECT * query with OFFSET and LIMIT clauses to segment your data.
If you are on Linux/macOS and you have a tool such as "jq" installed to trim the additional output, a script like this would achieve your aim:
$ cat t.sh
#!/usr/bin/bash
# Count the documents in the bucket so we know how many pages to fetch.
S="SELECT count(1) c FROM \`travel-sample\` t"
C=$(curl -su Administrator:password -d "metrics=false&statement=${S}" http://localhost:8093/query/service | jq '.results[0].c')
# Fetch the documents 100 at a time, writing each page to its own numbered file.
for ((i=0; i<$C; i+=100))
do
  curl -su Administrator:password -d "metrics=false&statement=SELECT t.* FROM \`travel-sample\` t ORDER BY meta().id OFFSET $i LIMIT 100" http://localhost:8093/query/service | jq '.results' > export_$((i/100)).out
done
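To run it (assuming the script is saved as t.sh and the credentials and bucket name above match your cluster), something like:

$ chmod +x t.sh && ./t.sh
$ ls export_*.out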
(I'm not saying this is the best way, just a way to achieve your aim - each export file containing an anonymous array of documents.)
HTH.
You could try the following two commands:
cbexport json -c couchbase://127.0.0.1 -u $CB_USERNAME -p $CB_PASSWORD -f lines -b source_bucket -o all_output.json
split -d -a 10 -l 100 all_output.json
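If your split is the GNU coreutils version, you can also give the pieces a more descriptive name; this is just a sketch, with the part_ prefix and .json suffix being arbitrary choices:

split -d -a 10 -l 100 all_output.json part_ --additional-suffix=.json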
I think you can avoid the intermediate file all_output.json. If your system supports /dev/stdout, you can use a pipeline:
cbexport json -c couchbase://127.0.0.1 -u $CB_USERNAME -p $CB_PASSWORD -f lines -b source_bucket -o /dev/stdout | egrep -v '(^$)' | split -d -a 10 -l 100
You can also speed things up if you have lots of CPU cores by adding -t 16 to the cbexport command.
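Putting those pieces together, a full pipeline might look like the sketch below (the bucket name, credentials, thread count and part_ prefix are placeholders, and --additional-suffix again assumes GNU split; split reads from stdin when given - as the input file):

cbexport json -c couchbase://127.0.0.1 -u $CB_USERNAME -p $CB_PASSWORD -f lines -b source_bucket -t 16 -o /dev/stdout | egrep -v '(^$)' | split -d -a 10 -l 100 - part_ --additional-suffix=.json

Note that each resulting file is in JSON Lines format (one document per line); if you need an actual JSON array per file, something like jq -s '.' part_0000000000.json will wrap those lines in an array.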