I have tried to write some code, but without any documentation I have not been able to achieve my goal. In particular, I have not been able to build the CouchbaseInputDStream correctly, because I could not work out how to pass the streamFrom and streamTo parameters to the constructor. In addition, I have no idea how to retrieve the changed Couchbase documents from the notifications. A rough sketch of what I was attempting is below.
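For reference, here is roughly what I was trying. This is only a sketch: the `couchbaseStream` helper, the `FromBeginning`/`ToInfinity` values and the `Mutation` type are my reading of the connector's Scala streaming API, and the bucket name is a placeholder, so the exact names may differ in your connector version.

```scala
import com.couchbase.spark.streaming._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamChanges {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("couchbaseStream")
      .setMaster("local[*]")
      .set("com.couchbase.bucket.travel-sample", "") // bucket to stream from (placeholder)
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))

    // streamFrom / streamTo are expressed through the from/to arguments
    ssc.couchbaseStream(from = FromBeginning, to = ToInfinity)
      .filter(_.isInstanceOf[Mutation])                    // keep only changed documents
      .map(m => new String(m.asInstanceOf[Mutation].key))  // IDs of the mutated documents
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```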
@ldoguin
As far as I understand, with the JDBC driver I am not able to move data from Couchbase to the RDBMS incrementally, and I think that this is essential on a huge data set. Am I wrong?
You can use Spark Streaming for the incremental changes, but this of course requires more work to “ingest” properly, and streaming support is experimental right now. A different option would be to “poll for changes” with a N1QL query that fits that criterion, like “where updated_at > …” and so on.
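A minimal sketch of that polling approach, assuming your application maintains an `updated_at` field (with an index on it) and a bucket called `mybucket`; the `couchbaseQuery` helper comes from the connector's implicits:

```scala
import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.spark._
import org.apache.spark.{SparkConf, SparkContext}

object PollForChanges {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("pollForChanges")
      .set("com.couchbase.bucket.mybucket", "") // placeholder bucket name
    val sc = new SparkContext(conf)

    // Persist the timestamp of the last run somewhere; only newer docs are fetched.
    val lastRun = "2016-01-01T00:00:00Z"
    val changed = sc.couchbaseQuery(N1qlQuery.simple(
      s"SELECT META(b).id AS id, b.* FROM `mybucket` b WHERE b.updated_at > '$lastRun'"))

    changed.map(_.value.getString("id")).collect().foreach(println)
  }
}
```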
@ldoguin
The problem is that to create the reports I have to do a “select all” over a lot of documents, and I think that RDBMS databases are more performant than Couchbase for this kind of operation. That is why I had the idea of using an RDBMS with a schema already optimized for reports.
@giovanni.casella I don’t know if that’s really the case - did you benchmark it? N1QL is pretty good at scaling out with the new GSI, especially when in memory. Combined with KV fetches you can get awesome performance.
And could you do the analysis directly in Spark? I’m not sure you really need to go back into an RDBMS at all. Can you tell us more about your use case? Something like the sketch below might already cover the reporting.
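Just to illustrate the “report directly in Spark” idea: infer a DataFrame from documents of one type and aggregate it with Spark SQL. The `read.couchbase(...)` call with a `schemaFilter` is how I recall the connector’s DataFrame support working, and the `type`, `customerId` and `amount` fields are placeholders to adapt to your documents.

```scala
import com.couchbase.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.EqualTo
import org.apache.spark.{SparkConf, SparkContext}

object ReportInSpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("reportInSpark")
      .set("com.couchbase.bucket.mybucket", "") // placeholder bucket name
    val sc = new SparkContext(conf)
    val sql = new SQLContext(sc)

    // Schema is inferred from a sample of documents where type = "order"
    val orders = sql.read.couchbase(schemaFilter = EqualTo("type", "order"))

    // Plain DataFrame aggregation instead of an RDBMS report query
    orders.groupBy("customerId").sum("amount").show()
  }
}
```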
The problem is, on one side, that an example is missing, and on the other, that I was not able to instantiate and use the CouchbaseInputDStream class correctly.
@daschl How can I combine N1QL with KV fetches? With Couchbase 4.2 the only way to achieve fast N1QL queries was adding covering indexes, but a Couchbase engineer told us that it is better to avoid more than 5 indexes per bucket, so we moved from N1QL to KV fetches in almost all cases (and it was painful). Now I would like to avoid adding indexes for the reports.
Regarding the reports with Spark, I must admit that I am a newbie with Spark; I came across it this morning for the first time. On the other hand, I have some experience with tools that can retrieve data from an RDBMS.
@giovanni.casella combining N1QL with KV fetches would be a SELECT META().id AS id FROM … and then you get back the document IDs, which you can “pipe” into the KV fetch.
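Roughly like this (a sketch, not a polished sample: `couchbaseQuery` and `couchbaseGet` come from the connector’s implicits, and the bucket name and WHERE clause are placeholders):

```scala
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.spark._
import org.apache.spark.{SparkConf, SparkContext}

object N1qlPlusKv {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("n1qlPlusKv")
      .set("com.couchbase.bucket.mybucket", "") // placeholder bucket name
    val sc = new SparkContext(conf)

    // 1) N1QL returns only the document IDs (can be served from a covering index)
    val ids = sc
      .couchbaseQuery(N1qlQuery.simple(
        "SELECT META(b).id AS id FROM `mybucket` b WHERE b.type = 'order'"))
      .map(_.value.getString("id"))

    // 2) The IDs are piped into bulk KV fetches for the full documents
    val docs = ids.couchbaseGet[JsonDocument]()
    docs.foreach(doc => println(doc.id()))
  }
}
```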
I’ll see if I can get a sample together; Scala is not an option for you at this point?