Trying to query a Couchbase view using the Spark connector from Java, but I'm running into errors. Hopefully someone can help here.
I've tried spark-core/spark-sql/spark-connector at version 2.0.0 as well as 2.1.0, unsuccessfully in both cases, against Couchbase 4.0 Community Edition.
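For reference, these are the dependency coordinates I'm using for the 2.0.0 run (the 2.1.0 run just bumps the versions; Scala 2.11 builds throughout, which the trace below confirms at least for spark-core):

    org.apache.spark:spark-core_2.11:2.0.0
    org.apache.spark:spark-sql_2.11:2.0.0
    com.couchbase.client:spark-connector_2.11:2.0.0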
import com.couchbase.client.java.view.ViewQuery;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import com.couchbase.spark.japi.CouchbaseSparkContext;
import static com.couchbase.spark.japi.CouchbaseSparkContext.couchbaseContext;

SparkSession spark = SparkSession.builder()
    .config("spark.couchbase.nodes", <couchbase host>)
    .config("spark.couchbase.bucket.interface", <bucket password>)
    .master("local[*]")
    .getOrCreate();
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
CouchbaseSparkContext csc = couchbaseContext(jsc);
// Query the "channel" view in the "interface" design document and print each row.
csc.couchbaseView(ViewQuery.from("interface", "channel")).collect().forEach(System.out::println);
Now, I realize this doesn't do much, but I'm just trying to get something working first. When I run this code, all I get is the following exception:
17:56:00.066 [main] INFO org.apache.spark.SparkContext - Starting job: collect at SparkDriver.java:23
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:391)
at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:403)
at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:304)
at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:339)
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:849)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1626)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:893)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:892)
at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:360)
at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
at com.comcast.deviceswap.recon.SparkDriver.main(SparkDriver.java:23)
17:56:00.074 [dag-scheduler-event-loop] WARN org.apache.spark.scheduler.DAGScheduler - Creating new stage failed due to exception - job: 0
java.lang.NullPointerException: null
at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:391) ~[spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:403) ~[spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGScheduler.getParentStagesAndId(DAGScheduler.scala:304) ~[spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGScheduler.newResultStage(DAGScheduler.scala:339) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:849) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1626) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) [spark-core_2.11-2.0.0.jar:2.0.0]
17:56:00.077 [main] INFO org.apache.spark.scheduler.DAGScheduler - Job 0 failed: collect at SparkDriver.java:23, took 0.010186 s
17:56:00.079 [Thread-1] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
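To rule out the view definition itself, I also plan to hit it with the plain Couchbase Java SDK, outside of Spark entirely. A minimal sketch of that check (host and password are placeholders, same as above):

    import com.couchbase.client.java.Bucket;
    import com.couchbase.client.java.Cluster;
    import com.couchbase.client.java.CouchbaseCluster;
    import com.couchbase.client.java.view.ViewQuery;
    import com.couchbase.client.java.view.ViewResult;
    import com.couchbase.client.java.view.ViewRow;

    // Sanity check: query the same view with the bare Java SDK, no Spark involved.
    Cluster cluster = CouchbaseCluster.create("<couchbase host>");          // placeholder host
    Bucket bucket = cluster.openBucket("interface", "<bucket password>");   // placeholder password
    ViewResult result = bucket.query(ViewQuery.from("interface", "channel"));
    for (ViewRow row : result) {
        System.out.println(row.id());
    }
    cluster.disconnect();

If that works, the problem presumably sits in the Spark integration rather than in the view.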
Does anybody have a working example of the Couchbase Spark connector querying views from Java?