I am trying to use couchbase as the streaming source for spark structured streaming using spark connector.
val records = spark.readStream
.format(“com.couchbase.spark.sql”).schema(schema)
.load()
And I have this query
records
.groupBy(“type”)
.count()
.writeStream
.outputMode(“complete”)
.format(“console”)
.start()
.awaitTermination()
For this query I am not getting the correct output . My query output table is like this
Batch: 0
20/04/14 14:28:00 INFO CodeGenerator: Code generated in 10.538654 ms
20/04/14 14:28:00 INFO WriteToDataSourceV2Exec: Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@17fe0ec7 committed.
±-------±----+
|type | count|
±-------±----+
±-------±----+
However if I use the couchbase to fetch the documents as non streaming. Like
val cdr = spark.read.couchbase(EqualTo(“type”, “cdr”))
cdr.count() gives the correct output. (count= 28).
Please let me know why this is not working with structured streaming.