I load Spark DataFrames with the Couchbase Spark Connector as follows:
import com.couchbase.spark.sql._
import org.apache.spark.sql.sources.EqualTo
val df = sqlContext.read.couchbase(EqualTo("type", "data"))
df.printSchema
// root
// |-- META_ID: string (nullable = true)
// |-- data: double (nullable = true)
// |-- date: double (nullable = true)
// |-- type: string (nullable = true)
// |-- user_id: long (nullable = true)
The type of the user_id column is inferred as long, which is fine.
However, when filtering rows based on this column, I get some unexpected behavior.
The following does not work:
df.where(df("user_id") === 1L)
df.where(df("user_id") === 1)
df.where(df("user_id").cast("long") === 1L)
df.where(df("user_id").cast("long") === 1)
All of the above expressions evaluate to false on every row and filter out everything, even though the user_id column does contain the value 1 in some rows.
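For what it's worth, this is roughly the kind of check I used to confirm the value is there; it puts no predicate on user_id, so nothing related to that column should be pushed down to the source:
// No predicate on user_id here, only the type filter given at load time.
df.select("user_id").distinct().show()
// Count the matches on the Spark side, after collecting the rows locally.
df.collect().count(_.getAs[Long]("user_id") == 1L)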
The following, however, does work:
df.where(df("user_id") === "1")
df.where(df("user_id").cast("int") === 1)
df.where(df("user_id").cast("int") === 1L)
This seems to me to be a bug, either in Spark SQL or the Spark Connector. I am assuming that the where
call is translated to a N1QL query somewhere down the line, which may be where the trouble is.
Can somebody convince me that this behavior is intended? Or should I report it as a bug?
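In the meantime, the workaround I am leaning towards is wrapping the comparison in a UDF, since as far as I know UDF predicates are not pushed down to the data source and are evaluated inside Spark instead. A rough sketch (the udf name is just for illustration):
import org.apache.spark.sql.functions.udf
// The UDF is evaluated by Spark itself rather than being translated into N1QL,
// so it sidesteps whatever happens on the pushdown path.
val isUserOne = udf((id: Long) => id == 1L)
df.where(isUserOne(df("user_id")))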