I load Spark DataFrames with the Couchbase Spark Connector as follows:
import com.couchbase.spark.sql._
import org.apache.spark.sql.sources.EqualTo
val df = sqlContext.read.couchbase(EqualTo("type", "data"))
df.printSchema
// root
// |-- META_ID: string (nullable = true)
// |-- data: double (nullable = true)
// |-- date: double (nullable = true)
// |-- type: string (nullable = true)
// |-- user_id: long (nullable = true)
The type of the user_id column is inferred as long, which is fine.
However, when filtering rows based on this column, I get some unexpected behavior.
The following does not work:
df.where(df("user_id") === 1L)
df.where(df("user_id") === 1)
df.where(df("user_id").cast("long") === 1L)
df.where(df("user_id").cast("long") === 1)
All of the above expressions evaluate to false on every row and filter out everything, even though the user_id column does contain the value 1 in some rows.
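For what it's worth, this is roughly the kind of check I used to confirm the value is there; it puts no predicate on user_id, so nothing related to that column should be pushed down to the source:
// No predicate on user_id here, only the type filter given at load time.
df.select("user_id").distinct().show()
// Count the matches on the Spark side, after collecting the rows locally.
df.collect().count(_.getAs[Long]("user_id") == 1L)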
The following, however, does work:
df.where(df("user_id") === "1")
df.where(df("user_id").cast("int") === 1)
df.where(df("user_id").cast("int") === 1L)
This seems to me to be a bug, either in Spark SQL or the Spark Connector. I am assuming that the where
call is translated to a N1QL query somewhere down the line, which may be where the trouble is.
Can somebody convince me that this behavior is intended? Or should I report it as a bug?
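In the meantime, the workaround I am leaning towards is wrapping the comparison in a UDF, since as far as I know UDF predicates are not pushed down to the data source and are evaluated inside Spark instead. A rough sketch (the udf name is just for illustration):
import org.apache.spark.sql.functions.udf
// The UDF is evaluated by Spark itself rather than being translated into N1QL,
// so it sidesteps whatever happens on the pushdown path.
val isUserOne = udf((id: Long) => id == 1L)
df.where(isUserOne(df("user_id")))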