[{
"Name": "Test"
"Salary": 2000
"Department": "Maths"
},
{
"Name": "Test"
"Salary": 2000
"Department": "Maths"
}
]
I want to convert this JSON array into a DataFrame with the Couchbase connector. This JSON array is not fixed, so I cannot add the schema manually, and I also cannot add a key to the array.
I have tried the code below after creating an index on the bucket and setting all the mandatory configuration such as username, password, serverIp, etc.
val jsonArray = sql.read.couchbase
But the resulting DataFrame only contains META_ID; the connector is unable to infer the schema of the JSON array.
What should I do?
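For reference, here is a minimal sketch of the kind of setup around that line (an assumption on my part: the 2.x connector, with the spark.couchbase.* property names taken from its docs, so they may differ for other versions):

```scala
import org.apache.spark.sql.SparkSession
import com.couchbase.spark.sql._   // adds the .couchbase method to DataFrameReader

// Connection settings below are placeholders; property names follow the 2.x connector docs.
val spark = SparkSession.builder()
  .appName("CouchbaseJsonArrayToDataFrame")
  .master("local[*]")
  .config("spark.couchbase.nodes", "127.0.0.1")          // serverIp
  .config("spark.couchbase.username", "Administrator")   // username
  .config("spark.couchbase.password", "password")        // password
  .config("spark.couchbase.bucket.myBucket", "")         // bucket to open ("myBucket" is a placeholder)
  .getOrCreate()

// Let the connector sample documents and infer the schema
val jsonArray = spark.read.couchbase()
jsonArray.printSchema()
jsonArray.show()
```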
@abhi_sadana it’s not clear to me what you want to achieve exactly. Are you loading data through the connector and then want to turn it into a dataframe? Or do you have an in-memory representation of a JsonArray and want to convert this one into a dataframe?
@daschl Thank you for your reply.
Corrected JSON:
[
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
},
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
}
]
Note: there is only one JSON document in the bucket.
Yes, I am loading the data through the connector and converting this JSON data into a DataFrame directly.
Resulting DataFrame:
+-------+
|META_ID|
+-------+
|      1|
+-------+
This DataFrame contains only a single column; the rest of the data is not available in it.
DataFrame schema:
|-- META_ID: string (nullable = true)
The schema is also missing the rest of the elements.
Is there any way to get all of the elements into the DataFrame instead of only META_ID?
@abhi_sadana I think that should work; I wonder if you’re hitting a limit of the automatic schema inference there (perhaps it cannot infer a schema from a small set of docs, such as the single doc you have here).
But you can specify a schema manually, which is easy enough. Please see the Spark docs here for more: Spark SQL Integration | Couchbase Docs
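Roughly along these lines (a sketch from memory, so please double-check the exact overload and field types against the docs page linked above):

```scala
import org.apache.spark.sql.types._
import com.couchbase.spark.sql._

// Fields taken from the sample document in this thread, plus the META_ID column the connector adds.
val manualSchema = StructType(
  StructField("META_ID", StringType)    ::
  StructField("Name", StringType)       ::
  StructField("Salary", LongType)       ::
  StructField("Department", StringType) :: Nil
)

// Assumes the connector's couchbase(schema: StructType) overload described on the docs page above;
// "spark" is your SparkSession.
val df = spark.read.couchbase(manualSchema)
```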
@graham.pople Thank you for your reply.
I want to mention two points here:
- I have edited the document by adding a key ("test") to the JSON array:
{
"test": [
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
},
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
}
]
}
Resulting DataFrame:
+-------+--------------------+
|META_ID|                test|
+-------+--------------------+
|    122|[[Maths, Test, 20...|
+-------+--------------------+
Schema
root
|-- META_ID: string (nullable = true)
|-- test: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Department: string (nullable = true)
| | |-- Name: string (nullable = true)
| | |-- Salary: long (nullable = true)
So, is there a possibility that the connector is not able to load the data into the DataFrame properly in the case of a keyless JSON array? (For the keyed case, a flattening sketch follows this list.)
- I cannot add the schema manually, because in my scenario I am unaware of the data present in the bucket; I only get to know the schema at runtime, after the DataFrame has been created.
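For what it's worth, once the array does land in a test column as in the first point above, it can be flattened into one row per element with plain Spark SQL functions (a minimal sketch, where df is the DataFrame read from the connector):

```scala
import org.apache.spark.sql.functions.explode

// One row per element of the "test" array, then expand the struct fields into columns.
val flattened = df
  .select(df("META_ID"), explode(df("test")).as("element"))
  .select("META_ID", "element.*")

// Expected columns: META_ID, Department, Name, Salary
flattened.show()
```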
@graham.pople @daschl Any thoughts on this?
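On the second point (the schema is only known at runtime): if the raw document body can be fetched as a plain JSON string (for example via the Couchbase Java SDK), Spark itself can infer the schema at runtime, so nothing needs to be declared up front. A minimal sketch, using the sample array from this thread and assuming the string is already in hand:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical rawJson: assumes the document body was already fetched as a string;
// the content is just the sample array from this thread.
val rawJson =
  """[{"Name":"Test","Salary":2000,"Department":"Maths"},{"Name":"Test","Salary":2000,"Department":"Maths"}]"""

// Spark infers the schema at runtime; a top-level array should come back as one row per element.
val df = spark.read.json(Seq(rawJson).toDS())
df.printSchema()
df.show()
```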