[{
"Name": "Test"
"Salary": 2000
"Department": "Maths"
},
{
"Name": "Test"
"Salary": 2000
"Department": "Maths"
}
]
I want to convert this JSON array into a DataFrame with the Couchbase connector. This JSON array is not fixed, so I cannot add the schema manually, and I also cannot add a key to the array.
I have tried the code below after creating an index on the bucket and setting all the mandatory configuration such as username, password, serverIp, etc.
val jsonArray = sql.read.couchbase
But the resulting DataFrame only contains META_ID; the connector is unable to infer the schema of the JSON array.
What should I do?
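For reference, here is a minimal sketch of the kind of setup around that line (an assumption on my part: the 2.x connector, with the spark.couchbase.* property names taken from its docs, so they may differ for other versions):

```scala
import org.apache.spark.sql.SparkSession
import com.couchbase.spark.sql._   // adds the .couchbase method to DataFrameReader

// Connection settings below are placeholders; property names follow the 2.x connector docs.
val spark = SparkSession.builder()
  .appName("CouchbaseJsonArrayToDataFrame")
  .master("local[*]")
  .config("spark.couchbase.nodes", "127.0.0.1")          // serverIp
  .config("spark.couchbase.username", "Administrator")   // username
  .config("spark.couchbase.password", "password")        // password
  .config("spark.couchbase.bucket.myBucket", "")         // bucket to open ("myBucket" is a placeholder)
  .getOrCreate()

// Let the connector sample documents and infer the schema
val jsonArray = spark.read.couchbase()
jsonArray.printSchema()
jsonArray.show()
```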
@abhi_sadana it’s not clear to me what you want to achieve exactly. Are you loading data through the connector and then want to turn it into a dataframe? Or do you have an in-memory representation of a JsonArray and want to convert this one into a dataframe?
@daschl Thank you for your reply.
Corrected JSON:
[
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
},
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
}
]
Note: there is only one JSON document in the bucket.
Yes, I am loading the data through the connector and converting this JSON data into a DataFrame directly.
Resulting DataFrame:
+-------+
|META_ID|
+-------+
|      1|
+-------+
This DataFrame contains only a single column; the rest of the data is not available in it.
DataFrame schema:
|-- META_ID: string (nullable = true)
The schema is also missing the rest of the elements.
Is there any way to get all of the elements into the DataFrame instead of only META_ID?
@abhi_sadana I think that should work; I wonder if you’re hitting a limit of the automatic schema inference there (perhaps it cannot infer a schema from a small set of docs, such as the single doc you have here).
But you can specify a schema manually, which is easy enough. Please see the Spark docs here for more: Spark SQL Integration | Couchbase Docs
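Roughly along these lines (a sketch from memory, so please double-check the exact overload and field types against the docs page linked above):

```scala
import org.apache.spark.sql.types._
import com.couchbase.spark.sql._

// Fields taken from the sample document in this thread, plus the META_ID column the connector adds.
val manualSchema = StructType(
  StructField("META_ID", StringType)    ::
  StructField("Name", StringType)       ::
  StructField("Salary", LongType)       ::
  StructField("Department", StringType) :: Nil
)

// Assumes the connector's couchbase(schema: StructType) overload described on the docs page above;
// "spark" is your SparkSession.
val df = spark.read.couchbase(manualSchema)
```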
@graham.pople Thank you for your reply.
I want to mention two points here:
- I have edited the document by adding a key ("test") to the JSON array:
{
"test": [
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
},
{
"Name": "Test",
"Salary": 2000,
"Department": "Maths"
}
]
}
Resulting DataFrame:
+-------+--------------------+
|META_ID|                test|
+-------+--------------------+
|    122|[[Maths, Test, 20...|
+-------+--------------------+
Schema
root
|-- META_ID: string (nullable = true)
|-- test: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Department: string (nullable = true)
| | |-- Name: string (nullable = true)
| | |-- Salary: long (nullable = true)
So, is there a possibility that the connector is not able to load the data into the DataFrame properly in the case of a keyless JSON array? (For the keyed case, a flattening sketch follows this list.)
- I cannot add the schema manually, because in my scenario I am unaware of the data present in the bucket; I only get to know the schema at runtime, after the DataFrame has been created.
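For what it's worth, once the array does land in a test column as in the first point above, it can be flattened into one row per element with plain Spark SQL functions (a minimal sketch, where df is the DataFrame read from the connector):

```scala
import org.apache.spark.sql.functions.explode

// One row per element of the "test" array, then expand the struct fields into columns.
val flattened = df
  .select(df("META_ID"), explode(df("test")).as("element"))
  .select("META_ID", "element.*")

// Expected columns: META_ID, Department, Name, Salary
flattened.show()
```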
@graham.pople @daschl Any thoughts on this?
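On the second point (the schema is only known at runtime): if the raw document body can be fetched as a plain JSON string (for example via the Couchbase Java SDK), Spark itself can infer the schema at runtime, so nothing needs to be declared up front. A minimal sketch, using the sample array from this thread and assuming the string is already in hand:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical rawJson: assumes the document body was already fetched as a string;
// the content is just the sample array from this thread.
val rawJson =
  """[{"Name":"Test","Salary":2000,"Department":"Maths"},{"Name":"Test","Salary":2000,"Department":"Maths"}]"""

// Spark infers the schema at runtime; a top-level array should come back as one row per element.
val df = spark.read.json(Seq(rawJson).toDS())
df.printSchema()
df.show()
```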