Join based on keys and arrays

@vsr1 Iam strugling with a query and I could not figure out what I am doing wrong.

I have a few documents , which I would like to query by the key:

the document format is this:

{
  "chart_timestamp": 1597590000,
  "chart_apps": {
    "aaa": 270,
    "bbb": 357,
    "ccc": 451,
    "ddd": 93
  },
  "version": 1
}

I have multiple documents like this, but with different object (chart_apps) keys (some keys will repeat on other documents for example here (“bbb” and “ddd”):

{
  "chart_timestamp": 1597590000,
  "chart_apps": {
    "eee": 270,
    "bbb": 3,
    "fff": 451,
    "ddd": 76
  },
  "version": 1
}

Another example:

{
  "chart_timestamp": 1597590000,
  "chart_apps": {
    "aaa": 14,
    "bbb":9,
    "fff": 32,
    "ccc": 778
  },
  "version": 1
}

Now, what I need is to be able to join multiple documents based on their keys and generate a response like this:

{
   "foo":{
      "aaa":{
         "first_join":270, // value_from_frist_document
         "second_join": null,  //value_from_the_second_document
         "third_join": 14  //value_from_the_third_document
      }
      },
      "bbb":{
         "first_join":357, // value_from_frist_document
         "second_join":3, //value_from_the_second_document
         "third_join":9 //value_from_the_third_document
      }
      },
      "ccc":{
         "first_join": 451 , // value_from_frist_document
         "second_join": null, //value_from_the_second_document
         "third_join": 778 //value_from_the_third_document
      }
      },
      "ddd":{
         "first_join":93, // value_from_frist_document
         "second_join": 76, //value_from_the_second_document
         "third_join":null //value_from_the_third_document
      }
      },
      "eee":{
         "first_join":null, // value_from_frist_document
         "second_join":270, //value_from_the_second_document
         "third_join":null //value_from_the_third_document
      }
      },
      "fff":{
         "first_join": null, // value_from_frist_document
         "second_join": 451, //value_from_the_second_document
         "third_join":32 //value_from_the_third_document
      }
   }
}

some of the joined documents may not have that key/value and I need to have all the keys from all the documents in the new resulting object and for the missing value I can have null.

Is there a way to do this?

SELECT OBECT lv.name:lv.obj FOR lv IN (
       SELECT op.name, OBJECT v.id:v.val FOR v IN ag END obj
       FROM default AS d
       LET op = OBJECT_PAIRS(d.chart_apps)
      WHERE ....
      GROUP BY op.name
      LETTING ag =  ARRAY_AGG({"id": META(d).id, op.val}) ) END ;

Hmm, and where do I put the document keys that I need to join? also the order of the joined documents is important. I need to provide these to the query in a specific order to know which one of the value is comming from which document (the order of the joins),

can I do this?

SELECT OBECT lv.name:lv.obj FOR lv IN (
       SELECT op.name, OBJECT v.id:v.val FOR v IN ag END obj
       FROM default AS d
       LET op = OBJECT_PAIRS(d.chart_apps)
      use keys array["key1","key2","key3"]
      GROUP BY op.name
      LETTING ag =  ARRAY_AGG({"id": META(d).id, op.val}) ) END ;

The keys from the “chart_apps” object are not document keys, they are just some object keys which I need to join

If you need order you can’t use Object , You must leave them in ARRAY and sort the array desired order using subquery expression.

SELECT OBECT lv.name:lv.obj FOR lv IN (
        SELECT op.name, OBJECT v.id:v.val FOR v IN ag END obj
        FROM [{"k":"key1", "seq":1}, {"k":"key2", "seq":2}, {"k":"key3", "seq":3}] AS d
        JOIN bucket AS d1 ON KEYS d.k
        LET op = OBJECT_PAIRS(d1.chart_apps)
        GROUP BY op.name
        LETTING ag =  ARRAY_AGG({"id": TO_STR(d.seq), op.val}) ) END ;

Hmm, I don’t understand. What are these? {“k”:“key1”, “seq”:1}, {“k”:“key2”, “seq”:2}, {“k”:“key3”, “seq”:3}. Do I need to make a subquery and return an array of objects ?

Can you please have a look again over the initial post? I really appreciate you are putting time in this. I have tried all day long to make this work :frowning:

@flaviu ,

WITH data AS ([{ "chart_timestamp": 1597590000, "chart_apps": { "aaa": 270, "bbb": 357, "ccc": 451, "ddd": 93 }, "version": 1 },
               { "chart_timestamp": 1597590000, "chart_apps": { "eee": 270, "bbb": 3, "fff": 451, "ddd": 76 }, "version": 1 },
               { "chart_timestamp": 1597590000, "chart_apps": { "aaa": 14, "bbb":9, "fff": 32, "ccc": 778 }, "version": 1 }])
SELECT OBJECT lv.name:lv.obj FOR lv IN (
        SELECT op.name, OBJECT v.id:v.val FOR v IN ag END obj
        FROM 1 AS d
        UNNEST data AS d1 UNNEST OBJECT_PAIRS(d1.chart_apps) AS op
        GROUP BY op.name
        LETTING ag =  ARRAY_AGG({"id": TO_STR(UNNEST_POS(d1)), op.val}) ) END AS foo ;

{
    "requestID": "fa5ccd56-fede-4df5-b8fc-0625b9c56a10",
    "signature": {
        "foo": "object"
    },
    "results": [
    {
        "foo": {
            "aaa": {
                "0": 270,
                "2": 14
            },
            "bbb": {
                "0": 357,
                "1": 3,
                "2": 9
            },
            "ccc": {
                "0": 451,
                "2": 778
            },
            "ddd": {
                "0": 93,
                "1": 76
            },
            "eee": {
                "1": 270
            },
            "fff": {
                "1": 451,
                "2": 32
            }
        }
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "5.326463ms",
        "executionTime": "5.230674ms",
        "resultCount": 1,
        "resultSize": 547,
        "serviceLoad": 2
    }
}

You give sorted array as data
organizes based on 0th, 1st ,2nd document

WITH data AS ([{ "chart_timestamp": 1597590000, "chart_apps": { "aaa": 270, "bbb": 357, "ccc": 451, "ddd": 93 }, "version": 1 },
               { "chart_timestamp": 1597590000, "chart_apps": { "eee": 270, "bbb": 3, "fff": 451, "ddd": 76 }, "version": 1 },
               { "chart_timestamp": 1597590000, "chart_apps": { "aaa": 14, "bbb":9, "fff": 32, "ccc": 778 }, "version": 1 }]),
     names AS (["first_join","second_join","third_join"]),
     dummyobj AS (OBJECT v:null FOR v IN names END)
SELECT OBJECT lv.name: OBJECT_CONCAT(dummyobj,lv.obj) FOR lv IN (
        SELECT op.name, OBJECT v.id:v.val FOR v IN ag END obj
        FROM 1 AS d
        UNNEST data AS d1 UNNEST OBJECT_PAIRS(d1.chart_apps) AS op
        GROUP BY op.name
        LETTING ag =  ARRAY_AGG({"id": names[UNNEST_POS(d1)], op.val}) ) END as foo;
{
    "requestID": "832c960c-95a0-4fa3-97b3-59a3d530ecec",
    "signature": {
        "foo": "object"
    },
    "results": [
    {
        "foo": {
            "aaa": {
                "first_join": 270,
                "second_join": null,
                "third_join": 14
            },
            "bbb": {
                "first_join": 357,
                "second_join": 3,
                "third_join": 9
            },
            "ccc": {
                "first_join": 451,
                "second_join": null,
                "third_join": 778
            },
            "ddd": {
                "first_join": 93,
                "second_join": 76,
                "third_join": null
            },
            "eee": {
                "first_join": null,
                "second_join": 270,
                "third_join": null
            },
            "fff": {
                "first_join": null,
                "second_join": 451,
                "third_join": 32
            }
        }
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "5.652931ms",
        "executionTime": "5.567055ms",
        "resultCount": 1,
        "resultSize": 877,
        "serviceLoad": 2
    }
}
2 Likes

pfoa… you are out of this world. Thank you !

1 Like

Do you think it would be easy to order the keys (“aaa”,“bbb”, etc) by “second_join” or “third_join”?

Also, if I want to filter the values where “second_join” is bigger than “first_join”

In the object there is no order by field (in pretty format the fields are order by name). Sorting value no meaning here it is single object.
It is possible to filter you can do parent query and filter out.

Just switch to Mongo guys. We did after 4 years of expensive Couchbase hosting, bad performance and lack of flexibility, we had enough. Best decision ever! Here’s why: migrating from Couchbase to Mongodb