Index with join

Hi! I’ve created this index
CREATE INDEX idx-join-bsa-trx-with-events ON BSA_TX_EVENTS (lastEvent.publicRequestKey);
and I have this query
select * FROM BSA_TRANSACTION bvtpTransaction JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) for bvtpTransaction;

the explain is this

[ { "plan": { "#operator": "Sequence", "~children": [ { "#operator": "PrimaryScan", "index": "#primary", "keyspace": "BSA_TRANSACTION", "namespace": "default", "using": "gsi" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "as": "bvtpTransaction", "keyspace": "BSA_TRANSACTION", "namespace": "default" }, { "#operator": "IndexJoin", "as": "trxEvents", "for": "bvtpTransaction", "keyspace": "BSA_TX_EVENTS", "namespace": "default", "on_key": "((trxEvents.lastEvent).publicRequestKey)", "scan": { "index": "public_request_key", "index_id": "2e919385488397b5", "using": "gsi" } }, { "#operator": "InitialProject", "result_terms": [ { "expr": "self", "star": true } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "select * FROM BSA_TRANSACTION bvtpTransaction JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) for bvtpTransaction;" } ]

So the query is too slow for mee. Can u give me some feedback to speed up that query ?
Thanks in advance!

Change this to:

select * FROM BSA_TRANSACTION bvtpTransaction JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) for bvtpTransaction WHERE lastEvent.publicRequestKey IS NOT MISSING ;

In Flexible Schema world, we’ve to account for missing fields. So, a predicate here will enable the optimizer to chose the right index.

Hi thanks for the reply but it does not work. The query does not use the index yet based on a explain query

Am i missing something when I create the index ?

The query I gave had an issue. The predicate should use fully qualified attribute name.

Please try this.

select * 
FROM BSA_TRANSACTION bvtpTransaction 
          JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) 
         for bvtpTransaction 
WHERE bvtpTransaction.lastEvent.publicRequestKey IS NOT MISSING ;

I tried with that query

this is the explain

Am I missing something with the explain ?

[ { "plan": { "#operator": "Sequence", "~children": [ { "#operator": "PrimaryScan", "index": "#primary", "keyspace": "BSA_TRANSACTION", "namespace": "default", "using": "gsi" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "as": "bvtpTransaction", "keyspace": "BSA_TRANSACTION", "namespace": "default" }, { "#operator": "IndexJoin", "as": "trxEvents", "for": "bvtpTransaction", "keyspace": "BSA_TX_EVENTS", "namespace": "default", "on_key": "((trxEvents.lastEvent).publicRequestKey)", "scan": { "index": "public_request_key", "index_id": "2e919385488397b5", "using": "gsi" } }, { "#operator": "Filter", "condition": "(((bvtpTransaction.lastEvent).publicRequestKey) is not missing)" }, { "#operator": "InitialProject", "result_terms": [ { "expr": "self", "star": true } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "select * \nFROM BSA_TRANSACTION bvtpTransaction \n JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) \n for bvtpTransaction \nWHERE bvtpTransaction.lastEvent.publicRequestKey IS NOT MISSING ;" } ]

Hi @wakko,

Both your original query and Keshav’s query are correct. You are not filtering on bvtpTransaction at all, so the left side is a full scan. Is that your intention? To query all the data? If so, you can just transpose left and right terms.

SELECT *
FROM events JOIN transaction ON KEYS event.lastEvent.publicRequestKey;

Note the different syntax.

maybe I’m wrong…I want to perform a join query using an index the purpose is to speed up the query

can u explain me how to build the correct index for that task ?

Thanks!

They query I sent you will accomplish the fastest JOIN. Can you try it out and tell us.

Yeah! when I add

and trxEvents.lastEvent.publicRequestKey is not missing;

it’s using the index but it seems too slow to me about 4 seconds to retrieve 15 documents.

Can u help me to build a map reduce view to cover the join ?

I do something like this can you tell me if its Ok ?

function(doc, meta){
  
  var emitByDocType = {
  	"Transaction": function(doc, meta){
      emit( [meta.id, 0, doc.operationData.OperationID,doc.generalData.merchant], doc)
    },
    "TrxEvents": function(doc, meta){
      emit( [doc.lastEvent.publicRequestKey, 1], doc)
    }
  };
  
  if(doc.type){
    emitByDocType[doc.type](doc, meta);
  }
  
}

Thanks!

Fair enough, but there must be something wrong with the setup or environment. Adding @arungupta as I haven’t used MapReduce views.

DA team has grown in the N1QL era but have asked @nraboy to take a look.

Hey @wakko,

Can you please provide a data model of the two document types you’re trying to join as well as the data you wish to get back for each hit?

The reason I ask is because I saw you started with a very simplistic query:

SELECT *
FROM events JOIN transaction ON KEYS event.lastEvent.publicRequestKey;

Then you ended up with a proposed View that was much more complex. Based on the information I have, I would have expected the following:

function (doc, meta) {
    if(doc.type == "transaction") {
        emit(meta.id, doc);
    }
    if(doc.type == "trxevents") {
        emit(doc.lastEvent.publicRequestKey, doc);
    }
}

Of course it is same concept of what you provided, but you were using composite keys for some reason. Then when you query this view and provide an id as the key, both documents should come back.

You should be aware that since Views don’t do joins, the experience won’t be the same to what you’d find in a N1QL query.

Best,

The composite it’s to filter with those keys … the view that you give me works fine but I have a problem. The documents are in different buckets so I only can get one type of document no both of them.

Can I do something to retrieve document from other bucket in the view or the documents must to be in the same bucket ?

Thank for your tips :slight_smile:

Views are per Bucket so you wouldn’t be able to create a view that covers both Buckets. This further emphasizes the power of N1QL and the effort that is being put into it in every release.

If you needed to query across buckets using views, you’d have to do multiple queries. This would all be managed in the application layer.

Best,

Yes I solved the issue with two views…because is faster than a join for the same data.

Thanks!

1 Like