Index with join

wakko · January 27, 2017, 3:55am

Hi! I’ve created this index
CREATE INDEX idx-join-bsa-trx-with-events ON BSA_TX_EVENTS (lastEvent.publicRequestKey);
and I have this query
select * FROM BSA_TRANSACTION bvtpTransaction JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) for bvtpTransaction;

the explain is this

[ { "plan": { "#operator": "Sequence", "~children": [ { "#operator": "PrimaryScan", "index": "#primary", "keyspace": "BSA_TRANSACTION", "namespace": "default", "using": "gsi" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "as": "bvtpTransaction", "keyspace": "BSA_TRANSACTION", "namespace": "default" }, { "#operator": "IndexJoin", "as": "trxEvents", "for": "bvtpTransaction", "keyspace": "BSA_TX_EVENTS", "namespace": "default", "on_key": "((trxEvents.lastEvent).publicRequestKey)", "scan": { "index": "public_request_key", "index_id": "2e919385488397b5", "using": "gsi" } }, { "#operator": "InitialProject", "result_terms": [ { "expr": "self", "star": true } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "select * FROM BSA_TRANSACTION bvtpTransaction JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) for bvtpTransaction;" } ]

So the query is too slow for mee. Can u give me some feedback to speed up that query ?
Thanks in advance!

keshav_m · January 27, 2017, 6:25am

Change this to:

select * FROM BSA_TRANSACTION bvtpTransaction JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) for bvtpTransaction WHERE lastEvent.publicRequestKey IS NOT MISSING ;

In Flexible Schema world, we’ve to account for missing fields. So, a predicate here will enable the optimizer to chose the right index.

wakko · January 27, 2017, 12:27pm

Hi thanks for the reply but it does not work. The query does not use the index yet based on a explain query

Am i missing something when I create the index ?

keshav_m · January 27, 2017, 1:07pm

The query I gave had an issue. The predicate should use fully qualified attribute name.

Please try this.

select * 
FROM BSA_TRANSACTION bvtpTransaction 
          JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) 
         for bvtpTransaction 
WHERE bvtpTransaction.lastEvent.publicRequestKey IS NOT MISSING ;

wakko · January 27, 2017, 1:40pm

I tried with that query

this is the explain

Am I missing something with the explain ?

[ { "plan": { "#operator": "Sequence", "~children": [ { "#operator": "PrimaryScan", "index": "#primary", "keyspace": "BSA_TRANSACTION", "namespace": "default", "using": "gsi" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Fetch", "as": "bvtpTransaction", "keyspace": "BSA_TRANSACTION", "namespace": "default" }, { "#operator": "IndexJoin", "as": "trxEvents", "for": "bvtpTransaction", "keyspace": "BSA_TX_EVENTS", "namespace": "default", "on_key": "((trxEvents.lastEvent).publicRequestKey)", "scan": { "index": "public_request_key", "index_id": "2e919385488397b5", "using": "gsi" } }, { "#operator": "Filter", "condition": "(((bvtpTransaction.lastEvent).publicRequestKey) is not missing)" }, { "#operator": "InitialProject", "result_terms": [ { "expr": "self", "star": true } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "select * \nFROM BSA_TRANSACTION bvtpTransaction \n JOIN BSA_TX_EVENTS trxEvents on key (trxEvents.lastEvent.publicRequestKey) \n for bvtpTransaction \nWHERE bvtpTransaction.lastEvent.publicRequestKey IS NOT MISSING ;" } ]

geraldss · January 27, 2017, 5:41pm

Hi @wakko,

Both your original query and Keshav’s query are correct. You are not filtering on bvtpTransaction at all, so the left side is a full scan. Is that your intention? To query all the data? If so, you can just transpose left and right terms.

SELECT *
FROM events JOIN transaction ON KEYS event.lastEvent.publicRequestKey;

Note the different syntax.

wakko · January 27, 2017, 6:52pm

maybe I’m wrong…I want to perform a join query using an index the purpose is to speed up the query

can u explain me how to build the correct index for that task ?

Thanks!

geraldss · January 27, 2017, 6:55pm

They query I sent you will accomplish the fastest JOIN. Can you try it out and tell us.

wakko · January 27, 2017, 8:29pm

Yeah! when I add

and trxEvents.lastEvent.publicRequestKey is not missing;

it’s using the index but it seems too slow to me about 4 seconds to retrieve 15 documents.

Can u help me to build a map reduce view to cover the join ?

I do something like this can you tell me if its Ok ?

function(doc, meta){
  
  var emitByDocType = {
  	"Transaction": function(doc, meta){
      emit( [meta.id, 0, doc.operationData.OperationID,doc.generalData.merchant], doc)
    },
    "TrxEvents": function(doc, meta){
      emit( [doc.lastEvent.publicRequestKey, 1], doc)
    }
  };
  
  if(doc.type){
    emitByDocType[doc.type](doc, meta);
  }
  
}

Thanks!

geraldss · January 27, 2017, 8:40pm

Fair enough, but there must be something wrong with the setup or environment. Adding @arungupta as I haven’t used MapReduce views.

arungupta · January 27, 2017, 9:03pm

DA team has grown in the N1QL era but have asked @nraboy to take a look.

nraboy · January 27, 2017, 9:57pm

Hey @wakko,

Can you please provide a data model of the two document types you’re trying to join as well as the data you wish to get back for each hit?

The reason I ask is because I saw you started with a very simplistic query:

SELECT *
FROM events JOIN transaction ON KEYS event.lastEvent.publicRequestKey;

Then you ended up with a proposed View that was much more complex. Based on the information I have, I would have expected the following:

function (doc, meta) {
    if(doc.type == "transaction") {
        emit(meta.id, doc);
    }
    if(doc.type == "trxevents") {
        emit(doc.lastEvent.publicRequestKey, doc);
    }
}

Of course it is same concept of what you provided, but you were using composite keys for some reason. Then when you query this view and provide an id as the key, both documents should come back.

You should be aware that since Views don’t do joins, the experience won’t be the same to what you’d find in a N1QL query.

Best,

wakko · January 28, 2017, 2:25pm

The composite it’s to filter with those keys … the view that you give me works fine but I have a problem. The documents are in different buckets so I only can get one type of document no both of them.

Can I do something to retrieve document from other bucket in the view or the documents must to be in the same bucket ?

Thank for your tips

nraboy · January 30, 2017, 2:59pm

Views are per Bucket so you wouldn’t be able to create a view that covers both Buckets. This further emphasizes the power of N1QL and the effort that is being put into it in every release.

If you needed to query across buckets using views, you’d have to do multiple queries. This would all be managed in the application layer.

Best,

wakko · January 30, 2017, 5:51pm

Yes I solved the issue with two views…because is faster than a join for the same data.

Thanks!

Topic		Replies	Views
Use indexes with JOIN SQL++ n1ql	1	1559	May 4, 2017
Using Index not working correctly or am I using it wrong? SQL++ query , n1ql , server , index	12	3615	September 21, 2023
Use secondry index with (join in the same bucket)! SQL++ query , n1ql	10	2038	September 21, 2016
Index Required for the N1QL SQL++ query	4	977	December 28, 2017
Index on joined field SQL++ query , n1ql	8	1045	May 8, 2018

Index with join

Related topics