I have some confusion about key generation and indexes

dickersonc · April 27, 2018, 5:00pm

I have imported a dataset into Couchbase that looks like so:

{
  "CLUSTER": "M1M",
  "CLUSTER_NAME": "MARTIN MARIETTA",
  "PRIMARY": "",
  "SET_NUM": "10000163",
  SHORTENED_NAME": "MARTIN MARIETTA MATERIALS",
  "TYPE": "SET",
  "_class": "com.company.aad.xref.model.ClusterCodeXref"
}

I had to provide a key-generation strategy, and I made the strategy what I ultimately want my index to look like; %SET_NUM%::%TYPE%. So I have a few questions:

Does the key-generation automatically create a field called ID with those 2 elements, or do I need to create an ID column in my CSV dataset?
How can I create an index out of those 2 fields? I understand how to use the CREATE INDEX command with composite fields, but will that index look like the key generated by %SET_NUM%::%TYPE%? I need them to be the same.

I hope my question is clear! Would appreciate any help.

vsr1 · April 28, 2018, 12:59am

The key can be accessed in query or index using META().id

My suggestion is switch the key to %TYPE%::%SET_NUM% so that it makes easier for query and efficient.
Example : Your bucket name is default
Primary index means index on document key on whole bucket.

            CREATE PRIMARY INDEX ON default;
            SELECT META().id, * FROM default WHERE META().id LIKE "SET::%";
            This is more efficient because "SET::" is prefix and can be passed to indexer and indexer 
                 produces only entries started with "SET::".
            If "::SET" is postfix index needs produce all index entries and query need to apply filter

You can also create Secondary Index

CREATE INDEX ix1 ON default (`SET_NUM`) WHERE `TYPE` = "SET";
SELECT META().id * FROM default WHERE `TYPE` = "SET" AND `SET_NUM` > "10000163";

keshav_m · April 30, 2018, 12:27am

In cbimport, the --generate-key option takes the expression to generate a DOCUMENT KEY, which lives outside the document. The key generation takes the expression to create the DOCUMENT KEY for each document imported. You simply need the SET_NUM and TYPE fields in your CSV file, which you have.

The PRIMARY INDEX on the imported data already looks like your key expression.

You can also create a separate index (if you like) with the expression, essentially creating the same index with duplicate (value, rowid). No real good reason to do that.

CREATE INDEX i1 ON bucket(SET_NUM || “::” || TYPE)

dickersonc · April 30, 2018, 11:24am

Thank you both for such clear explanations!

The thing is, I’m really working with 2 use cases. I’m doing an initial load of Couchbase with cbimport, and thereafter data will be manually entered and queried by an application written with spring-data-couchbase.

Spring requires a document ID for the findById method. That’s why I thought I needed the generated key to live in both the document and the index (for faster retrieval). Does that make any sense to do?

eben · April 30, 2018, 6:46pm

@dickersonc - Couchbase started out long ago as a key-value store, where each document has a key, and separate from that a value that could be a string (such as JSON), or even binary data.

When you put a document into Couchbase, you must give it a key, and the very fastest way to retrieve a document is to identify it with the key. That is why following a consistent key pattern is valuable.

There is no need to put the key into the document itself, though you can if you really want to. Even though the key isn’t inside the document, it is automatically tied to the document. When you create a “primary” index, that index looks at the key, not the document.

For your two use cases, use the same key pattern for cbimport and for inserting/upserting new documents with Spring.

dickersonc · April 30, 2018, 7:00pm

Thank you Eben! You’ve saved me a lot of work.

Can you tell me this though: As we insert new documents, how are their keys created? Is that something you specify in N1QL (I’m totally new to Couchbase)?

eben · April 30, 2018, 7:29pm

If you insert documents through any of the SDKs, one of the required parameters is the document key.

If you insert documents through N1QL, the syntax also includes a key along with the document. E.g.,

INSERT INTO `travel-sample` (KEY,VALUE) 
VALUES ( "airline_4444", 
    { "callsign": "MY-AIR",
      "country": "United States",
      "iata": "Z1",
      "icao": "AQZ",
      "name": "80-My Air",
      "id": "4444",
      "type": "airline"} ),
VALUES ( "4445", { "callsign": "AIR-X",
      "country": "United States",
      "iata": "X1",
      "icao": "ARX",
      "name": "10-AirX",
      "id": "4445",
      "type": "airline"} ) ;

Topic		Replies	Views
Clarifications on Couchbase index SQL++ query , n1ql	4	1570	March 2, 2022
Create indexes in Couchbase SQL++	2	621	January 23, 2020
Is it possible to select data by key pattern without adding extra index? Couchbase Server query , meta , java	5	4035	November 9, 2017
Primary index is it keys only or also values? Couchbase Server n1ql , index	11	3291	February 17, 2020
Create Primary Index syntax? Couchbase Server	6	1789	April 14, 2022

I have some confusion about key generation and indexes

Related topics