Differences between primary and secondary Indexes?

Hello,
What are the main differences Differences between primary and secondary Indexes?
What are the best practices in choosing between using primary or secondary indexes?
I have seen a primary index can be created as a secondary index; I don’t really understand the differences.
I have read the documentation but still not clear to me ;
http://docs.couchbase.com/4.0/n1ql/n1ql-language-reference/createprimaryindex.html

Thanks
W

1 Like

A primary index of N1QL will allow you to query any field of your JSON. As such, it takes up more space and is less efficient (but much more flexible). It is the basic requirement to be able to use the query language on your data.

A secondary index of N1QL is targeted at specific field(s) and will be more performant for queries involving these fields (because the index to scan will be shorter and more to the point).

4 Likes

thanks @simonbasle; is secondary index is like a view?

No, it is still an index, same as what you’re used to in SQL, its role is to back queries and make them faster. In SQL databases you don’t usually have to create the primary index (it is done when you define a primary key).

2 Likes

clear with big thanks! @simonbasle

1 Like

also unlike relational databases, primary index is optional. as long as you don’t want to do adhoc queries and you have dedicated secondary indexes for all your queries, you can run your system without primary indexes.

1 Like

so the primary index is just an index on the key and need to be maintained (index rebuild etc …) ?
@cihangirb @simonbasle

@couchbwiss AFAIK it is not just indexing the keys (but I could be wrong there), and it shouldn’t need much maintenance.

What @cihangirb meant to say I think is that it is not mandatory to have one, if you know exactly what queries you’ll run in advance: you can in this case build tailor-made secondary indexes that will cover what’s needed by your queries and can skip the primary index altogether.

The thing is, if one of these queries targets a field that’s not indexed by a secondary index, it’ll fallback to the primary index and complain there’s none.

@simonbasle @cihangirb Thank you.
Next question about indexes;
I have a bucket called mybucket which has different document structures distinguished by a filed called type in the json.
let 's take an example in mybucket i have
documents of type user:

{
“userID”:“1”,
“Type”:“user”,
“username”:“test1”
}

and documents of type products:

{
“productID”:“1”,
“Type”:“product”,
“productName”:“p1”
}

How can I create an index from N1QL on only product documents of the mybucket. is it possible to do it?
The reason, I will NOT use that index for user document queries.

Thanks,
W

this should definitely be possible, yes. If I’m not mistaken:

CREATE INDEX products ON default (type) WHERE type = 'product';

@cihangirb I think I remember there was a limitation on the WHERE clause of an index though?

1 Like

The syntax is correct for CREATE INDEX. a few things to note:
the predicate you index default(type) vs what you filter on can be different. So you could index both productName for example;
CREATE INDEX productName_index ON bucket_name(productName, ProductID) WHERE type=“product” USING GSI;

if you have multiple nodes with index service running, you can also place the indexes on a given node and have multiple indexes to scale out your reads.
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type=“product” USING GSI WITH {“nodes”:“node1:8091”};
CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type=“product” USING GSI WITH {“nodes”:“node2:8091”};

With the indexes above, half the scans will go to productName_index1 and the other half will end up on productName_index2.

another way you can also index is to partition your indexes; (I am picking something simple for the purposes of the example and I am sure the readers can come up with more sophisticated ways to partition indexes.)
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type=“product” AND productName BETWEEN “A” AND “K” USING GSI WITH {“nodes”:“node1:8091”};
CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type=“product” AND productName BETWEEN “K” AND “Z” USING GSI WITH {“nodes”:“node2:8091”};

With the indexes above, if you search for productName = “APPLE WATCH” the scans will go to productName_index1 and productName = “SAMSUNG WATCH” will end up on productName_index2.

thanks
-cihan

2 Likes

Thanks @cihangirb and @simonbasle
Next questions;
1- what happens if the node 1 having half part of the index fails or goes down, does the queries using the partitioned index end with an exception or error?
2- is it possible to specify an index replication in another node for each partitioned index?
3- Is it possible to specify more than one node having one single index partition?

Now about the index maintenance;
3- when the index will e candidate for a rebuild?
4- for the query optimization ; is it a rule based optimization or a cost based?

Thanks
W

@cihangirb @simonbasle

I see an issue with the partitioned index, , I have created 2 partitions on 2 different nodes.but the explain plan shows only one is in use. Do you know the reason? I have checked the console and I only see IX_TYPE_ID_PART2 created where the other no. I am sure I have created both; does the IX_TYPE_ID_PART2 replaced the IX_TYPE_ID_PART1?

CREATE INDEX IX_TYPE_PART1 ON product(Type) WHERE Type=“product” USING GSI WITH {“nodes”:“10.71.52.142:8091”};
CREATE INDEX IX_TYPE_ID_PART2 ON product(Type) WHERE Type=“product” USING GSI WITH {“nodes”:“10.71.52.137:8091”};

cbq> EXPLAIN select pid, REPLACE(REPLACE(pName, “,”, “:”),“\t”," ") AS pName from product WHERE Type=“product”;
{
“requestID”: “674e0931-cb59-4ab9-ac36-b430f26a695d”,
“signature”: “json”,
“results”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan”,
“index”: “IX_TYPE_PART1”,
“keyspace”: “product”,
“limit”: 9.223372036854776e+18,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“"product"”
],
“Inclusion”: 3,
“Low”: [
“"product"”
]
},
“Seek”: null
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Fetch”,
“keyspace”: “product”,
“namespace”: “default”
},
{
#operator”: “Filter”,
“condition”: “((product.Type) = "product")”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(product.pid)”
},
{
“as”: “pName”,
“expr”: “replace(replace((product.pName), ",", ":"), "\t", " ")”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “5.227177ms”,
“executionTime”: “4.511846ms”,
“resultCount”: 1,
“resultSize”: 2340
}
}

cbq>

Hi @couchwiss, you may not necessarily see the explain refer to both indexes but when you run the queries, you should see both getting requests under the stats tab > the picture here shows i1 and i2 (which are identical indexes) taking equal load.

Thanks @cihangirb and @simonbasle
Next questions;
1- what happens if the node 1 having half part of the index fails or goes down, does the queries using the partitioned index end with an exception or error?
2- is it possible to specify an index replication in another node for each partitioned index?
3- Is it possible to specify more than one node having one single index partition?

Now about the index maintenance;
3- when the index will e candidate for a rebuild?
4- for the query optimization ; is it a rule based optimization or a cost based?

Thanks
W

@cihangirb @simonbasle
I see an issue now; I have created partitioned indexes;

CREATE INDEX IX_TYPE_PART1 ON product(Type) WHERE Type=“p1” USING GSI WITH {“nodes”:“10.71.52.142:8091”,“nodes”:“10.71.52.138:8091”};
CREATE INDEX IX_TYPE_PART2 ON product(Type) WHERE Type=“p1” USING GSI WITH {“nodes”:“10.71.52.137:8091”,“nodes”:“10.71.52.138:8091”};

but on the console I see IX_TYPE_PART1 and IX_TYPE_PART2 are only created in the node 10.71.52.138 not in the others.

Thanks
W

Answering both issues;
1- what happens if the node 1 having half part of the index fails or goes down, does the queries using the partitioned index end with an exception or error?
[cb]if you have a primary index or another secondary index that can be used for the scan, the query will work fine but without this index, your query response time may vary if the other indexes are not as useful.recommendation is to mirror the index partitions. **
2- is it possible to specify an index replication in another node for each partitioned index?
[cb]we don’t support replication/replicas with GSI but you can manually do this. The way to do this would be to create the 2 identical indexes on 2 separate nodes (a.k.a mirror the index). If one node fails and takes out one copy of the index, all traffic will hit the remaining available index.
3- Is it possible to specify more than one node having one single index partition?
[cb]not at this time. you need to name each index differently. index name in a given bucket has to be unique so we require separate statements for each index and each mirror with CREATE INDEX

Now about the index maintenance;
3- when the index will e candidate for a rebuild?
[cb]there isn’t a reason to rebuild the indexes unless you want to move them to another node or want to change their definitions.
4- for the query optimization ; is it a rule based optimization or a cost based?
[cb]rule based at this point

for your example on the previous post: At the moment, we don’t support specifying multiple nodes clauses for a single index. you can do the following;
CREATE INDEX IX_TYPE_PART1 ON product(Type) WHERE Type=“p1” USING GSI
WITH {“nodes”:“10.71.52.142:8091”};
CREATE
INDEX IX_TYPE_PART2 ON product(Type) WHERE Type=“p1” USING GSI WITH
{“nodes”:“10.71.52.138:8091”};

2 Likes