N1QL query slow

Hi,
I imported IP2Country database from http://software77.net/geo-ip/ to my Couchbase bucket. Total documents is about ~ 160k.
Example document:

{
  "IPFrom": 1000083456,
  "IPTo": 1000085503,
  "Registry": "apnic",
  "Assigned": 1100736000,
  "Ctry": "JP",
  "Cntry": "JPN",
  "Country": "Japan",
  "_type": "ip2c"
}

I created 2 GSI index:
CREATE INDEX doc_type_idx ON default(_type) WHERE _type is not missing
and:
CREATE INDEX ip2c_idx ON default(_type,IPFrom,IPTo) WHERE _type = "ip2c"
Then run this query:
SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPTo >= 3627730286 LIMIT 1
It took ~ 9s
If remove IPTo >= 3627730286:
SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 LIMIT 1
It took only 14ms
What need i do to optimize the first query?

Can you post the EXPLAIN for both queries.

EXPLAIN SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPTo >= 3627730286 LIMIT 1

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ip2c_idx",
              "index_id": "a3c9181d02b34c43",
              "keyspace": "default",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"ip2c\"",
                      "successor(3627730286)"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"ip2c\"",
                      "null",
                      "3627730286"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "default",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "((((`default`.`_type`) = \"ip2c\") and ((`default`.`IPFrom`) <= 3627730286)) and (3627730286 <= (`default`.`IPTo`)))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "(`default`.`Ctry`)"
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "1"
        }
      ]
    },
    "text": "SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPTo >= 3627730286 LIMIT 1"
  }
]

EXPLAIN SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 LIMIT 1

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ip2c_idx",
              "index_id": "a3c9181d02b34c43",
              "keyspace": "default",
              "limit": "1",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"ip2c\"",
                      "successor(3627730286)"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"ip2c\"",
                      "null"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "default",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((`default`.`_type`) = \"ip2c\") and ((`default`.`IPFrom`) <= 3627730286))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "(`default`.`Ctry`)"
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "1"
        }
      ]
    },
    "text": "SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 LIMIT 1"
  }
]

The second query is able to push the LIMIT down to the index. You can add IPFrom >= 0 to the first query.

If you also want to fetch more than one or a few documents, you can add Ctry as the last key in the index, which will make it a covering index.

Thanks for quick reply.
Try to add IPFrom > 0 to query but it still took ~9s. Explain:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ip2c_idx",
              "index_id": "a3c9181d02b34c43",
              "keyspace": "default",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"ip2c\"",
                      "successor(3627730286)"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"ip2c\"",
                      "0",
                      "3627730286"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "default",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((((`default`.`_type`) = \"ip2c\") and ((`default`.`IPFrom`) <= 3627730286)) and (0 < (`default`.`IPFrom`))) and (3627730286 <= (`default`.`IPTo`)))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "(`default`.`Ctry`)"
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "1"
        }
      ]
    },
    "text": "SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPFrom > 0 AND IPTo >= 3627730286 LIMIT 1"
  }
]

Sorry about that. What is the maximum value of IPTo?

Please add IPTo <= max value of IPTo

Try both:
SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPFrom > 0 AND IPTo >= 3627730286 and IPTo <= 4294967295 LIMIT 1
and
SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPTo >= 3627730286 and IPTo <= 4294967295 LIMIT 1
It’s still the same above :frowning: . Explain:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "ip2c_idx",
              "index_id": "a3c9181d02b34c43",
              "keyspace": "default",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"ip2c\"",
                      "3627730286",
                      "4294967295"
                    ],
                    "Inclusion": 2,
                    "Low": [
                      "\"ip2c\"",
                      "0",
                      "3627730286"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "keyspace": "default",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "((((((`default`.`_type`) = \"ip2c\") and ((`default`.`IPFrom`) <= 3627730286)) and (0 < (`default`.`IPFrom`))) and (3627730286 <= (`default`.`IPTo`))) and ((`default`.`IPTo`) <= 4294967295))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "(`default`.`Ctry`)"
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "1"
        }
      ]
    },
    "text": "SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPFrom > 0 AND IPTo >= 3627730286 and IPTo <= 4294967295 LIMIT 1"
  }
]

Are you using 4.5.1?

@vsr1, how do we push this LIMIT down to the indexer? Do we have to use a subquery?

Yes, i’m using CB Server 4.5.1.

The limit can be pushed to indexer when the predicates are exactly pushed to indexer and applying predicates by N1QL will not eliminate any items.

The IPFrom and IPTo are range predicates and will not qualifying pushing LIMIT.
You can use covering index to make it faster.
CREATE INDEX ip2c_idx ON default(_type,IPFrom,IPTo,Ctry) WHERE _type = “ip2c”;

Use covering index, it took >3s to run the query. It seems not fast enough :frowning:

Hi @vsr1,

This query is pushing down LIMIT. What is the difference?

SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 LIMIT 1

Hi @geraldss,

_type=‘ip2c’ AND IPFrom <= 3627730286 has leading key is equal and second leading key is range (both low and high present). N1QL will not eliminate any items.

_type=‘ip2c’ AND IPFrom <= 3627730286 AND IPTo >= 3627730286 has leading key is equal and second leading key is range and third leading key is range. Indexer will generate false positives and N1QL can eliminate items and not able to push.

3sec too long for LIMIT 1.
Could you use the following index and also post the EXPLAIN

CREATE INDEX ip2c_idx ON default(IPFrom,IPTo,Ctry) WHERE _type = “ip2c”;

Explain:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "covers": [
                "cover ((`default`.`_type`))",
                "cover ((`default`.`IPFrom`))",
                "cover ((`default`.`IPTo`))",
                "cover ((`default`.`Ctry`))",
                "cover ((meta(`default`).`id`))"
              ],
              "filter_covers": {
                "cover ((`default`.`_type`))": "ip2c"
              },
              "index": "ip2c_idx",
              "index_id": "21b9bddf92d0daf2",
              "keyspace": "default",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"ip2c\"",
                      "successor(3627730286)"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"ip2c\"",
                      "null",
                      "3627730286"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((cover ((`default`.`_type`)) = \"ip2c\") and (cover ((`default`.`IPFrom`)) <= 3627730286)) and (3627730286 <= cover ((`default`.`IPTo`))))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "cover ((`default`.`Ctry`))"
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "1"
        }
      ]
    },
    "text": "SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPTo >= 3627730286 LIMIT 1"
  }
]

Could you please post the output of completed_requests for the Statement that matches your query only.

select * from system:completed_requests;

And how long does this query take with this index?

This is output of:
select * from system:completed_requests;

{
    "completed_requests": {
      "ClientContextID": "b50b35eb-0402-4aca-b23a-aa3b971a4f77",
      "ElapsedTime": "3.234391509s",
      "ErrorCount": 0,
      "PhaseCounts": {
        "IndexScan": 161922
      },
      "PhaseOperators": {
        "IndexScan": 1
      },
      "RequestId": "d00f4cec-d1de-4d27-bc81-786c83d692ee",
      "ResultCount": 1,
      "ResultSize": 36,
      "ServiceTime": "3.234343553s",
      "State": "completed",
      "Statement": "SELECT Ctry FROM default WHERE _type='ip2c' AND IPFrom <= 3627730286 AND IPTo >= 3627730286 LIMIT 1",
      "Time": "2016-11-02 01:26:49.366301201 +0000 UTC"
    }

Total documents is: 165434
Documents match IPFrom <= 3627730286: 161922
Documents match IPTo >= 3627730286: 3513
Query results(without LIMIT): 1
I don’t know why IndexScan = 161922

try one of the following options:
Option 1)
CREATE INDEX ip2c_idx ON default(IPTo, IPFrom, Ctry) WHERE _type = “ip2c”;
SELECT Ctry FROM default WHERE _type=‘ip2c’ AND IPFrom <= 3627730286 AND IPTo >= 3627730286 LIMIT 1

OPTION 2)

CREATE INDEX ip2c_idx ON default(IPFrom,-IPTo,Ctry) WHERE _type = “ip2c”;
SELECT Ctry FROM default WHERE _type=‘ip2c’ AND IPFrom <= 3627730286 AND -IPTo <= -3627730286 LIMIT 1;

The first option seems work, the query took ~150ms, but i think it doesn’t solve the problem. Suppose:
n1 is total documents that matches: IPFrom <= IP (IP is some value that need to be query)
n2 is total documents that matches: IPTo >= IP
idx1: default(IPTo, IPFrom, Ctry)
idx2: default(IPFrom, IPTo, Ctry)

  • If n1 >> n2 (significantly greater): query with idx1 will fast, idx2 will slow
  • n1 << n2: query with idx1 will slow, idx2 will fast
  • n1 ~ n2: the query will not too slow but still slow (about ~1s)

If create both idx1 and idx2, the CB server will select random index to run the query, I have use index hint, ex:
SELECT Ctry FROM default USE INDEX(idx1 USING GSI) WHERE _type='ip2c' AND IPTo >= 4294967295 AND IPFrom <=4294967295