My bucket is about 40K docs, but the target size is defined in the query using the size correct?
Well, my question was more about how many document hits your query obtains. Like I mentioned earlier - queries that obtain more document hits tend to run longer.
If you would like to index the geo fields as numeric data and move the geo-spatial logic to your application - I’d recommend using a GSI index for the query.
I have a lot of matches when I zoom out of the map, but I thought the size limited how many hits it should make
I’d recommend to try and keep the query limited to match a limited number of documents as opposed to all or a majority of the documents.
If (1) is not an option, and if you’re not doing this already - I’d recommend distributing the index over multiple nodes hosting the search service - to distribute the query load across a cluster of nodes as opposed to a single node…
The size parameter of the query is more of a filter/limit setting for the search service to truncate the number of results that the query matches.
Additionally, I’d adapt the index definition you’ve shared a bit …
“edit” the default type mapping and check “only index specified fields”
“edit” the geo field and uncheck “store”, “include in _all field” options
This would reduce the size of the search index that queries will have to work with.
A GSI (Global Secondary Index) is a different type of index that Couchbase supports that can be accessed from the query workbench. There’s detailed documentation for it here … Index Service | Couchbase Docs
1- Can you clarify on how to limit the number of documents matched not majority, since size is only effect for results, like you’ve described?
2- is not an option, i think a simple GSI index on the geo fields can do the trick!
PS: The edits don’t improve the performance, i wonder how the geo query works internally, because if it was using a basic GSI index on the geo fields it can be faster,
The “size”/“limit” setting is the only way to limit the number of documents returned by a query. This limit is however NOT necessarily the actual number of document hits the query would work on.
For example - setting geo-distance to 10km from a certain {lat, lon} could match 100 documents, and if you specify "size": 2 for the query, only 2 of those 100 documents are returned by the query.
So keeping the query narrow/precise would reduce the number of document hits that become eligible for the search criteria - and would keep the query latency lower.
Geo-spatial support is available only within search indexes. If you choose to use a GSI index, you’ll have to add the necessary logic within your application.
I’ve been reading a lot, and it clearly shows that the GSI is much faster than the internal spacial queries,
Now;
I’ve put together the spacial logic in the query,
Let’s say I have a Point A [43.65, -79.44]
I create a SELECT using the geo formula against this Point A, which generates the distance between Point A and the documents in DB, as distance in KM
This query works
SELECT
d.env,
d.geo,
d.city,
(ACOS(SIN( RADIANS(43.65)) * SIN(RADIANS(d.geo.lat)) + COS( RADIANS(43.65)) * COS(RADIANS(d.geo.lat)) * COS(RADIANS(d.geo.lon) - RADIANS(-79.44))) * 6371) AS distance
FROM `dev` d
WHERE d.geo.lat IS NOT NULL AND d.geo.lat IS NOT MISSING
AND (ACOS(SIN( RADIANS(43.65)) * SIN(RADIANS(d.geo.lat)) + COS( RADIANS(43.65)) * COS(RADIANS(d.geo.lat)) * COS(RADIANS(d.geo.lon) - RADIANS(-79.44))) * 6371) <= 50000
ORDER BY distance DESC
LIMIT 20000;
**
This second one doesn’t work, I tried to make it cleaner like this but it fails,
**
SELECT
d.env,
d.geo,
d.city,
(ACOS(SIN( RADIANS(43.65)) * SIN(RADIANS(d.geo.lat)) + COS( RADIANS(43.65)) * COS(RADIANS(d.geo.lat)) * COS(RADIANS(d.geo.lon) - RADIANS(-79.44))) * 6371) AS distance
FROM `dev` d
WHERE d.geo.lat IS NOT NULL AND d.geo.lat IS NOT MISSING
AND distance <= 50000
ORDER BY distance DESC
LIMIT 20000;
@vsr1 - How can you make the second query work, the first query is working but I am duplicating the geo formula, both in the SELECT clause and in the WHERE AND ... <= 50000
As you can see in the second query am trying to use distance but it breaks it and fails.