What do you hope to gain from determining the average lat/lon of the documents returned by your search request?
As the cluster marker needs a lat/lon to display on the map, taking the average of the documents represented by the cluster achieves this. Alternatively, we could simply take the center point of the grid rect, but this is a bit poorer UX, as it ends up creating a distinct grid pattern of cluster markers.
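A minimal sketch of that averaging step, assuming each returned hit has already been reduced to a `{lat, lon}` pair. The `ClusterCentroid` class and `centroid` method are hypothetical names for illustration, not part of any Couchbase SDK. A plain arithmetic mean like this also assumes the cluster does not straddle the antimeridian.

```java
final class ClusterCentroid {
    /**
     * Returns {avgLat, avgLon} for the given points.
     * Each point is a two-element array: {lat, lon}.
     * Hypothetical helper; assumes no cluster crosses the +/-180 meridian.
     */
    static double[] centroid(double[][] points) {
        if (points.length == 0) {
            throw new IllegalArgumentException("cluster has no points");
        }
        double latSum = 0.0;
        double lonSum = 0.0;
        for (double[] p : points) {
            latSum += p[0];
            lonSum += p[1];
        }
        return new double[] { latSum / points.length, lonSum / points.length };
    }

    public static void main(String[] args) {
        double[][] hits = { { 40.0, -112.0 }, { 42.0, -110.0 } };
        double[] c = centroid(hits);
        System.out.println("marker at lat=" + c[0] + ", lon=" + c[1]);
    }
}
```

Falling back to the grid rect's center point would simply replace the loop with the midpoint of the cell's corners.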
I take it that this calculation is made within your application as Couchbase FTS does not support arithmetic operations over document content.
Yes, I’m currently performing this math in the server application. The document's geo field is stored in the FTS index. I’ve experimented with various limits on the number of document rows returned to average over. I didn’t see a noticeable difference in performance between a limit of 10, 100, or 1000, or even between those and taking a simple centroid of the grid.
Upon further testing, this algorithm works well at high zoom levels, taking milliseconds even with tens of millions of documents. But the searches run longer at low zoom levels, taking multiple seconds starting at the country level. Continent or whole-world map views are extremely slow, timing out with a large number of documents.
This may have to do with needing to load nearly the entire FTS index into memory in order to search a large portion of the map. Since the searches run in parallel, that is likely causing contention between accesses to different parts of the index. Maybe there’s an algorithm that performs better at these low zoom levels. It’s also possible we just don’t support displaying markers at very low zoom levels.
For 30M documents, the FTS index disk size is 5.35GB. I’ve allocated 16GB to the search service, but I’m still testing with a single-node local server. There may be optimizations to shrink the size of the index, and that might make a difference. Perhaps it would be more performant to not store some of the properties and rely on the data service to get them instead, since they’re only needed when rendering non-clustered markers.
Sharing your FTS index definition and some sample queries will help us help you better when it comes to questions on index/search performance.
Here's the FTS index I'm currently using:
{
  "type": "fulltext-index",
  "name": "lead_geo",
  "uuid": "72c6391d703e8503",
  "sourceType": "gocbcore",
  "sourceName": "salesrabbit",
  "sourceUUID": "e7e269b73b9eb60d5a60a74ca274abc2",
  "planParams": {
    "maxPartitionsPerPIndex": 1024,
    "indexPartitions": 1
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": false,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "lead": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "channels": {
              "dynamic": false,
              "enabled": true,
              "properties": {
                "orgUnit": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "index": true,
                      "name": "orgUnit",
                      "type": "text"
                    }
                  ]
                },
                "users": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "index": true,
                      "name": "users",
                      "store": true,
                      "type": "text"
                    }
                  ]
                }
              }
            },
            "geo": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "geo",
                  "store": true,
                  "type": "geopoint"
                }
              ]
            },
            "status": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "status",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "segmentVersion": 15
    }
  },
  "sourceParams": {}
}
Here's a simplified version of the document model:
{
  "type": "lead",
  "geo": {
    "lat": 40.4210126,
    "lon": -111.8836315
  },
  "status": "status1",
  "channels": {
    "users": ["user1", "user2"],
    "orgUnit": "ou1"
  },
  //...
}
Here's the query I'm currently using:
cluster.searchQuery(
    "lead_geo",
    GeoBoundingBoxQuery(grid.ulLon, grid.ulLat, grid.brLon, grid.brLat)
        .field("geo"),
    SearchOptions.searchOptions()
        .fields("geo", "status", "channels.users")
        .limit(10)
)
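The `grid` value in this query is one cell of the visible map area, and one such query is issued per cell. A rough sketch of how a viewport might be split into cells with the same `ulLon`/`ulLat`/`brLon`/`brLat` corner fields follows. The `ViewportGrid` and `Cell` names and the row/column counts are hypothetical, and the sketch assumes the viewport does not cross the antimeridian.

```java
import java.util.ArrayList;
import java.util.List;

final class ViewportGrid {
    /** One grid rect, described by its upper-left and bottom-right corners. */
    static final class Cell {
        final double ulLon, ulLat, brLon, brLat;

        Cell(double ulLon, double ulLat, double brLon, double brLat) {
            this.ulLon = ulLon;
            this.ulLat = ulLat;
            this.brLon = brLon;
            this.brLat = brLat;
        }
    }

    /**
     * Splits the viewport into rows x cols equal cells, row by row
     * starting at the upper-left corner. Hypothetical helper.
     */
    static List<Cell> split(double ulLon, double ulLat,
                            double brLon, double brLat,
                            int rows, int cols) {
        if (rows <= 0 || cols <= 0) {
            throw new IllegalArgumentException("rows and cols must be positive");
        }
        double cellWidth = (brLon - ulLon) / cols;   // degrees of longitude per cell
        double cellHeight = (ulLat - brLat) / rows;  // degrees of latitude per cell
        List<Cell> cells = new ArrayList<>(rows * cols);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double left = ulLon + c * cellWidth;
                double top = ulLat - r * cellHeight;
                cells.add(new Cell(left, top, left + cellWidth, top - cellHeight));
            }
        }
        return cells;
    }
}
```

Each resulting cell would then be passed to the bounding-box query above, so the number of FTS requests per map render is `rows * cols`.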
How can I modify this query to additionally filter on status and/or channels.users? For example:
WHERE status IN ("status2", "status3") AND (ARRAY_CONTAINS(channels.users, "user3") OR ARRAY_CONTAINS(channels.users, "user5"))
Is it possible to do this entirely with the FTS index? I have the properties indexed and stored, along with geo.
Another question: is it possible to use a subdocument property as the type filter? For example:
{
"subDoc": {
"type": "docType"
}
}
using subDoc.type as the type filter in the FTS query? I tried this syntax, and while I didn’t get an error, my test queries didn’t use the index like they did when I used a root property for type instead.