Data doesn't write to bucket defined in bootstrap

MrNobody · September 13, 2024, 1:52am

First off, I don’t know how to specify the bucket in the REST API call that creates a document. I followed the online docs and ran the following API call:

curl -X POST "http://localhost:4984/my-db/" -H "accept: application/json" -H "Content-Type: application/json" -H "Authorization: Basic MYAUTH" -d '{"data":"my data"}'

returned

{"id":"cfb4fb3ccba33a78b04edc3509b60636","ok":true,"rev":"1-21c3fa97d6140d81e9a66c0a536ffabc"}

The command executed successfully, but I found the JSON doc in a bucket that is not the one defined in the database’s default config:

_sync:dbconfig:my-db:default
{
"version": "4-ceb22086b1d8d1512f858d0024dfa9a1",
"sg_version": "3.1.10@4-EE",
"metadata_id": "_default",
"bucket": "DataBucket-1",
"name": "my-db",
…
}

According to the config above, the default bucket is DataBucket-1, but the JSON doc was created in DataBucket-3. Interestingly, this matches what the following 2 endpoint calls return:

curl -X GET "http://localhost:4985/my-db/_config?include_runtime=true" -H "accept: application/json" -H "Authorization: Basic MYAUTH"

above returned DataBucket-3

curl -X GET "http://localhost:4985/my-db/_config" -H "accept: application/json" -H "Authorization: Basic MYAUTH"

above returned DataBucket-1
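
The two calls above can also be compared side by side in a script. This is only a minimal sketch: it reuses the admin URL and the my-db name from this post, and it assumes the bucket is reported in a top-level field named bucket in both responses. The helper names are made up for illustration.

```python
import base64
import json
import urllib.request

ADMIN_URL = "http://localhost:4985"  # admin port used in the calls above


def config_request(db, user, password, include_runtime=False):
    """Build a GET request for /{db}/_config, optionally with runtime values."""
    url = f"{ADMIN_URL}/{db}/_config"
    if include_runtime:
        url += "?include_runtime=true"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"accept": "application/json", "Authorization": f"Basic {token}"},
    )


def fetch_bucket(request):
    """Send the request and return the 'bucket' field (assumed field name)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("bucket")


def describe_mismatch(persisted_bucket, runtime_bucket):
    """Return None when both views agree, else a short description."""
    if persisted_bucket == runtime_bucket:
        return None
    return f"persisted config says {persisted_bucket!r}, runtime says {runtime_bucket!r}"
```

To use it, call fetch_bucket(config_request("my-db", user, password)) and fetch_bucket(config_request("my-db", user, password, include_runtime=True)), then pass both results to describe_mismatch.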

It’s a bit confusing that the POST command to create a document ends up writing to a seemingly random bucket identified at run time. Here are my questions:

1. Is there a parameter in the POST request that creates a document to specify the bucket name in the keyspace?
2. Why doesn’t the bucket in the DB default config take effect (the data didn’t go to DataBucket-1)?
3. Is there any config (or API call) to set the default bucket for the local Couchbase Sync Gateway? Does the keyspace support a bucket name in it?
4. Why did Sync Gateway choose DataBucket-3, and not -1, -2 or -4 (all 4 are available)?

bbrks · September 13, 2024, 11:59am

The root of all your questions is really why does Sync Gateway think my-db is defined in DataBucket-3 instead of DataBucket-1.

I suspect what’s happened is you have somehow got into the situation where your Couchbase Server buckets have multiple Sync Gateway databases with the same name and Sync Gateway is choosing the first one it finds.

Can you clarify what buckets you have, and what Sync Gateway databases you expect to have created on each?

If you don’t want Sync Gateway looking in other buckets, you can restrict the RBAC permissions used by Sync Gateway’s bootstrap user, or specify a set of bucket_credentials

MrNobody · September 13, 2024, 2:40pm

Thanks @bbrks
Could you please clarify the concept of “Sync Gateway databases”? My understanding is that Sync Gateway is a gateway service and Couchbase Server is the persistence layer, so the Couchbase Server nodes holding the buckets are the Sync Gateway databases.

I am upgrading Sync Gateway to 3.1, and the minimal bootstrap defines multiple CB servers for all Sync Gateway instances:

  "databases": {
    "my-db": {
      "server": "couchbase://cbserver01.myspace.com,couchbase://cbserver02.myspace.com,couchbase://cbserver03.myspace.com",
      "bucket": "DataBucket-1",
      "username": "gateway_user",
      "password": "XxXxXx",
      .....
   }
}

however in each DB server there are multiple buckets

cbserver01.myspace.com
       |-- DataBucket-1
       |-- DataBucket-2
       |-- DataBucket-3
       |-- DataBucket-4

Same bucket structure applies to the cbserver02, cbserver03 etc

It’s surprising to see data go into a bucket that is not defined in the bootstrap. Or is there a config I’m missing that may cause the issue?

bbrks · September 13, 2024, 4:56pm

Your bootstrap config will not define a bucket, instead only a Couchbase Cluster. This is defined in your startup configuration file where you run Sync Gateway.

Sync Gateway will automatically discover all of the accessible buckets and load any database configurations which are found in each bucket.
These database configuration documents are keyed like you’ve already identified (_sync:dbconfig:my-db:default), and in normal circumstances only ever refer to the same bucket the config document is found in.

It is however possible, if you’ve moved documents between buckets (e.g. with backup/restore or XDCR) to have database configuration documents that refer to a different bucket than the one the database configuration is stored in.

In this case, it’s expected that Sync Gateway will not load the database, because it detects there’s a mismatch. We fixed this in version 3.1.2

https://jira.issues.couchbase.com/issues/CBG-3292
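
The mismatch described above can be pictured with a small sketch. It is not Sync Gateway’s actual implementation, only an illustration of the condition: a config document whose bucket field differs from the bucket it was read from. The function name and the input shape (bucket name mapped to a list of config documents) are assumptions for the example.

```python
def find_mismatched_configs(configs_by_bucket):
    """Return (stored_in, refers_to, db_name) for every config document
    whose 'bucket' field differs from the bucket it was read from."""
    mismatches = []
    for stored_in, docs in configs_by_bucket.items():
        for doc in docs:
            refers_to = doc.get("bucket")
            if refers_to != stored_in:
                mismatches.append((stored_in, refers_to, doc.get("name")))
    return mismatches


# Hypothetical situation: the my-db config was copied into DataBucket-3
# (e.g. by XDCR or backup/restore) but still refers to DataBucket-1.
sample = {
    "DataBucket-1": [{"name": "my-db", "bucket": "DataBucket-1"}],
    "DataBucket-3": [{"name": "my-db", "bucket": "DataBucket-1"}],
}
mismatched = find_mismatched_configs(sample)
```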

What version are you upgrading from/to?

MrNobody · September 13, 2024, 7:07pm

Thanks @bbrks
I am currently upgrading to 3.1.10…

How can I then configure Sync Gateway so that documents are only written to the designated bucket?

MrNobody · September 16, 2024, 2:39pm

@bbrks
I am upgrading from 2.8 to 3.1.10.
Is there a way to assign a bucket to a Sync Gateway, not only a database?

bbrks · September 16, 2024, 5:17pm

Is there a way to assign a bucket to a Sync Gateway, not only a database?

I assume you already saw my note 2 posts ago about providing bucket_credentials to limit which buckets are being accessed?

If you’re upgrading from 2.8 all of these configuration changes were made in 3.0 so it might be helpful for you to review the 3.0 documentation.

You can disable persistent config and run in legacy config mode to retain your 2.8 behaviour, but there’s going to be limited support for new features when running in this mode. I wouldn’t recommend it for a long period of time, just maybe as a stepping stone to get you upgraded first, and then think about changing your configuration.

MrNobody · September 16, 2024, 5:30pm

@bbrks
Could you please clarify what Sync Gateway databases are? In my example there are multiple Sync Gateway instances accessing my-db, which is the Couchbase database.

On the other hand, as described in my 2nd post, there are a couple of CB servers for my-db, and each CB server holds 4 data buckets.

I am not sure where to find other config/info that associates a specific data bucket with a Sync Gateway “database”.

adamf · September 16, 2024, 6:05pm

A Sync Gateway database is the term used for the application definition by Sync Gateway. A database defines a backing data store in Couchbase (a bucket, and optionally a set of collections within that bucket).

The same Sync Gateway database can be running on multiple Sync Gateway instances (i.e. you can scale the database horizontally). Separately, the backing data store in Couchbase can also be distributed across multiple Couchbase nodes. These two layers of scaling are independent - see the diagram here for an example:

In 3.0 and later Sync Gateway stores database configurations in the default collection of the associated bucket. Sync Gateway is started with a bootstrap configuration that connects to the Couchbase cluster, and then looks for any databases defined in each of the buckets that the bootstrap user has access to.

As Ben suggests, the behaviour you’re seeing could be explained if you’ve got a Sync Gateway database config defined with the same name in multiple buckets that the bootstrap user has access to. You can check for _sync:dbconfig:my-db:default in DataBucket-1 and DataBucket-3 and see if it exists in multiple places.
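
One way to run that check is a key lookup through the Couchbase Server query service. The sketch below is a rough illustration, not an official procedure: it assumes the query service is reachable at localhost:8093, that the user is allowed to run SELECT on each bucket, and that the config document lives in the bucket’s default collection as described above. The function names are made up for the example.

```python
import base64
import json
import urllib.parse
import urllib.request

QUERY_URL = "http://localhost:8093/query/service"  # assumed query service address


def lookup_statement(bucket, doc_key):
    """Build a key lookup that returns the document ID if doc_key exists."""
    if "`" in bucket:
        raise ValueError("bucket name must not contain a backtick")
    # Backticks are needed because the bucket names contain a hyphen.
    return f"SELECT META(d).id FROM `{bucket}` AS d USE KEYS {json.dumps(doc_key)}"


def query_request(statement, user, password):
    """Build a POST request carrying the statement as form data."""
    body = urllib.parse.urlencode({"statement": statement}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        QUERY_URL, data=body, headers={"Authorization": f"Basic {token}"}
    )


def buckets_containing(doc_key, buckets, user, password):
    """Return the buckets in which the lookup finds the document."""
    found = []
    for bucket in buckets:
        request = query_request(lookup_statement(bucket, doc_key), user, password)
        with urllib.request.urlopen(request) as response:
            if json.load(response).get("results"):
                found.append(bucket)
    return found
```

Calling buckets_containing("_sync:dbconfig:my-db:default", ["DataBucket-1", "DataBucket-2", "DataBucket-3", "DataBucket-4"], user, password) and getting more than one bucket back would point to the duplicate-config situation.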

The set of buckets that Sync Gateway attempts to access is based on the access of the user defined in the bootstrap configuration. So as Ben mentions, if your intention is to have a specific Sync Gateway cluster only communicating with a single bucket, one approach is to change the permissions of the bootstrap user to only grant them the “Sync Gateway” security role for the desired bucket.

MrNobody · September 16, 2024, 7:46pm

Thanks @bbrks & @adamf for the explanation. Sorry, I am still confused: is my-db in my case the Sync Gateway database, or is it something else?
The diagram from the link

Introduction | Couchbase Docs

shows “travel-sample” as the database for the 2 Sync Gateway nodes, and “travel-sample” is also the bucket name in each Couchbase server.

Is it a correct understanding that:

1. The “travel-sample” database on each Sync Gateway node is a separate concept from the “travel-sample” bucket in each Couchbase server, and the names just happen to match?
2. The “travel-sample” database is the so-called Sync Gateway database; it has its own physical datastore separate from Couchbase Server, and in the diagram the 2 Sync Gateway nodes are sharing/pushing data to the same “travel-sample” database?
3. The “travel-sample” database in the diagram == “my-db” in my setup?

MrNobody · September 17, 2024, 5:17pm

@bbrks @adamf

Regarding the bucket_credentials setting, I don’t recall creating a password for the bucket. Can I use the same username/password as in database_credentials?

   bucket_credentials: {
      {bucketname...}: {
         password: "string",
         username: "string",
         x509_cert_path: "string",
         x509_key_path: "string"
      }
   },

adamf · September 17, 2024, 5:33pm

1. Correct - these are two separate things with matching names in the example.

2. No, the Sync Gateway database doesn’t have its own physical data store. The Couchbase bucket is the backing storage - the Sync Gateway database is the logical collection of data, configuration, users, roles and more for the application, but the data storage is the backing Couchbase Server bucket.

3. Correct.

If you’re using the Community Edition of Couchbase Server you may not have the ability to define fine-grained bucket security. If this is the case, you’ll want to remove the _sync:dbconfig:my-db:default documents manually from the unwanted buckets.

MrNobody · September 17, 2024, 6:46pm

@adamf Thanks! that’s very helpful

Actually I am configuring the Sync Gateway with EE edition

Couchbase Sync Gateway/3.1.10(4;dba529a) EE

Would you mind clarifying further the “credential” for a bucket (whether it can be the same as the database credential)?

adamf · September 17, 2024, 7:10pm

The username/password credentials defined in Sync Gateway’s bootstrap config (bucket_credentials) specify the Couchbase Server user that Sync Gateway uses to connect to Couchbase Server. This user is defined and managed on the Couchbase Server side, as described here:

Additional documentation on managing the user on Couchbase Server is available here:

Through that UI you can define the set of buckets that the user has the Sync Gateway role for.
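
The same role assignment can probably be scripted against the Couchbase Server REST API instead of the UI. Treat the sketch below as a starting point to verify, not a confirmed recipe: it assumes the cluster manager listens on localhost:8091, that local users are managed under /settings/rbac/users/local/, and that the “Sync Gateway” role is identified as mobile_sync_gateway in that API. The role identifier in particular is an assumption and should be checked against the role list for your server version. Note that a PUT of this kind sets the user’s full role list, so any role not included would be dropped.

```python
import base64
import urllib.parse
import urllib.request

CLUSTER_URL = "http://localhost:8091"  # assumed cluster manager address
SYNC_GATEWAY_ROLE = "mobile_sync_gateway"  # assumed REST name of the "Sync Gateway" role


def role_for_bucket(bucket):
    """Scope the role to a single bucket, e.g. mobile_sync_gateway[DataBucket-1]."""
    return f"{SYNC_GATEWAY_ROLE}[{bucket}]"


def restrict_user_request(sg_user, bucket, admin_user, admin_password):
    """Build a PUT request that sets sg_user's roles to one bucket-scoped role."""
    url = f"{CLUSTER_URL}/settings/rbac/users/local/{urllib.parse.quote(sg_user, safe='')}"
    body = urllib.parse.urlencode({"roles": role_for_bucket(bucket)}).encode()
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Authorization": f"Basic {token}"}
    )
```

Sending the request built by restrict_user_request("gateway_user", "DataBucket-1", admin_user, admin_password) with urllib.request.urlopen would apply the change. It is worth trying on a test cluster first.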

MrNobody · September 19, 2024, 2:00pm

@adamf Thanks that’s helpful

I am working on the upgrade to 3.1.10 for 2 Sync Gateway nodes that will run inter-Sync Gateway replication with each other. In that case each node should have its own backing CB data server. I assume that as long as I create a CB user in each CB server whose access is limited to DataBucket-1 (which exists in each server), the same bootstrap config can be used for both Sync Gateway nodes, with the more specific config (push/pull etc.) set through the REST API.

adamf · September 19, 2024, 9:15pm

Yes, if you’ve defined the same user in both CB clusters then you can use the same bootstrap config for each. All the details around setting up and managing Inter-Sync Gateway replication are covered in the documentation.

MrNobody · September 25, 2024, 2:07pm

Thanks @adamf & @bbrks
One last thing that needs your clarification: in the Bootstrap configuration documentation, bucket_credentials is part of a long JSON document listing all the configurable properties, and within that JSON document there is an attribute specifically named “bootstrap”:


   bootstrap: {
      // bucket_credential is a separate attribute, not in the bootstrap attribute
      ca_cert_path: "string",
      config_update_frequency: "10s",
      group_id: "default",
      password: "string",
      server: "string",
      server_tls_skip_verify: false,
      use_tls_server: true,
      username: "string",
      x509_cert_path: "string",
      x509_key_path: "string"
   }

Could you please confirm whether the bucket_credentials setting is part of the bootstrap setup, or whether we have to use the REST API to set it up?

The main concern is to have Sync Gateway write to the correct bucket as soon as it starts up (bootstrap). If bucket_credentials cannot be configured at the bootstrap stage and has to be set by a later REST API call, that may cause data discrepancies.

bbrks · September 25, 2024, 2:41pm

bucket_credentials are defined in your config file at the same level as bootstrap.

See docs here: Bootstrap Configuration | Couchbase Docs
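
To make “the same level as bootstrap” concrete, here is a rough sketch that generates such a startup file. Only the properties discussed in this thread are included; a real file will likely need more (TLS settings, for example). The server address, user name, and password are the placeholder values from earlier posts. Reusing the bootstrap user’s credentials for the bucket is an assumption for illustration, as is the helper name.

```python
import json


def startup_config(server, username, password, bucket):
    """Build a startup config with bootstrap and bucket_credentials as siblings."""
    return {
        "bootstrap": {
            "server": server,
            "username": username,
            "password": password,
        },
        # Top-level property, NOT nested inside "bootstrap".
        "bucket_credentials": {
            bucket: {"username": username, "password": password},
        },
    }


config = startup_config(
    "couchbase://cbserver01.myspace.com", "gateway_user", "XxXxXx", "DataBucket-1"
)
print(json.dumps(config, indent=2))
```

The printed JSON can be saved as the startup configuration file that Sync Gateway is launched with.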

MrNobody · September 25, 2024, 2:49pm

@bbrks Thanks, yes, I also see that in the documentation. My question is whether the entire JSON document can be used for bootstrap configuration, or only the “bootstrap” attribute and its associated values. Sorry, I just need clarification on that.

bbrks · September 25, 2024, 2:56pm

I see. The (startup) configuration file is usually synonymous with bootstrap. All properties apply to bootstrapping databases from the server.
