Hello,
I’m a mobile developer who was assigned to integrate Couchbase for real-time syncing functionality.
When I was building our app in our development/staging environment, everything worked well. Once it went to production and started seeing heavy use, we began running into problems that never occurred in our staging environments:
We get “Write Commit Failure” messages most mornings between 6:05 AM and 6:45 AM.
Possibly connected to the issue above, the [bucket_name]/_design/sync_gateway view index build process gets stuck. This appears to prevent our mobile users from logging into the app, because behind the scenes the request returns an error saying something like: a process is in progress and needs to finish first.
I’m not a database expert, and although I’ve gone through the documentation, I’m still at a loss as to how to troubleshoot this issue. Is anyone out there interested in a paid opportunity to solve it? Here’s some info about our setup:
1 data bucket.
3x Couchbase Server 4.5.1 Community Edition instances running in a cluster.
1 replica.
2x Sync Gateway 1.5.1 Community Edition instances.
The data we store is all short-lived. Documents should be deleted after being processed (which should happen within seconds of being stored), or after 12 hours.
If you’re interested, please let me know by replying here. This is a paid opportunity.
@daniel.petersen:
Couchbase Server: Version 4.5.1 Community Edition on Windows servers.
Sync Gateway: Version 1.5.1 Community Edition on Windows servers.
Couchbase Lite: Version 1.4.1 on iOS and Android.
From the Log tab, here are the logs for the Write Commit Failures:
| Event | Module Code | Server Node | Time |
| --- | --- | --- | --- |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 5 times) | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:19:30 AM Tue Jan 29, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:18:16 AM Tue Jan 29, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:12:50 AM Tue Jan 29, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:08:25 AM Tue Jan 29, 2019 |
Here’s an example from yesterday:
| Event | Module Code | Server Node | Time |
| --- | --- | --- | --- |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:38:47 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 18 times) | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:38:30 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:37:31 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:36:59 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:35:19 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:33:35 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 1 times) | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:32:30 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:31:25 AM Mon Jan 28, 2019 |
| Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | | | |
Is there some daily process that runs during that morning window? If everything works fine at all other times, I would check your environment to find out what else is going on at that time.
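A quick way to check whether the failures really cluster in a fixed daily window is to tally the posted alerts by node, by day, and by time of day. The following is only a minimal Python sketch, not a Couchbase tool: `ALERTS`, `parse_alert_time`, and `summarize` are hypothetical names, and the rows are copied by hand from the complete log rows posted above. It ignores the "(repeated N times)" counts and the last row, which has no timestamp.

```python
from collections import Counter
from datetime import datetime

# (node, time) pairs copied by hand from the complete alert rows posted above.
ALERTS = [
    ("xxx.xx.x.1", "6:19:30 AM Tue Jan 29, 2019"),
    ("xxx.xx.x.1", "6:18:16 AM Tue Jan 29, 2019"),
    ("xxx.xx.x.3", "6:12:50 AM Tue Jan 29, 2019"),
    ("xxx.xx.x.1", "6:08:25 AM Tue Jan 29, 2019"),
    ("xxx.xx.x.3", "6:38:47 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.1", "6:38:30 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.1", "6:37:31 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.3", "6:36:59 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.1", "6:35:19 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.3", "6:33:35 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.1", "6:32:30 AM Mon Jan 28, 2019"),
    ("xxx.xx.x.1", "6:31:25 AM Mon Jan 28, 2019"),
]

# Matches the Log tab format shown above, e.g. "6:19:30 AM Tue Jan 29, 2019".
# %a, %b and %p are locale-dependent; this assumes an English locale.
TIME_FORMAT = "%I:%M:%S %p %a %b %d, %Y"


def parse_alert_time(text):
    """Turn one Log tab timestamp string into a datetime."""
    return datetime.strptime(text, TIME_FORMAT)


def summarize(alerts):
    """Count alert rows per node and per day, and find the daily time window."""
    times = [parse_alert_time(stamp) for _, stamp in alerts]
    per_node = Counter(node for node, _ in alerts)
    per_day = Counter(t.date().isoformat() for t in times)
    earliest = min(t.time() for t in times)
    latest = max(t.time() for t in times)
    return per_node, per_day, earliest, latest


if __name__ == "__main__":
    per_node, per_day, earliest, latest = summarize(ALERTS)
    print("alert rows per node:", dict(per_node))
    print("alert rows per day:", dict(per_day))
    print("time-of-day window:", earliest, "to", latest)
```

With only the rows posted here, every alert comes from nodes xxx.xx.x.1 and xxx.xx.x.3. Running the same tally over a longer export of the log would show whether that pattern and the time window hold across more days, which would help line the failures up with whatever else runs on those machines at that hour.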
I checked with the admin. It sounds like the time the issue comes up is our peak time in terms of processing. One thing I’m testing now is adding 3 GB of RAM to the Index RAM Quota, which was previously 1 GB, to see if that helps.