Looking for a Consultant/Help Troubleshooting Setup

ron · January 29, 2019, 7:16pm

Hello,
I’m a mobile developer who was assigned to integrate Couchbase for real-time syncing functionality.

When I was building our app in our development/staging environment, everything worked well. Once it went to production and started seeing heavy use, we ran into problems that have never occurred in our staging environments:

We get “Write Commit Failure” messages most mornings between 6:05 AM and 6:45 AM.
Possibly connected to the issue above, the [bucket_name]/_design/sync_gateway view index build process gets stuck. It appears that this causes our mobile users to not be able to log into the app because behind the scenes, it returns an error saying something like: a process is in progress and needs to finish first.

I’m not a database expert, and although I’ve gone through the documentation, I’m still at a loss troubleshooting this issue. Is anyone out there interested in a paid opportunity to solve it? Here’s some info about our setup:

1 data bucket.
3x Couchbase Server 4.5.1 Community Edition instances running in a cluster.
1 replica.
2x Sync Gateway 1.5.1 Community Edition instances.
The data we store is all short-lived. Documents should be deleted after being processed (which should be done within seconds after being stored), or after 12 hours.

If you’re interested, please let me know by replying here. This is a paid opportunity.

daniel.petersen · January 29, 2019, 7:53pm

Can you let us know the CBL/SG versions and platforms used? What’s in the logs during the 6:05 to 6:45 AM window?

ron · January 29, 2019, 8:15pm

@daniel.petersen:
Couchbase Server: Version 4.5.1 Community Edition on Windows servers.
Sync Gateway: Version 1.5.1 Community Edition on Windows servers.
Couchbase Lite: Version 1.4.1 on iOS and Android.

From the Log tab, here are the logs for the Write Commit Failures:

Event	Module Code	Server Node	Time
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 5 times)	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:19:30 AM Tue Jan 29, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:18:16 AM Tue Jan 29, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.3	6:12:50 AM Tue Jan 29, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:08:25 AM Tue Jan 29, 2019

Here’s an example from yesterday:

Event	Module Code	Server Node	Time
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.3	6:38:47 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 18 times)	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:38:30 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:37:31 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.3	6:36:59 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:35:19 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.3	6:33:35 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 1 times)	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:32:30 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:31:25 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1.	menelaus_web_alerts_srv 000	ns_1@xxx.xx.x.1	6:26:40 AM Mon Jan 28, 2019
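To check whether the failures cluster on particular nodes and in the same time window, the Log tab rows can be tallied with a short script. This is only a minimal sketch: it assumes the rows are pasted as tab-separated text in the column order shown above (event, module code, server node, time), and it assumes a "(repeated N times)" suffix means N extra occurrences on top of the row itself. The names `summarize`, `SAMPLE`, and `TIME_FORMAT` are hypothetical helpers, not part of any Couchbase tool.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp layout as it appears in the Log tab, e.g. "6:19:30 AM Tue Jan 29, 2019".
TIME_FORMAT = "%I:%M:%S %p %a %b %d, %Y"
REPEAT_RE = re.compile(r"\(repeated (\d+) times\)")

EVENT = "Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node "
MODULE = "menelaus_web_alerts_srv 000"

# The four rows from Tue Jan 29, 2019, rebuilt as tab-separated lines.
SAMPLE = "\n".join([
    "\t".join([EVENT + "xxx.xx.x.1. (repeated 5 times)", MODULE, "ns_1@xxx.xx.x.1", "6:19:30 AM Tue Jan 29, 2019"]),
    "\t".join([EVENT + "xxx.xx.x.1.", MODULE, "ns_1@xxx.xx.x.1", "6:18:16 AM Tue Jan 29, 2019"]),
    "\t".join([EVENT + "xxx.xx.x.3.", MODULE, "ns_1@xxx.xx.x.3", "6:12:50 AM Tue Jan 29, 2019"]),
    "\t".join([EVENT + "xxx.xx.x.1.", MODULE, "ns_1@xxx.xx.x.1", "6:08:25 AM Tue Jan 29, 2019"]),
])


def summarize(text):
    """Return (failures per node, earliest timestamp, latest timestamp)."""
    counts = Counter()
    stamps = []
    for line in text.splitlines():
        parts = line.split("\t")
        # Skip the header row, blank lines, and anything that is not a failure row.
        if len(parts) != 4 or not parts[0].startswith("Write Commit Failure"):
            continue
        event, _module, node, stamp = parts
        match = REPEAT_RE.search(event)
        # Assumption: a repeated row counts as itself plus N repeats.
        counts[node] += 1 + (int(match.group(1)) if match else 0)
        stamps.append(datetime.strptime(stamp.strip(), TIME_FORMAT))
    if not stamps:
        return counts, None, None
    return counts, min(stamps), max(stamps)


if __name__ == "__main__":
    counts, first, last = summarize(SAMPLE)
    for node, total in counts.most_common():
        print(node, total)
    print("window:", first.time(), "to", last.time())
```

Running the same tally over several days of log rows would show whether one node accounts for most of the failures and whether the window really is the same each morning.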

daniel.petersen · January 29, 2019, 8:28pm

Is there some daily process that is happening during that morning time? If it is working fine at all other times I would check your environment to find out what other things are going on at that time.

ron · January 29, 2019, 8:29pm

Here’s more data in case it is helpful:

ron · January 29, 2019, 8:30pm

I’ll check with the server admin to see if there could be something happening at that time that’s causing a conflict.

ron · January 29, 2019, 11:22pm

I checked with the admin. It sounds like the issue occurs during our peak processing time. One thing I’m testing now is adding 3 GB of RAM to the Index RAM Quota to see if that helps. It was previously 1 GB.

ron · January 30, 2019, 3:58pm

After increasing the Index RAM Quota from 1 GB to 4 GB, the issue still occurs exactly as before. Here’s the latest log entry:

Event	Module Code	Server Node	Time
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node x.x.x.1. (repeated 4 times)	menelaus_web_alerts_srv 000	ns_1@x.x.x.1	6:31:30 AM Wed Jan 30, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node x.x.x.1.	menelaus_web_alerts_srv 000	ns_1@x.x.x.1	6:30:37 AM Wed Jan 30, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node x.x.x.3.	menelaus_web_alerts_srv 000	ns_1@x.x.x.3	6:17:08 AM Wed Jan 30, 2019
