Looking for a Consultant/Help Troubleshooting Setup

Hello,
I’m a mobile developer who was assigned to integrate Couchbase for real-time syncing functionality.

While I was building our app in our development/staging environment, everything worked well. Once it went to production and came under heavy use, we started running into problems that have never occurred in our staging environments:

  • We get “Write Commit Failure” messages most mornings between 6:05 AM and 6:45 AM.
  • Possibly related to the issue above, the [bucket_name]/_design/sync_gateway view index build gets stuck. This appears to prevent our mobile users from logging in to the app: behind the scenes, the request returns an error along the lines of “a process is in progress and needs to finish first.”
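One way to see whether the view index build is actually still running is to look at the cluster's task list. The following is a rough sketch, not a verified diagnostic: it assumes the Couchbase Server REST endpoint `/pools/default/tasks` on port 8091, and it assumes the task entries carry `type`, `bucket`, `designDocument` and `progress` fields. The helper names (`fetch_tasks`, `indexer_tasks`) and the sample data are made up for illustration.

```python
import base64
import json
import urllib.request


def fetch_tasks(host, user, password, port=8091):
    """Fetch the cluster task list (assumed endpoint: /pools/default/tasks)."""
    url = f"http://{host}:{port}/pools/default/tasks"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def indexer_tasks(tasks, bucket, design_doc="_design/sync_gateway"):
    """Keep only view-indexer tasks for one bucket and design document.

    The field names used here are assumptions about the response shape.
    """
    return [
        t for t in tasks
        if t.get("type") == "indexer"
        and t.get("bucket") == bucket
        and t.get("designDocument") == design_doc
    ]


# Hypothetical sample of what a task list might look like.
SAMPLE_TASKS = [
    {"type": "rebalance", "status": "notRunning"},
    {"type": "indexer", "status": "running", "bucket": "data_bucket",
     "designDocument": "_design/sync_gateway", "progress": 42},
    {"type": "indexer", "status": "running", "bucket": "other_bucket",
     "designDocument": "_design/sync_gateway", "progress": 10},
]

if __name__ == "__main__":
    for task in indexer_tasks(SAMPLE_TASKS, "data_bucket"):
        print(task["designDocument"], task.get("progress"))
```

Against a real cluster, `fetch_tasks(...)` would be called a few times a minute apart; if the same indexer task keeps reporting the same progress, that would suggest the build is stalled rather than just slow.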

I’m not a database expert, and although I’ve gone through the documentation, I’m still at a loss troubleshooting this issue. Is anyone out there interested in a paid opportunity to solve it? Here’s some info about our setup:

  • 1 data bucket.
  • 3x Couchbase Server 4.5.1 Community Edition instances running in a cluster.
  • 1 replica.
  • 2x Sync Gateway 1.5.1 Community Edition instances.
  • The data we store is all short-lived. Documents should be deleted after being processed (which should be done within seconds after being stored), or after 12 hours.

If you’re interested, please let me know by replying here. This is a paid opportunity.

Can you let us know the CBL/SG versions and platforms used? What’s in the logs during the 6:05 - 6:45 window?

@daniel.petersen:
Couchbase Server: Version 4.5.1 Community Edition on Windows servers.
Sync Gateway: Version 1.5.1 Community Edition on Windows servers.
Couchbase Lite: Version 1.4.1 on iOS and Android.

From the Log tab, here are the logs for the Write Commit Failures:

Event | Module Code | Server Node | Time
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 5 times) | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:19:30 AM Tue Jan 29, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:18:16 AM Tue Jan 29, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:12:50 AM Tue Jan 29, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:08:25 AM Tue Jan 29, 2019

Here’s an example from yesterday:

Event | Module Code | Server Node | Time
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:38:47 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 18 times) | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:38:30 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:37:31 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:36:59 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:35:19 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.3 | 6:33:35 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 1 times) | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:32:30 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:31:25 AM Mon Jan 28, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. | menelaus_web_alerts_srv 000 | ns_1@xxx.xx.x.1 | 6:26:40 AM Mon Jan 28, 2019
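To see at a glance which nodes are affected and how tight the time window is, the pasted alert lines can be tallied with a small script. This is only a sketch: the regular expressions are fitted to the lines as pasted in this thread, the function names are made up, and counting a "(repeated N times)" entry as 1 + N alerts is an assumption about what that suffix means.

```python
import re
from collections import Counter
from datetime import datetime

NODE_RE = re.compile(r"on node (\S+?)\.\s")
REPEAT_RE = re.compile(r"\(repeated (\d+) times\)")
# Time, a weekday that is skipped, then the date, e.g. "6:19:30 AM Tue Jan 29, 2019".
STAMP_RE = re.compile(r"(\d{1,2}:\d{2}:\d{2} [AP]M) \w{3} (\w{3} \d{1,2}, \d{4})")


def parse_alert(line):
    """Return (node, count, timestamp) for one alert line, or None if it does not match."""
    node = NODE_RE.search(line)
    stamp = STAMP_RE.search(line)
    if not node or not stamp:
        return None
    repeat = REPEAT_RE.search(line)
    # Assumption: "(repeated N times)" means N extra occurrences of the same alert.
    count = 1 + (int(repeat.group(1)) if repeat else 0)
    when = datetime.strptime(f"{stamp.group(1)} {stamp.group(2)}", "%I:%M:%S %p %b %d, %Y")
    return node.group(1), count, when


def summarize(lines):
    """Tally alerts per node and report the first and last timestamps seen."""
    per_node = Counter()
    stamps = []
    for line in lines:
        parsed = parse_alert(line)
        if parsed is None:
            continue  # header rows and blank lines are skipped
        node, count, when = parsed
        per_node[node] += count
        stamps.append(when)
    return {
        "per_node": per_node,
        "first": min(stamps) if stamps else None,
        "last": max(stamps) if stamps else None,
    }


SAMPLE = [
    "Event Module Code Server Node Time",
    "Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. (repeated 5 times) menelaus_web_alerts_srv 000 ns_1@xxx.xx.x.1 6:19:30 AM Tue Jan 29, 2019",
    "Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. menelaus_web_alerts_srv 000 ns_1@xxx.xx.x.1 6:18:16 AM Tue Jan 29, 2019",
    "Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.3. menelaus_web_alerts_srv 000 ns_1@xxx.xx.x.3 6:12:50 AM Tue Jan 29, 2019",
    "Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node xxx.xx.x.1. menelaus_web_alerts_srv 000 ns_1@xxx.xx.x.1 6:08:25 AM Tue Jan 29, 2019",
]

if __name__ == "__main__":
    result = summarize(SAMPLE)
    for node, count in result["per_node"].most_common():
        print(node, count)
    print("window:", result["first"].time(), "to", result["last"].time())
```

Running the same tally over several days of alerts would show whether the failures always fall in the same window and whether the same nodes are always the ones reporting them.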

Is there some daily process that is happening during that morning time? If it is working fine at all other times I would check your environment to find out what other things are going on at that time.

Here’s more data in case it is helpful:


I’ll check with the server admin to see if there could be something happening at that time that’s causing a conflict. :+1:

I checked with the admin. It sounds like the time the issue comes up is our peak processing time. One thing I’m testing now is adding 3 GB of RAM to the Index RAM Quota to see if that helps. It was previously 1 GB.

After raising the Index RAM Quota from 1 GB to 4 GB, the issue still happens exactly as before. Here’s the latest log entry:

Event | Module Code | Server Node | Time
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node x.x.x.1. (repeated 4 times) | menelaus_web_alerts_srv 000 | ns_1@x.x.x.1 | 6:31:30 AM Wed Jan 30, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node x.x.x.1. | menelaus_web_alerts_srv 000 | ns_1@x.x.x.1 | 6:30:37 AM Wed Jan 30, 2019
Write Commit Failure. Disk write failed for item in Bucket “data_bucket” on node x.x.x.3. | menelaus_web_alerts_srv 000 | ns_1@x.x.x.3 | 6:17:08 AM Wed Jan 30, 2019