I’m experiencing a persistent issue where pull replication stops working after the device has been sitting idle for 5 minutes. Our app currently uses Couchbase Lite 2.8.1 for Android, but I’ve been able to reproduce the behavior in a test app that uses Couchbase Lite 2.7.0 for Android as well.
Here’s a detailed breakdown of the problem, and what I’ve discovered through testing over the past few days:
- I open the same workbook on two devices, and let them both sit idle for 5+ minutes.
- I then write on one device (A), but the other device (B) does not receive the change.
- If I then write on device (B), the one that did not receive the remote change, it pulls in the change from (A).
One conclusion I’ve reached is that writing to the device after it’s been sitting idle for 5+ minutes will “wake up” the replicator, causing it to pull in any remote changes that are pending. Furthermore, I ran a test where I wrote a document to the database every 3 minutes, and the problem went away: pulling in remote changes did not stop after 5 minutes. This suggests that the replicator goes into some kind of dormant state after about 5 minutes without a local write.
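For reference, the 3-minute write test can be sketched roughly as below. This is only an illustration of the idea, not the exact test code: `KeepAliveWriter` and the `writeHeartbeatDoc` callback are hypothetical names, and the actual database save is left to whatever the callback does.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Runs a small local write at a fixed interval so the database never sits idle for long. */
final class KeepAliveWriter implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final ScheduledFuture<?> task;

    /**
     * @param writeHeartbeatDoc hypothetical callback that saves a tiny document locally
     * @param interval          time between writes (3 minutes in the test described above)
     */
    KeepAliveWriter(Runnable writeHeartbeatDoc, long interval, TimeUnit unit) {
        this.task = scheduler.scheduleAtFixedRate(() -> {
            try {
                writeHeartbeatDoc.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("keep-alive write failed: " + e);
            }
        }, interval, interval, unit);
    }

    /** Stops the periodic writes. */
    @Override
    public void close() {
        task.cancel(false);
        scheduler.shutdownNow();
    }
}
```

It would be started with something like `new KeepAliveWriter(() -> saveHeartbeatDocument(), 3, TimeUnit.MINUTES)`, where `saveHeartbeatDocument()` stands in for the app's own save call, and closed when the database is closed. This only masks the symptom; it does not explain why the replicator stops pulling.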
The replicator itself is a continuous push-and-pull type. During testing, I also set logging to verbose, but I see no change in the replicator status after the 5+ minutes. That is, after opening the database and configuring the replicator, it correctly goes into the idle state. Then, after the 5 minutes, I write on the other device, but nothing changes on the device I’m expecting the remote change to arrive on: the replicator state does not change to busy, and nothing in the logs suggests it is pulling in data. That is, until I do a write to that database, which then touches the replicator and “wakes it up”.
Lastly, there is an iOS version of the app (though it’s currently still on Couchbase Lite 1.4), and pull replication works flawlessly on that platform. For legacy reasons, use of web sockets is explicitly turned off on iOS, but I turned them on so that it uses web sockets as well, because I began wondering whether the issue is somehow specific to web sockets and their connection to Sync Gateway. However, pull replication continued to work just fine on iOS.
Regarding web sockets and Sync Gateway, I did come across some information stating that a replicator sends a heartbeat every 5 minutes to keep the connection open, and that any load balancer in between needs to have its timeout configured to a higher value than that. I can confirm that our load balancer’s timeout is set to 7200 minutes, so that doesn’t seem to be the issue (though it is curious that the heartbeat interval coincides with the time after which I observe pull replication stop working).
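To spell out the comparison I’m making, here is a minimal sketch of the check, assuming the only rule is that the idle timeout must be strictly longer than the heartbeat interval. `HeartbeatCheck` and `heartbeatFitsTimeout` are just illustrative names, not part of any SDK.

```java
import java.time.Duration;

final class HeartbeatCheck {
    /**
     * Returns true when the idle timeout is longer than the heartbeat interval,
     * i.e. a heartbeat is sent before the connection would be considered idle.
     */
    static boolean heartbeatFitsTimeout(Duration heartbeat, Duration idleTimeout) {
        return heartbeat.compareTo(idleTimeout) < 0;
    }
}
```

With the values above, `heartbeatFitsTimeout(Duration.ofMinutes(5), Duration.ofMinutes(7200))` returns `true`, since 7200 minutes is 5 days, so the load balancer timeout should not be what closes the connection.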
I’m still not sure what the source of this problem is: whether it’s a Sync Gateway configuration issue or an issue specific to the Android SDK (2.x). I’m hoping someone has some insight or ideas about this.
Thanks!