I have an HCL XPages application that is performing rather slowly. I am not sure it is related to Couchbase at all - but I get metrics like this written to the console:
[012918:004947-00007FCE84CB3700] 16-04-2024 12:29:04 HTTP JVM: [cb-events] INFO com.couchbase.metrics - [com.couchbase.metrics][LatencyMetricsAggregatedEvent][600s] Aggregated Latency Metrics: {"operations":{"query":{"query":{"total_count":6363,"percentiles_us":{"50.0":287309.823,"90.0":692060.159,"99.0":1224736.767,
[012918:004947-00007FCE84CB3700] 16-04-2024 12:29:04 HTTP JVM: 99.9":2105540.607,"100.0":3137339.391}}},"kv":{"get":{"total_count":3318514,"percentiles_us":{"50.0":210763.775,"90.0":532676.607,"99.0":721420.287,"99.9":801112.063,"100.0":2264924.159}},"upsert":{"total_count":4,"percentiles_us":{"50.0":528.383,"90.0":1
[012918:004947-00007FCE84CB3700] 16-04-2024 12:29:04 HTTP JVM: 22.303,"99.0":1122.303,"99.9":1122.303,"100.0":1122.303}}}},"meta":{"emit_interval_s":600}}
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: [cb-events] WARN com.couchbase.tracing - [com.couchbase.tracing][OverThresholdRequestsRecordedEvent][120s] Requests over Threshold found: {"query":{"top_requests":[{"operation_name":"query","last_dispatch_duration_us":903473,"last_remote_socket":"db1.xxxx
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: .dk:8093","last_local_socket":"10.42.208.10:21620","total_dispatch_duration_us":903473,"timeout_ms":100000,"total_duration_us":1847052},{"operation_name":"query","last_dispatch_duration_us":1078288,"last_remote_socket":"db2.xxxx.dk:8093","last_loca
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: _socket":"10.42.208.10:4464","total_dispatch_duration_us":1078288,"timeout_ms":100000,"total_duration_us":1767941},{"operation_name":"query","last_dispatch_duration_us":584634,"last_remote_socket":"db2.xxxx.dk:8093","last_local_socket":"10.42.208.10:4
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: 88","total_dispatch_duration_us":584634,"timeout_ms":100000,"total_duration_us":1523604},{"operation_name":"query","last_dispatch_duration_us":1494855,"last_remote_socket":"db1.xxxx.dk:8093","last_local_socket":"10.42.208.10:22270","total_dispatch_dur
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: tion_us":1494855,"timeout_ms":100000,"total_duration_us":1494962},{"operation_name":"query","last_dispatch_duration_us":625511,"last_remote_socket":"db2.xxxx.dk:8093","last_local_socket":"10.42.208.10:4892","total_dispatch_duration_us":625511,"timeout
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: ms":100000,"total_duration_us":1429927},{"operation_name":"query","last_dispatch_duration_us":1408742,"last_remote_socket":"db1.xxxx.dk:8093","last_local_socket":"10.42.208.10:21430","total_dispatch_duration_us":1408742,"timeout_ms":100000,"total_dura
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: ion_us":1408841},{"operation_name":"query","last_dispatch_duration_us":1362739,"last_remote_socket":"db2.xxxx.dk:8093","last_local_socket":"10.42.208.10:4824","total_dispatch_duration_us":1362739,"timeout_ms":100000,"total_duration_us":1362839},{"oper
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: tion_name":"query","last_dispatch_duration_us":1060293,"last_remote_socket":"db2.xxxx.dk:8093","last_local_socket":"10.42.208.10:4554","total_dispatch_duration_us":1060293,"timeout_ms":100000,"total_duration_us":1060430},{"operation_name":"query","las
[012918:004955-00007FCE84CB3700] 16-04-2024 12:29:09 HTTP JVM: _dispatch_duration_us":145915,"last_remote_socket":"db2.xxxx.dk:8093","last_local_socket":"10.42.208.10:4230","total_dispatch_duration_us":145915,"timeout_ms":100000,"total_duration_us":1055297}],"total_count":9}}
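To make the aggregated metrics above a bit easier to read, here is a minimal sketch that converts the reported `percentiles_us` values to milliseconds and turns the `total_count` values into per-second rates over the 600 s emit interval. The numbers are copied from the log above; the class and method names (`LatencySummary`, `usToMs`, `perSecond`) are just illustrative and not part of the SDK. The gets-per-query figure is only an average over the window, not a claim about what any single query does.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LatencySummary {

    // Convert a latency in microseconds to milliseconds.
    static double usToMs(double micros) {
        return micros / 1000.0;
    }

    // Average number of operations per second over the reporting interval.
    static double perSecond(long totalCount, long intervalSeconds) {
        return (double) totalCount / intervalSeconds;
    }

    public static void main(String[] args) {
        long intervalSeconds = 600;   // "emit_interval_s" from the log
        long queryCount = 6363;       // "query" total_count from the log
        long kvGetCount = 3318514;    // "kv" / "get" total_count from the log

        // kv get percentiles in microseconds, copied from the log
        Map<String, Double> kvGetPercentilesUs = new LinkedHashMap<>();
        kvGetPercentilesUs.put("50.0", 210763.775);
        kvGetPercentilesUs.put("90.0", 532676.607);
        kvGetPercentilesUs.put("99.0", 721420.287);
        kvGetPercentilesUs.put("99.9", 801112.063);
        kvGetPercentilesUs.put("100.0", 2264924.159);

        for (Map.Entry<String, Double> e : kvGetPercentilesUs.entrySet()) {
            System.out.printf(Locale.ROOT, "kv get p%s: %.1f ms%n",
                    e.getKey(), usToMs(e.getValue()));
        }

        System.out.printf(Locale.ROOT, "kv gets per second: %.1f%n",
                perSecond(kvGetCount, intervalSeconds));
        System.out.printf(Locale.ROOT, "queries per second: %.1f%n",
                perSecond(queryCount, intervalSeconds));
        // Average only: total kv gets divided by total queries in the window.
        System.out.printf(Locale.ROOT, "kv gets per query (window average): %.1f%n",
                (double) kvGetCount / queryCount);
    }
}
```

Read this way, the median kv get is about 211 ms and the application is issuing roughly 5,500 kv gets per second during this window.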
If I connect to the two database servers directly they seem “happy” with around 20% CPU usage and 80% memory usage. Queries are quick.
The application itself is slow - and I am just wondering if there is a bottleneck in the connection from the application server to the database servers… Normally, everything runs fast - however, this morning we have more users than normal - and then everything seems to hang.
So far we have looked into the number of threads on the web server where the application runs - could there be similar concerns related to the SDK?
I am on 7.2.4 CE (on CentOS 7) and using Java SDK 3.5.3. The server is running Java 1.8.
Thanks in advance for any insights!
/John