Eventing timer function takes a long time to get triggered

Hi, I’m testing the createTimer function with code similar to this:

function OnUpdate(doc, meta) {
    log(doc, meta);

    const now = new Date();
    const executeAtDate = new Date(now.getTime() + 2000);

    log('Creating timer', executeAtDate);

    // Set up the timer
    createTimer(onTimer, executeAtDate, meta.id, {id: meta.id});
}

function onTimer(context) {
    log('Running timer', context.id);

    const doc = bucket_source[context.id];

    // Simple set
    doc.processedSimple = true;

    bucket_target[context.id] = doc;

    log('Set document flags', context.id);
}

By inspecting the function logs (in the @eventing dir), I can see that the first block gets invoked correctly on every document update, while the second block sometimes takes even more than 120 seconds to be invoked at least once.

I can see that the timer is scheduled:

2018-09-03T04:15:09.037+00:00 [INFO] {"docType":"eventing_integration_test_model","id":"DbModelEventingFunctionTestTimer_1535948053460","createdAt":1535948053460,"updatedAt":1535948053460,"counter":6,"processedSimple":false,"response":null} {"cas":1535948103505477600,"id":"wdm_eventing_integration_test_model_DbModelEventingFunctionTestTimer_1535948053460","expiration":0,"flags":33554432,"vb":622,"seq":7} 
2018-09-03T04:15:09.037+00:00 [INFO] "Creating timer" "2018-09-03T04:15:11.036Z" 

The timer then takes a long time to start:

2018-09-03T04:16:54.201+00:00 [INFO] "Running timer" "wdm_eventing_integration_test_model_DbModelEventingFunctionTestTimer_1535948053460" 
2018-09-03T04:16:54.205+00:00 [INFO] "Set document flags" "wdm_eventing_integration_test_model_DbModelEventingFunctionTestTimer_1535948053460"

I’m running Couchbase 6.0.0-beta in Docker (Windows host), with:

  • Processor: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz, 2904 MHz, 4 cores, 8 logical processors; 4 logical processors made available to Docker.
  • ~1GB of RAM assigned to the Eventing service.

I’m trying to figure out if my setup is too weak for the eventing service to operate, or if there’s any tuning I can perform on the function settings.

Do you have any clue what could be going on?

Thanks,
Alberto

Can you give us an idea of how many timers are getting created at the same timestamp? Also, how big are the individual nodes and the overall cluster? And what are the doc size (both key and value) and mutation rates?

I’m running the whole cluster in Docker, so it’s a single node, with the specs listed above. Also, ~256MB of index RAM for each bucket (no indexes created, in any case).

I’m running a simple test, though, which means:

  • 1 timer function
  • A mutation happens every 5 seconds, where a counter value gets incremented with sub-doc operations.
  • Key size ~85 chars
  • Value size ~200 bytes

It seems like quite a basic and simple scenario to be having problems with :thinking:

@cmaster11 could you please upload cbcollect from your setup? Details to collect it from UI available here - Couchbase Technical Support

Sure! This was captured when the timer didn’t even get triggered once in 120 seconds.

https://s3.amazonaws.com/cb-customers/Alberto+Marchetti+-+cmaster11/collectinfo-2018-09-04T101537-ns_1%40127.0.0.1.zip

Thanks for the cbcollect.

The function was undeployed when the cbcollect was taken, so stats from the REST endpoint were missing - they would have been useful. Observations based on the stats that get logged within eventing.log:

  • 6 DCP mutations (doc creates/updates) were seen by Eventing. Can’t confirm if they were from the same document or different documents.
  • 5 timers were sent from eventing-consumer to eventing-producer; not sure how many got created and stored in the metadata bucket.
  • None of the timers has fired yet.
{"app_worker_setting_events_lost":0,"bucket_op_exception_count":0,"checkpoint_failure_count":0,"dcp_events_lost":0,"debugger_events_lost":0,"delete_events_lost":0,"mutation_events_lost":0,"n1ql_op_exception_count":0,"timeout_count":0,"timer_events_lost":0,"timestamp":"2018-09-04T10:11:58Z","v8worker_events_lost":0}

execution stats:

{"agg_queue_memory":267,"agg_queue_size":0,"dcp_delete_msg_counter":0,"dcp_mutation_msg_counter":6,"enqueued_dcp_delete_msg_counter":0,"enqueued_dcp_mutation_msg_counter":6,"enqueued_timer_msg_counter":0,"feedback_queue_size":1,"lcb_retry_failure":0,"messages_parsed":46,"on_delete_failure":0,"on_delete_success":0,"on_update_failure":0,"on_update_success":6,"timer_create_failure":0,"timer_msg_counter":0,"timer_responses_sent":5,"timestamp":"2018-09-04T10:11:58Z","uv_try_write_failure_counter":0}
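The observations above can be read directly off the execution stats blob. A small sketch of the relevant counters, trimmed from the stats quoted above, showing the reading "mutations processed, timers requested, none fired yet":

```javascript
// The relevant counters from the execution stats quoted above.
const stats = {
    dcp_mutation_msg_counter: 6,  // mutations seen by Eventing
    on_update_success: 6,         // OnUpdate ran successfully for each one
    timer_responses_sent: 5,      // timer requests sent consumer -> producer
    timer_msg_counter: 0          // timer callbacks actually delivered
};

// Every mutation was handled, timers were requested, but none has fired.
const allMutationsHandled = stats.on_update_success === stats.dcp_mutation_msg_counter;
console.log(`${stats.timer_responses_sent} timers requested, ` +
            `${stats.timer_msg_counter} fired, ` +
            `all mutations handled: ${allMutationsHandled}`);
```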

Questions:

(a) When you say timers are taking more than 120s to fire - are you including the time it takes the function to bootstrap as well [1]? If yes, you could let the bootstrap finish and then trigger events within the source bucket to create timers. It would be useful to know how much delay you see in that case.
(b) If suggestion in (a) doesn’t solve your problem, cbcollect from running system with function(s) in deployed state would be useful for debugging.

QE folks here tried replicating the issue with the supplied function JS code, but they weren’t able to see any delay in timer execution beyond 15s.

[1] In the Beta build, we wait for bootstrap to finish before firing timers.
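To separate bootstrap time from true timer latency, the handler itself can carry the request timestamp in the timer context and compute the delay when the callback runs. A minimal sketch of that idea - note that `createTimer` and `log` are Eventing runtime builtins, so they are stubbed here with `setTimeout`/`console.log` (including Eventing’s documented same-reference overwrite behaviour) purely so the timing logic can be exercised outside the cluster:

```javascript
// Stub of the Eventing runtime, for local experimentation only.
const timers = new Map();

function createTimer(callback, fireAt, reference, context) {
    // Eventing replaces a pending timer that reuses the same reference;
    // keying the Map on the reference string models that behaviour.
    if (timers.has(reference)) clearTimeout(timers.get(reference));
    const delay = Math.max(0, fireAt.getTime() - Date.now());
    timers.set(reference, setTimeout(() => callback(context), delay));
}

function log(...args) { console.log(...args); }

// Variant of the original handler that records when the timer was requested.
function OnUpdate(doc, meta) {
    const requestedAt = Date.now();
    const executeAtDate = new Date(requestedAt + 2000);
    createTimer(onTimer, executeAtDate, meta.id, {id: meta.id, requestedAt: requestedAt});
}

function onTimer(context) {
    // How far past the requested fire time the callback actually ran (ms).
    const delayMs = Date.now() - (context.requestedAt + 2000);
    log('Timer delay (ms)', context.id, delayMs);
}

OnUpdate({}, {id: 'doc-1'});
```

Running the same pattern inside a deployed (post-bootstrap) function would log the scheduling delay directly, instead of inferring it from timestamps across log lines.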

It’s also possible Alberto is running into one of the bugs that were fixed since Beta, in particular the ones with the floor calculation. @venkat, is there any possibility of sharing a beta refresh build?

Thanks for the answer.

I’m including the time the function takes to bootstrap.

I ran the tests again and got these scenarios (and collected info with the functions deployed):

Thanks a lot :slight_smile:

Hi Alberto,

At 2018-09-04T12:50:37.938, db_eventing_integration_test_testFn_timer_timerFunction finished bootstrapping:

2018-09-04T12:50:37.938+00:00 [Info] SuperSupervisor::HandleSupCmdMsg [1] App: db_eventing_integration_test_testFn_timer_timerFunction_1536065408579 added to deployed apps map

From stats:

  • "DCP_MUTATION": 22 - the same doc got created/updated multiple times.
  • "TIMER_RESPONSES_RECEIVED": 22 - 22 times the JavaScript code within OnUpdate asked for a timer to be created. Given they used the same reference and timer callback function, each subsequent timer request overwrote the previous one.
  • "TIMER_EVENTS": 1 - one timer event fired off.
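As noted above, Eventing identifies a pending timer by its reference and callback, so reusing meta.id as the reference means each mutation replaces the previous pending timer. If the intent is one timer per mutation rather than one pending timer per document, each createTimer call needs a distinct reference. A sketch of a hypothetical helper (using meta.cas as the per-mutation component is an assumption, not something from this thread):

```javascript
// Hypothetical helper: build a per-mutation timer reference so successive
// mutations of the same document do not replace each other's pending timers.
// A unique reference per call yields one independent timer per mutation.
function timerReference(meta) {
    return meta.id + ':' + String(meta.cas);
}

// Inside OnUpdate this would then be used as:
//   createTimer(onTimer, executeAtDate, timerReference(meta), {id: meta.id});
```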

Between 2018-09-04T12:55:20.200 and 2018-09-04T12:56:20.200, the timer had fired:

2018-09-04T12:55:20.200+00:00 [Info] Consumer::processEvents [worker_db_eventing_integration_test_testFn_timer_timerFunction_1536065408579_1:40117:633] DCP events: DCP_STREAMREQ:341 DCP_SNAPSHOT:21 DCP_MUTATION:21 V8 events: THR_COUNT:1 SOURCE_MAP:1 EXECUTION_STATS:58 LATENCY_STATS:58 LCB_EXCEPTION_STATS:58 LOG_LEVEL:3 THR_MAP:1 V8_INIT:1 V8_LOAD:1 HANDLER_CODE:1 FAILURE_STATS:58 Timer events: Doc: 0, vbs owned len: 341 vbs owned: [342-682]


2018-09-04T12:56:20.200+00:00 [Info] Consumer::processEvents [worker_db_eventing_integration_test_testFn_timer_timerFunction_1536065408579_1:40117:633] DCP events: DCP_STREAMREQ:341 DCP_SNAPSHOT:22 DCP_MUTATION:22 V8 events: LATENCY_STATS:70 THR_COUNT:1 SOURCE_MAP:1 EXECUTION_STATS:70 V8_LOAD:1 HANDLER_CODE:1 FAILURE_STATS:70 LCB_EXCEPTION_STATS:70 LOG_LEVEL:3 THR_MAP:1 V8_INIT:1 Timer events: Doc: 1, vbs owned len: 341 vbs owned: [342-682]

Tue Sep 4 12:55:50 UTC 2018 is when cbcollect was started, so the timer probably fired in the middle of the cbcollect run. I do see delays in timer execution on your setup; not sure if it’s because of some bug related to the floor calculation logic which we may have recently fixed. Will check with Venkat whether we can share an updated stable build with you.

Thanks,
Abhishek

Between 2018-09-04T12:55:20.200 and 2018-09-04T12:56:20.200, the timer had fired

Oh! I see, my bad, it was definitely not captured by my test suite (I have sensible timeouts).

Will check with Venkat if we share an updated stable build with you.

Well, in that case, I’m really happy to be a tester!

Thank you for all this analysis :slight_smile:

Alberto