Sync gateway not syncing documents with cb lite, replicator stuck on busy

Thank you for your reply. Is there anything we can help with here?

When you say “deleting and re-creating affected users in sync gateway”, are you talking about users of your application, or the sync gateway database user (like sync_g_user in the doc examples)?

I’m deleting the sync gateway user. We use OIDC for user management: when a user sends a request to the sync gateway, sgw creates a user in the sgw database (with the name <url_of_identity_provider>_<user_id>).
I use the sgw admin API to delete this user (HTTP DELETE request to /db/_user/<name>) and re-create it via the API (adding all admin channels and roles to the user again).
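
For reference, a minimal sketch of the delete and re-create calls described above. It only builds the requests and does not send them. The host `http://sgw_admin_url` and database name `db` are the placeholders used in this thread, the helper names are hypothetical, and percent-encoding the user name as a path segment is an assumption based on the name containing the identity provider URL.

```python
import json
from urllib.parse import quote
from urllib.request import Request

ADMIN_URL = "http://sgw_admin_url"  # placeholder admin endpoint from this thread
DB = "db"                           # database name from the posted db config


def user_url(name: str) -> str:
    # Assumption: the OIDC-derived name contains URL characters, so the
    # whole name is percent-encoded as a single path segment.
    return f"{ADMIN_URL}/{DB}/_user/{quote(name, safe='')}"


def delete_user_request(name: str) -> Request:
    # DELETE /db/_user/<name>
    return Request(user_url(name), method="DELETE")


def recreate_user_request(name: str, admin_channels: list, admin_roles: list) -> Request:
    # PUT /db/_user/<name> with the admin channels and roles added again.
    body = {"name": name, "admin_channels": admin_channels, "admin_roles": admin_roles}
    return Request(
        user_url(name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Each request could then be sent with `urllib.request.urlopen(...)`, delete first and re-create second. Depending on the database configuration, the PUT may need additional fields that are not shown here.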

@fifteen_renditions can you reproduce this issue with trace logging enabled and post the logs here?

You can do that by setting logging.trace.enabled to true in your sgw config (see Bootstrap Configuration | Couchbase Docs), or by sending a PUT /_config API request (see Admin REST API | Couchbase Docs) to change your log levels.
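
A small sketch of the second option, building (not sending) the PUT /_config request. The body shape mirrors the `logging.trace.enabled` setting named above and is assumed to be accepted at runtime. The function name and the admin URL argument are hypothetical.

```python
import json
from urllib.request import Request


def enable_trace_logging_request(admin_url: str) -> Request:
    # Assumption: PUT /_config accepts the same "logging" structure that
    # the bootstrap config uses, so only trace.enabled is set here.
    body = {"logging": {"trace": {"enabled": True}}}
    return Request(
        f"{admin_url}/_config",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending it would look like `urllib.request.urlopen(enable_trace_logging_request("http://sgw_admin_url"))` against the admin interface.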

I captured trace logs for the logs above, but unfortunately the capture failed before the sgw restart (the log files written by sgw don’t exist anymore sadly).

These are the logs. If important information is missing, let me know and I’ll try to recreate everything.

debug_hung_working_after_restart_trace.log.zip (1.1 MB)

@fifteen_renditions I have identified the code path causing the infinitely busy revocation feed, but I am struggling to force it with a unit test. Are you using scopes and collections on sync gateway? If you are, can you tell me a bit about how you were assigning and removing channels from the user that was running the replication around the time you saw this issue?

Hi @mohammedcb,

Nice to hear that!

We currently have no additional scopes and collections on sync gateway. Since we’re using v3.1, the requests are scoped by default, I think.

We set both admin roles and admin channels for the user. Each role consists of one channel.

We did change the role assignment of one user several times in very quick succession before we noticed the sync errors with that user.
One example looks like this.
Immediately after each log line was printed, we fired an async HTTP PUT request to the sgw admin URL to update the user (http://sgw_admin_url/db/_user/), so we fired 5 PUT requests here.

2023-08-07 06:40:29,622 - app.sgw - DEBUG - sgw.update_user_roles computed the following roles/channels for user xy: ['dn_cc_f0b7f7af-deaf-4e3f-836c-cfa902e26dc2', 'dn_pr_35498408-f4f5-41f8-9c28-cda0ee49229c', 'dn_pr_f79ea2e2-3738-498a-b123-f0f3b7a51568']
2023-08-07 06:40:29,629 - app.sgw - DEBUG - sgw.update_user_roles computed the following roles/channels for user xy: ['dn_cc_f0b7f7af-deaf-4e3f-836c-cfa902e26dc2', 'dn_pr_f79ea2e2-3738-498a-b123-f0f3b7a51568']
2023-08-07 06:40:29,726 - app.sgw - DEBUG - sgw.update_user_roles computed the following roles/channels for user xy: []
2023-08-07 06:40:29,730 - app.sgw - DEBUG - sgw.update_user_roles computed the following roles/channels for user xy: []
2023-08-07 06:43:56,329 - app.sgw - DEBUG - sgw.update_user_roles computed the following roles/channels for user xy: ['dn_pr_35498408-f4f5-41f8-9c28-cda0ee49229c']
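
To make the timing concrete, here is a small sketch that computes the gaps between the five updates from the timestamps in the log lines above. The function name is hypothetical and the timestamps are copied from the log.

```python
from datetime import datetime, timedelta

# Timestamps of the five update_user_roles log lines above.
LOG_TIMESTAMPS = [
    "2023-08-07 06:40:29,622",
    "2023-08-07 06:40:29,629",
    "2023-08-07 06:40:29,726",
    "2023-08-07 06:40:29,730",
    "2023-08-07 06:43:56,329",
]


def gaps_ms(timestamps: list) -> list:
    # Parse the "date time,millis" format and return the gap between
    # consecutive entries in whole milliseconds.
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f") for t in timestamps]
    return [(b - a) // timedelta(milliseconds=1) for a, b in zip(parsed, parsed[1:])]


print(gaps_ms(LOG_TIMESTAMPS))  # [7, 97, 4, 206599]
```

So the first four PUT requests were fired within 108 ms of each other, and the fifth followed about three and a half minutes later.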

Let me know if I can provide more info here!

Thanks.

@fifteen_renditions can you please provide me with your bootstrap config (Bootstrap Configuration | Couchbase Docs) and sgw database config (Database Configuration | Couchbase Docs)?
You can do a GET /db/_config REST API call to get the db config and a GET /_config REST API call to get the bootstrap config.
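
A minimal sketch of those two calls, plus a helper that masks secrets before a config is posted publicly. The requests are only built, not sent. The function names, the admin URL argument, and the `xxxxx` mask are illustrative assumptions.

```python
from urllib.request import Request


def config_requests(admin_url: str, db: str) -> list:
    # GET /<db>/_config for the db config, GET /_config for the bootstrap config.
    return [
        Request(f"{admin_url}/{db}/_config", method="GET"),
        Request(f"{admin_url}/_config", method="GET"),
    ]


def redact(value, secret_keys=("password",)):
    # Recursively replace the values of secret keys so the config can be shared.
    if isinstance(value, dict):
        return {
            k: ("xxxxx" if k in secret_keys else redact(v, secret_keys))
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v, secret_keys) for v in value]
    return value
```

For example, `redact(json.loads(response_body))` (with `json` imported) would mask the `bootstrap.password` field before the config is pasted into a post.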

db config:

{
    "bucket": "vgs-main",
    "name": "db",
    "sync": "function (doc, oldDoc) {\n     prefix_document_channel = \"C_DOCUMENT_\";\n    prefix_company_channel = \"C_COMPANY_\";\n\n    prefix_asset_create = \"P_ASSET_CREATE_\";\n\n    asset_verifier_channel = \"asset_verifier\";\n\n    // webSocket = new WebSocket(\"ws://echo.websocket.org\");\n    // console.log(webSocket)\n    // var xmlHttp = new XMLHttpRequest();\n    // TODO: check for doc.type (e.g. \"lieferschein\" => do this, \"rechnung\" => do this)\n    // TODO check if document was deleted and skip validation checks\n    // doc._deleted === true\n    // document_channel_id is always based on asset_id, i.e. the id of the main document\n    if (doc.document_type === \"asset_chain\") {\n        document_channel_id = prefix_document_channel + doc.asset_id\n    } else if (doc.document_type === \"asset_main\") {\n        document_channel_id = prefix_document_channel + doc._id\n    }\n    company_create_roles = []\n    company_channels = []\n    // TODO: prevent asset_type and document_type changes\n    if (doc.document_type === \"asset_sharing\" || doc.document_type === \"asset_sharing_push\") {\n        // legacy method using target_documents\n        target_docs = []\n        for (d in doc.target_documents) {\n            // TODO: check if user is allowed to share\n            target_docs.push(prefix_document_channel + doc.target_documents[d\n            ])\n        }\n        access(doc.target_users, target_docs)\n        // new method using target_doc\n        access(doc.target_users, prefix_document_channel + doc.target_doc)\n        channel(asset_verifier_channel)\n    }\n    else if (doc.asset_type === \"invoice\") {\n        try {\n            channel(\"invoice_out_\" + doc.body.header.issuer.id)\n            channel(\"invoice_in_\" + doc.body.invoice.buyer.id)\n        } catch (error) {\n            channel(\"invoice_out_\" + doc.issuer.id)\n            channel(\"invoice_in_\" + doc.buyer.id)\n        }\n    }\n    else if (doc.document_type === 
\"asset_main\" || doc.document_type === \"asset_chain\") {\n        // give channels based on sgwc field\n        handleSgwcField(doc)\n        for (i in doc.sgw) {\n            meta = doc.sgw[i]\n            if (meta.company_id) {\n                // add to company channel (of supplier through plant and buyer through cost_center)\n                channel(\"dn_\" + meta.company_id)\n\n                if (meta.type && meta.type_id) {\n                    // add to cost center or plant channel, dn_cc_ccID\n                    channel(\"dn_\" + meta.type + \"_\" + meta.type_id)\n                }\n            }\n        }\n        if (doc.document_type === \"asset_chain\") {\n            channel(asset_verifier_channel)\n            // add asset_chain doc to asset_main channel\n            channel(prefix_document_channel + doc.asset_id)\n        }\n        if (oldDoc === null) {\n            // companies = doc.companies\n            // for (c in companies){\n            //     company_create_roles.push(prefix_asset_create + companies[c\n            //     ])\n            //     // make sure user is allowed to create assets for this company\n            //     company_channels.push(prefix_company_channel + companies[c\n            //     ])\n            // }\n            // user has to have access to at least one CREATE role for a company\n            // disable company field and prob move permissions outside of couchbase?\n            // requireRole(company_create_roles)\n            // add to channel for only the document\n            channel(document_channel_id)\n            // add to companies channel\n            // channel(company_channels)\n            // workaround to get user\n            created_by = doc.created_by\n            requireUser(created_by)\n            // grant user access to document channel\n            // TODO: this might not be necesarry because the creation is normally done by the ERP connector, which is a\n            // service user\n        
    access(created_by, document_channel_id)\n            // new creation:\n            // new channel for doc\n            // add document to company channel\n            // add creator (and maybe others) to doc channel\n            // other (Polier) gets access if he uploads a document with valid signature of smbdy with access\n            // out-of-bound: role per company?\n            // one for all members who can create new assets, e.g. company-admin-create\n            // one for admins who can read all e.g. company-admin-read to read all documents\n            // one for admins who can edit all? e.g. company-admin-write\n            // TODO: sanity checks: id is same as _id\n            grantPermission(doc, document_channel_id, prefix_document_channel)\n        }\n        else {\n            // Handle company change. Right now, its forbidden\n            // if (!areArraysEqual(doc.companies, oldDoc.companies)){\n            //     throw({forbidden: \"Cannot change companies\"\n            //     })\n            // }\n            // companies = oldDoc.companies\n            // for (c in companies){\n            //     company_create_roles.push(prefix_asset_create + companies[c\n            //     ])\n            //     // make sure user is allowed to create assets for this company\n            //     company_channels.push(prefix_company_channel + companies[c\n            //     ])\n            // }\n            // maybe distinguish between add and delete company\n            channel(document_channel_id)\n            // channel(company_channels)\n            // (re)give user access to read document\n            // this also gives access do the document_channel_id even though the user might not have had it previously\n            // i.e. 
he could have had access via another channel\n            // this might not be necessary anymore since adding oldDoc.creator to document_channel_id\n            modified_by = doc.modified_by\n            requireUser(modified_by)\n            access(modified_by, document_channel_id)\n\n            // creator was added to channel when creating, readd him\n            access(oldDoc.created_by, document_channel_id)\n\n            // prevent read-only fields from changing\n            read_only_fields = [\n                \"created_by\"\n            ]\n            preventReadOnlyFieldsFirstLevel(doc, oldDoc, read_only_fields)\n            grantPermission(doc, document_channel_id, prefix_document_channel)\n        }\n    }\n    function grantPermission(doc, document_channel_id, prefix_document_channel) {\n        missing_sigs = doc.missing_signatures\n        involved_users = doc.involved_users\n        access(missing_sigs, document_channel_id)\n        access(involved_users, document_channel_id)\n    }\n    function areArraysEqual(arr1, arr2) {\n        if (arr1 === undefined || arr2 === undefined) {\n            return false\n        } else {\n            return arr1.length === arr2.length && arr1.every(function (value, index) {\n                return value === arr2[index\n                ]\n            })\n        }\n    }\n    function handleSgwcField(doc) {\n        // gives channel access based on sgwc field\n\n        // sgwc field has lists as value here\n        list_keys = [\n            \"ar_inc\",\n            \"c_ar_inc\",\n            \"ar_out\",\n            \"c_ar_out\",\n            \"si_inc\",\n            \"c_si_inc\",\n            \"si_out\",\n            \"c_si_out\",\n            \"si_out\",\n            \"c_si_out\",\n        ];\n        // here the value is only one string (uuid)\n        single_item_keys = [\n            \"c_iss\",\n            \"c_se\",\n            \"c_b\",\n            \"c_c\",\n            \"c_f\",\n            \"c_sf\",\n   
         \"c_st\"\n        ]\n        sgwc = get(doc, \"sgwc\", {})\n        for (i in list_keys) {\n            key = list_keys[i]\n            val = get(sgwc, key, [])\n            for (j in val) {\n                channel(key + \"_\" + val[j]);\n            }\n        }\n        // add single item keys\n        for (i in single_item_keys) {\n            key = single_item_keys[i]\n            val = get(sgwc, key, null)\n            if (val != null && val != \"\") {\n                channel(key + \"_\" + val)\n            }\n        }\n    }\n    function get(object, key, default_value) {\n        // helper function for safe dict access with default value\n        var result = object[key];\n        return (typeof result !== \"undefined\") ? result : default_value;\n    }\n    function preventReadOnlyFieldsFirstLevel(doc, oldDoc, read_only_fields) {\n        for (i in read_only_fields) {\n            field = read_only_fields[i\n            ]\n            if (doc[field\n            ] != oldDoc[field\n                ]) {\n                throw ({\n                    forbidden: \"Cannot change read only field: \" + field\n                });\n            }\n        }\n    }\n}\n",
    "import_docs": true,
    "oidc": {
        "providers": {
            "keycloakatu": {
                "issuer": "https://idp_url",
                "register": true,
                "client_id": "automated-testing",
                "username_claim": "",
                "roles_claim": "",
                "channels_claim": "",
                "allow_unsigned_provider_tokens": false,
                "IsDefault": false,
                "Name": "",
                "InsecureSkipVerify": false
            },
            "keycloakimplicit": {
                "issuer": "https://idp_url",
                "register": true,
                "client_id": "mobile_apps",
                "username_claim": "",
                "roles_claim": "",
                "channels_claim": "",
                "allow_unsigned_provider_tokens": false,
                "IsDefault": false,
                "Name": "",
                "InsecureSkipVerify": false
            }
        }
    },
    "enable_shared_bucket_access": true,
    "num_index_replicas": 0
}

bootstrap config:

{
    "bootstrap": {
        "server": "couchbases://localhost",
        "username": "sync-gateway-admin",
        "password": "xxxxx",
        "server_tls_skip_verify": true
    },
    "api": {
        "admin_interface": "10.37.4.3:4985",
        "admin_interface_authentication": false,
        "https": {},
        "cors": {
            "origin": [
                "http://localhost:8080",
                "http://localhost:4984"
            ],
            "login_origin": [
                "http://localhost:8080",
                "http://localhost:4984"
            ],
            "headers": [
                "Content-Type",
                "Authorization",
                "Set-Cookie",
                "sentry-trace"
            ]
        }
    },
    "logging": {
        "console": {
            "enabled": true,
            "rotation": {},
            "log_level": "debug",
            "log_keys": [
                "*"
            ]
        },
        "error": {
            "rotation": {}
        },
        "warn": {
            "rotation": {}
        },
        "info": {
            "rotation": {}
        },
        "debug": {
            "enabled": true,
            "rotation": {}
        },
        "trace": {
            "enabled": false,
            "rotation": {}
        },
        "stats": {
            "rotation": {}
        }
    },
    "auth": {},
    "replicator": {},
    "unsupported": {
        "serverless": {},
        "http2": {}
    }
}

Additional server info can also be found here: Http: panic serving <IP>: runtime error: invalid memory address or nil pointer dereference after sync gateway upgrade to 3.1 - #5 by fifteen_renditions

Is there any news regarding this issue?

@Nic the issue will be fixed in an upcoming release. The fix has been identified, but you will have to wait for a release that includes it. I’m afraid I can’t share any dates or version numbers, but feel free to contact your Couchbase account manager or sales contact.

I’m glad to hear that, and I fully understand the uncertainty regarding the release. Only one question remains: is there any workaround we can apply on our side until then? This issue has been occurring significantly more often in recent weeks.

I don’t think there is, unfortunately. Are you using this in a production environment?

Yes, unfortunately, the issue is occurring on Prod. We thought about automatically re-creating users overnight, but this seems like a very hacky temporary workaround.

Sync Gateway 3.1.2 has been released with a fix for this issue.
