Concurrent database migration error

Hello,

We are using an entryfile.sh that runs our migrations before the application starts, as described in "How to run commands before the application starts | Qovery". However, we are now running into ActiveRecord::ConcurrentMigrationError, because the script in the entrypoint is called by every running (or starting) web process.

What is a good practice to overcome this?

Best, Florian

Hi @FlorianSuchan, it’s a good question and a common issue that we see users facing when using Qovery and Kubernetes in general.

A good practice is to use a migration lock (probably provided by Rails or via an additional lib). Then only one instance runs the migrations while the others wait until it’s done.
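For Rails specifically: as far as I know, Rails (5.0 and later) already wraps db:migrate in a database advisory lock on PostgreSQL and MySQL, and ActiveRecord::ConcurrentMigrationError is what the other instances raise when they cannot acquire it. So the missing part is making those instances wait instead of crash. Here is a minimal sketch of an entrypoint helper that does this. The function name retry_on_migration_lock is made up for illustration, and the sketch assumes the error class name appears in the command's output. Any other failure is returned immediately, so a genuinely broken migration is not retried.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a migration command and retry it only while another
# instance holds the migration lock (detected via "ConcurrentMigrationError"
# in the output, which is an assumption about how the error is reported).
# Usage: retry_on_migration_lock MAX_ATTEMPTS SLEEP_SECONDS command [args...]
retry_on_migration_lock() {
  local max_attempts=$1 sleep_seconds=$2
  shift 2
  local attempt output status=1
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Capture stdout+stderr; running it as an `if` condition keeps `set -e` quiet.
    if output=$("$@" 2>&1); then
      printf '%s\n' "$output"
      return 0
    else
      status=$?
    fi
    printf '%s\n' "$output" >&2
    # Not a lock conflict: surface the real error right away.
    if ! grep -q 'ConcurrentMigrationError' <<<"$output"; then
      return "$status"
    fi
    if ((attempt < max_attempts)); then
      echo "Migration lock held by another instance, retrying in ${sleep_seconds}s ($attempt/$max_attempts)" >&2
      sleep "$sleep_seconds"
    fi
  done
  # Still locked after the last attempt.
  return "$status"
}

# Hypothetical entrypoint usage:
# retry_on_migration_lock 30 2 bundle exec rails db:migrate
# exec bundle exec rails server -b 0.0.0.0
```

The retry budget (30 attempts, 2 seconds apart in the commented example) is arbitrary; it should comfortably exceed how long your slowest migration takes.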

While looking around, I found this article explaining how to implement this kind of lock in a Rails app.

Let me know if it works for you.

Hey @FlorianSuchan

When setting up Qovery at Tint, we faced the exact same problem.
We are using knex to run database migrations when starting the instance.

Our problem was that the first instance to start locked the migration table, and the other instances tried to do the same but simply failed because of the lock.

Usually, a well-suited ORM or a web framework would handle that for us, but we have a pretty manual setup.

So we chose to wrap the migration script in a loop that retries until the lock is released, before starting the app.

```dockerfile
# Dockerfile
# 1. Run the migration script, which either runs the migrations or retries while the lock is held
# 2. Start the app (exec replaces the shell, so node receives SIGTERM directly)
CMD ./scripts/migrate.sh && exec node index.js
```
```bash
#!/usr/bin/env bash
# scripts/migrate.sh
# Needs bash (not plain sh) because of [[ ]]; make it executable with chmod +x.
set -e

MAX_RETRY_ATTEMPTS=5
SLEEP_INTERVAL_IN_SECONDS=2

attempts=0
while [[ $attempts -le $MAX_RETRY_ATTEMPTS ]]; do
    if [[ $attempts -gt 0 ]]; then
        echo "Failed to run DB migrations. Retrying ... ($attempts/$MAX_RETRY_ATTEMPTS)"
        sleep "$SLEEP_INTERVAL_IN_SECONDS"
    fi

    # Runs the migrations; fails if another instance has locked the migration table.
    # Using it as an `if` condition keeps `set -e` from aborting the script on failure.
    if yarn migrate:latest; then
        exit 0
    fi

    attempts=$((attempts + 1))
done

# Notify engineers that something went wrong (we're using Slack webhooks)
echo "DB migrations still failing after $MAX_RETRY_ATTEMPTS retries" >&2
exit 1
```

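If you want to convince yourself that a simple retry loop is enough, here is a small local simulation (a sketch, not what we run in production). mkdir is atomic, so a directory plays the role of the migration table lock, a temp directory plays the role of the database, and fake_migrate is a hypothetical stand-in for yarn migrate:latest. Three "instances" start at the same time; only one applies the migrations, and the others succeed as no-ops once the lock is released.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Temp directory standing in for the database.
STATE_DIR=$(mktemp -d)
echo 0 > "$STATE_DIR/applied_count"

# Hypothetical stand-in for `yarn migrate:latest`: fails fast if the lock is
# taken, otherwise applies pending migrations (once) and releases the lock.
fake_migrate() {
  if ! mkdir "$STATE_DIR/lock" 2>/dev/null; then
    echo "migration table is locked" >&2
    return 1
  fi
  if [[ ! -e "$STATE_DIR/applied" ]]; then
    sleep 1  # pretend the migrations take a while
    touch "$STATE_DIR/applied"
    echo $(( $(<"$STATE_DIR/applied_count") + 1 )) > "$STATE_DIR/applied_count"
  fi
  rmdir "$STATE_DIR/lock"
}

# Same idea as scripts/migrate.sh: retry, pausing between attempts.
instance() {
  local attempt
  for ((attempt = 1; attempt <= 10; attempt++)); do
    if fake_migrate 2>/dev/null; then
      echo "instance $1: migrations done on attempt $attempt"
      return 0
    fi
    sleep 0.5
  done
  echo "instance $1: gave up" >&2
  return 1
}

# Start three instances concurrently, then wait for each of them.
pids=()
for i in 1 2 3; do
  instance "$i" &
  pids+=("$!")
done
failures=0
for pid in "${pids[@]}"; do
  wait "$pid" || failures=$((failures + 1))
done

applied=$(<"$STATE_DIR/applied_count")
rm -rf "$STATE_DIR"
echo "failed instances: $failures, migrations applied: $applied"
```

Which instance wins the race changes from run to run, but the migrations should be applied exactly once and no instance should give up.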
We have had no issues with that script for a few months now.
Let me know if I can help you further!


Hi @Kmaschta, thanks for reaching out! I will have a look at your solution and see if it solves the problem. From a conceptual point of view, it totally makes sense!

Cheers, Florian