Job execution history is incorrect and history is not cleaned after configured TTL

ISSUE

I have some cron jobs configured on my cluster. In the advanced settings, I configured them as follows:

  • cronjob.failed_jobs_history_limit = 3
  • cronjob.success_jobs_history_limit = 3
  • job.delete_ttl_seconds_after_finished = 259200 (3 days)

However, on my cron job page on Qovery, I only see some of the kept executions, not all of them: usually only the last 3 that failed, and almost never the successful ones. When I look in my cluster directly, those executions are correctly present.
Screenshot on Qovery:


Screenshot of the cluster history for that same cron:

I would expect to see the 6 executions on the Qovery UI.
As it stands, I always have the impression that my cron jobs are failing, even though the last executions succeeded.
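
To make the expectation concrete, here is a minimal Python sketch of the history I would expect to see, assuming the two limits are applied independently (the newest 3 successes plus the newest 3 failures). The `Execution` type and the `expected_history` function are hypothetical names for illustration, not anything from Qovery:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Execution:
    # Hypothetical record for one finished cron job run.
    name: str
    succeeded: bool
    finished_at: datetime


def expected_history(executions, success_limit=3, failed_limit=3):
    """Return the runs expected to stay listed, newest first.

    Assumption: each limit is applied on its own, so up to
    success_limit successes and failed_limit failures are kept.
    """
    newest_first = sorted(executions, key=lambda e: e.finished_at, reverse=True)
    successes = [e for e in newest_first if e.succeeded][:success_limit]
    failures = [e for e in newest_first if not e.succeeded][:failed_limit]
    return sorted(successes + failures, key=lambda e: e.finished_at, reverse=True)
```

With both limits set to 3, this returns at most 6 executions, which is what I see on the cluster side.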

Moreover, my setting to remove an execution from history after 3 days doesn’t seem to be enforced. As you can see, there is a job that was executed 15 days ago and is still there.
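
For anyone who wants to check this on their own cluster, below is a rough Python sketch that lists the jobs that finished longer ago than the TTL. It assumes the input is the parsed output of `kubectl get jobs -o json`, and that a job's finish time is the `lastTransitionTime` of its `Complete` or `Failed` condition. The function names are hypothetical, and I am only assuming that the Qovery setting maps to a TTL counted from that finish time:

```python
from datetime import datetime, timedelta, timezone

# Value of job.delete_ttl_seconds_after_finished (3 days).
TTL_SECONDS = 259200


def finish_time(job):
    """Return the UTC finish time of a Job dict, or None if it has not finished."""
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("type") in ("Complete", "Failed") and cond.get("status") == "True":
            # Kubernetes timestamps look like 2024-01-01T00:00:00Z.
            parsed = datetime.strptime(cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ")
            return parsed.replace(tzinfo=timezone.utc)
    return None


def jobs_past_ttl(job_list, now, ttl_seconds=TTL_SECONDS):
    """Return the names of finished jobs that are older than the TTL."""
    expired = []
    for job in job_list.get("items", []):
        finished = finish_time(job)
        if finished is not None and now - finished > timedelta(seconds=ttl_seconds):
            expired.append(job["metadata"]["name"])
    return expired
```

To use it, save the `kubectl` output to a file, load it with `json.load`, and call `jobs_past_ttl(data, datetime.now(timezone.utc))`. Any name it returns is a job I would have expected to be gone already.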

Thanks!

Hello @Matthieu_Delanoe !

This bug was reported to us a couple of weeks back, and we fixed it last week.
The fix requires a cluster redeployment, and your cluster didn’t have it yet. I just triggered an update on your cluster, so it should be better now.

Regarding the TTL on jobs, it might clash with the "keep the last N executions" behavior. I haven’t found any documentation on this front so far; I will update this post if I find anything.

Cheers,

Alright thanks for the cluster update, it works well now for the listing of all kept executions.

Before posting the issue here, I first looked at the cluster page and settings, but I don’t remember seeing a message or notice saying that an update was available. Was there any way for me to know that I should have updated my cluster?
Some follow-up questions, out of curiosity:

  • When updating a cluster, is there a way to know in advance what new settings/fixes will be applied?
  • Is it safe to update a cluster in terms of interruption of hosted services?

I’ll wait for your findings on the TTL issue.
Thx

Letting you know when a cluster update is available is something we should work on (cc @Alessandro_carrano). For the time being, there is no way for you to know whether a new version is available.
There is also no diff or changelog of what will be applied to your cluster for the time being; cluster updates should be considered routine.
But out of curiosity, what would you like to know on that front? Given that your cluster is fully managed, which changes would you want to be informed about?

You will see a badge saying you should update your cluster if you change any of its settings (advanced settings, node type, etc.), but that’s it for now.

Our cluster updates are meant to be non-breaking and to cause no downtime, so they can be run at any time without interrupting your services.

If we ever need to roll out an update that leads to downtime, we are extra cautious and communicate ahead of time (for example, for Kubernetes updates).

Cheers

Hi @bchastanier ,

Have you been able to figure out why the TTL for older jobs is not enforced? We still have the issue where old failed jobs pollute our view, making us think something is wrong when everything is actually fine.

Hey @Matthieu_Delanoe,

Not yet indeed, sorry about that :frowning:
I will have a look as soon as I have a bit of time. In the meantime, I know the team is working on improving the UI view around jobs, which should give you a much better sense of what’s going on and whether there were recent failures.

I’ll keep you posted.