Job execution history is incorrect and history is not cleaned after configured TTL

Matthieu_Delanoe · April 25, 2024, 9:42am

ISSUE

I have some cron jobs configured on my cluster. I have configured them in the advanced settings so that:

cronjob.failed_jobs_history_limit = 3
cronjob.success_jobs_history_limit = 3
job.delete_ttl_seconds_after_finished = 259200 (3 days)
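
For readers following along, here is a minimal sketch of how these three settings might translate into a Kubernetes CronJob spec. The mapping to `failedJobsHistoryLimit` and `successfulJobsHistoryLimit` is an assumption based on name similarity, and the helper `to_cronjob_spec` is hypothetical; it is not Qovery's actual implementation.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Values from the advanced settings listed above.
qovery_settings = {
    "cronjob.failed_jobs_history_limit": 3,
    "cronjob.success_jobs_history_limit": 3,
    "job.delete_ttl_seconds_after_finished": 3 * SECONDS_PER_DAY,  # 259200
}


def to_cronjob_spec(settings):
    """Hypothetical helper: build the relevant part of a CronJob spec.

    Assumption: each Qovery setting maps to the similarly named
    Kubernetes field. In a CronJob, the TTL belongs to the Job template.
    """
    return {
        "failedJobsHistoryLimit": settings["cronjob.failed_jobs_history_limit"],
        "successfulJobsHistoryLimit": settings["cronjob.success_jobs_history_limit"],
        "jobTemplate": {
            "spec": {
                "ttlSecondsAfterFinished": settings[
                    "job.delete_ttl_seconds_after_finished"
                ],
            }
        },
    }


spec = to_cronjob_spec(qovery_settings)

# Upper bound on finished executions kept by the two history limits.
max_kept = spec["failedJobsHistoryLimit"] + spec["successfulJobsHistoryLimit"]
```

With these values, `max_kept` is 6 and the TTL is 259200 seconds, i.e. 3 days.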

However, on my cron job page in Qovery, I only see some of the kept executions, not all of them: usually only the last 3 that failed, and almost never the successful ones. When I look at my cluster directly, those executions are all present.
Screenshot on Qovery

Screenshot of the cluster history for that same cron:

I would expect to see the 6 executions on the Qovery UI.
Here I always have the impression that my crons are in error, even though the last executions were a success.

Moreover, my setting to remove an execution from history after 3 days doesn’t seem to be enforced. As you can see, there is a job that was executed 15 days ago that is still there.
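
The expectation here can be sketched as a simple age check: a finished job becomes eligible for cleanup once its age reaches the TTL. The function name `is_past_ttl` and the timestamps are illustrative assumptions, not values taken from the cluster.

```python
from datetime import datetime, timedelta, timezone


def is_past_ttl(finished_at, ttl_seconds, now):
    """Return True if a job finished at least `ttl_seconds` before `now`."""
    return now - finished_at >= timedelta(seconds=ttl_seconds)


TTL_SECONDS = 259200  # 3 days, as configured

# Illustrative timestamps (assumed): "now" is the time of this post.
now = datetime(2024, 4, 25, 9, 42, tzinfo=timezone.utc)
old_job_finished = now - timedelta(days=15)
recent_job_finished = now - timedelta(days=1)

old_job_expired = is_past_ttl(old_job_finished, TTL_SECONDS, now)
recent_job_expired = is_past_ttl(recent_job_finished, TTL_SECONDS, now)
```

Under a 3-day TTL, the job from 15 days ago is well past its expiry, while one from yesterday is not.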

Thanks!

bchastanier · April 25, 2024, 10:11am

Hello @Matthieu_Delanoe !

We had this bug reported a couple of weeks back, and we fixed it last week.
The fix requires a cluster redeployment (your cluster didn’t have it yet). I just triggered an update on your cluster, so it should be better now.

Regarding the TTL on jobs, there might be a conflict with the "last N executions" history limits. I haven’t found any documentation on this so far; I will update this post if I find anything.

Cheers,

Matthieu_Delanoe · April 25, 2024, 1:26pm

Alright thanks for the cluster update, it works well now for the listing of all kept executions.

Before posting the issue here, I first looked at the cluster page and settings, but I don’t remember seeing a message or notice saying that an update was available. Was there any way for me to know that I should have updated my cluster?
Follow-up questions, out of curiosity:

When updating a cluster, is there a way to know in advance what new settings/fixes will be applied?
Is it safe to update a cluster in terms of interruption of hosted services?

I’ll wait for your research about the TTL issue.
Thx

bchastanier · April 25, 2024, 2:19pm

Having a way for you to know that a cluster update is available is something we should work on (cc @Alessandro_carrano). For the time being, there is no way for you to know whether a new version is available.
There is also no diff or changelog of what will be applied to your cluster for the time being; updates should be considered daily routine.
But out of curiosity, what would you like to know on that front? Given that your cluster is fully managed, do you want to know what changes?

You will have a badge saying you should update your cluster if you change any settings on it (advanced settings, node type, etc), but that’s it for now.

Our cluster updates are meant to be non-breaking and to cause no downtime; they can be run at any time without interrupting services.

If we ever need to perform an update that leads to downtime, we are extra cautious and communicate ahead of time (for example, for Kubernetes updates).

Cheers

Matthieu_Delanoe · May 14, 2024, 1:26pm

Hi @bchastanier ,

Have you been able to figure out why the TTL for older jobs is not enforced? We still have the issue where old failed jobs are polluting our view, making us think something is wrong when everything is OK.

bchastanier · May 14, 2024, 2:02pm

Hey @Matthieu_Delanoe,

Not yet, sorry about that.
I will have a look as soon as I have a bit of time. In the meantime, I know the team is working on improving the UI view around jobs, which should give you a much better sense of what’s going on and whether there were recent failures.

I’ll keep you posted.

ce_gagnaire · July 23, 2024, 2:36pm

Hello @Matthieu_Delanoe,

Thank you for your feedback. To keep you updated: we created a new ticket in our tracker.

I am closing this topic, but we will get back to you when it’s done.

Regards,
Charles-Edouard

system · July 30, 2024, 2:36pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

ce_gagnaire · August 26, 2024, 11:31am

Hello @Matthieu_Delanoe,

We delivered a fix for a cronjob bug where job.delete_ttl_seconds_after_finished was not applied to spec.ttlSecondsAfterFinished. This applies to your problem.

You have to redeploy your cronjob, and the patch will be applied automatically.
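
After redeploying, one way to confirm the TTL is present is to inspect the CronJob manifest, for example the JSON printed by `kubectl get cronjob <name> -n <namespace> -o json`. Below is a minimal sketch: the helper `job_template_ttl` and the sample manifest are hypothetical. It assumes the standard Kubernetes layout, where a CronJob carries the Job's `ttlSecondsAfterFinished` under `spec.jobTemplate.spec`.

```python
import json


def job_template_ttl(cronjob):
    """Return ttlSecondsAfterFinished from a CronJob manifest, or None if unset."""
    job_spec = cronjob.get("spec", {}).get("jobTemplate", {}).get("spec", {})
    return job_spec.get("ttlSecondsAfterFinished")


# Hypothetical, trimmed-down manifest standing in for real kubectl output.
sample = json.loads("""
{
  "kind": "CronJob",
  "spec": {
    "failedJobsHistoryLimit": 3,
    "successfulJobsHistoryLimit": 3,
    "jobTemplate": {
      "spec": {"ttlSecondsAfterFinished": 259200}
    }
  }
}
""")

ttl = job_template_ttl(sample)

# A manifest without the field, as one would expect before the fix.
missing = job_template_ttl({"kind": "CronJob", "spec": {"jobTemplate": {"spec": {}}}})
```

Here `ttl` is 259200 when the field is set and `missing` is `None` when it is absent.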

I hope this is helpful.
Charles-Edouard
