Yesterday we noticed app performance was pretty degraded for no clear reason. Requests were timing out, taking a long time, etc. even though health checks on our EC2 instances and databases weren’t showing any clear issues and everything was running as it had been.
Out of curiousity, as a test, I deleted one of our two Qovery EKS clusters and recreated the deleted cluster by adding a new cluster to test rebooting our EC2 instances. As soon as I did this, performance improved and the issue went away.
We’re running clusters:
A (did not delete or reboot): Qovery
B (new cluster I created after delete as a test “reboot”): Qovery
App: Qovery
Based on AWS’s docs it seems like they may conduct period maintenance which might impact instance performance: Reboot your instance - Amazon Elastic Compute Cloud >> but no clear indication what the cause was here
Has this been observed before? Are there ways we can monitor for this type of issue? The “fix” was straightforward but would love to understand more about the underlying cause(s) and prevention. thanks