We recently installed Grafana on our Qovery cluster so we can access our logs from Loki. However, when we query larger amounts of logs we occasionally get “too many outstanding requests” errors.
A quick Google search suggested this GitHub issue, which points to a solution involving custom configuration. The only configuration value we found in Qovery was “loki.log_retention_in_week”. Is it possible to customize other Loki settings?
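For context, the workaround discussed in that issue boils down to splitting large range queries and raising the per-tenant queue limits. A rough sketch of the kind of overrides involved (the exact key names depend on the Loki version, so this is an assumption on our side, not what Qovery actually ships):

```yaml
# Sketch of Loki overrides of the kind the linked issue suggests.
# Key names vary between Loki versions; verify against the deployed version.
limits_config:
  split_queries_by_interval: 24h   # break large range queries into smaller sub-queries
  max_query_parallelism: 32        # let more sub-queries run concurrently
query_scheduler:
  max_outstanding_requests_per_tenant: 4096  # raise the per-tenant queue that causes
                                             # "too many outstanding requests"
```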
Also, most of our applications log in JSON format, so it would be great to configure JSON parsing and label extraction for each application. It would also be nice to specify static labels per application to improve on the Qovery-generated labels, which contain unique IDs and therefore don’t let us aggregate logs or reuse dashboards for the same application across environments.
We would also like to customize the Loki pipeline to enable JSON parsing and extract more labels for faster queries. Ideally this could be configured per Qovery application, but more raw access to the Loki configuration would be enough for us to find less elegant solutions.
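To make it concrete, if the log agent is Promtail, what we would like to configure per application is roughly a pipeline like the following (the field names and label values are just examples from our side, not something Qovery exposes today):

```yaml
# Illustrative Promtail pipeline_stages for one application (example values only).
pipeline_stages:
  - json:
      expressions:
        level: level            # pull "level" out of the JSON log line
        request_id: request_id  # keep request_id available for later stages
  - labels:
      level:                    # promote "level" to an indexed label for faster queries
  - static_labels:
      app: backend              # stable label instead of a Qovery unique ID
      env: staging              # lets us reuse dashboards across environments
```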
Unfortunately, for the time being, this Loki instance is for our internal usage (application logs, etc.) and is not really meant to be used otherwise. We have to keep this service up and running.
Also, even if we add more and more parameters, it will eventually not be enough for everyone’s use case. So the best option for you would be to deploy your own Loki instance using a Helm deployment. This way you will be able to pimp your Loki instance (and a dedicated S3 bucket) as you please, and even if that instance eventually dies, it won’t impact your Qovery integration.
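To give an idea of what that looks like, here is a minimal values.yaml sketch for the grafana/loki Helm chart with a dedicated S3 bucket (the keys depend on the chart version and the bucket/region names are placeholders, so check the chart documentation for your setup):

```yaml
# Sketch of a values.yaml for the grafana/loki chart (chart-version dependent).
# helm repo add grafana https://grafana.github.io/helm-charts
# helm install loki grafana/loki -f values.yaml
loki:
  auth_enabled: false
  storage:
    type: s3
    bucketNames:
      chunks: my-company-loki-chunks   # dedicated S3 bucket for log chunks (placeholder)
    s3:
      region: eu-west-3                # placeholder region
```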
I’d rather avoid deploying another Loki instance so we don’t waste additional resources (we already have one running and are paying for it). I also don’t want us to start maintaining a bunch of DevOps tools; we chose Qovery for exactly this reason. Ideally, Qovery would provide a good out-of-the-box configuration, make it accessible to us, and still keep it running for us.
Loki is used for a single reason at Qovery: providing log history, since Kubernetes does not provide it.
By default we do not allow using Loki for search, because it can easily become unresponsive with the setup we’ve made. It has been tuned for this single purpose to limit resource consumption and reduce the cloud provider bill.
I get your point about wasting resources, but the current Loki setup is not sized to handle all of those workloads at the same time. And if Loki becomes unavailable, logs will go missing as well, which is not acceptable.
If you want to run log history searches, you can set up your own Loki, or use Elasticsearch, Datadog, or AWS CloudWatch.
Thanks for your response. I get your point, but I think you are missing out on providing a lot of value to your customers.
Logging and monitoring are crucial pieces that every Kubernetes cluster needs, just like the ingress and cert-manager utilities you already manage for us. Technically you already provide access to logs in Qovery because it’s such an important part of application deployment, but your logging implementation is very basic, and rather than building on top of it, I think you should strengthen support for people using the underlying Loki instance directly.
This topic has been discussed internally for a long time now. We totally agree with you regarding the value it would bring to our customers.
And this is why we do not restrict anything. We give our customers the option to install other solutions (like the ones described above) and encourage them to do so with documentation.
Qovery is not a monitoring platform and will never try to compete with the companies doing it. Instead, we build partnerships with them, like Datadog, because we know they will answer 100% of customers’ needs.
It looks like we will want to move toward the BYOK solution in the near future. However, at that point, the value Qovery brings goes down significantly.