Relevant information to this issue:
- Databases: PostgreSQL and Redis
- Programming language and version: Node v18
- Link to your project on Github/Gitlab: not public
- Link to your application - Qovery
I’ve migrated our staging environment from another PaaS to Qovery: I provisioned a Redis instance, a database instance, and 5 app services.
Every service uses the same image; they differ only in configuration and resources.
I assigned each service the same resources it had with our previous provider. However, the response times of my web service are unbearably slow: depending on the complexity of the request/response, they range between 15 and 35 seconds.
I certainly hope this is just a misconfiguration on my part, but I’ve looked everywhere and can’t seem to find the issue.
Do you have any recommendations for a NestJS GraphQL app (or any Node backend application) regarding the cluster, database, and service resources?
Hi @pantajoe, thanks for your question. Happy to help you improve the performance of your app.
Out of curiosity - are you comparing your app’s performance with the same CPU and RAM allocated? Can you share some details here?
Yes, I’m using the same allocated resources per task/pod.
I assigned 0.128 vCPU and 512 MB in the beginning (the resources assigned with our old PaaS), and when I noticed that the app was very slow, I increased them to 0.256 vCPU and 1024 MB, but still no change.
Do you use a monitoring system? How do you measure performance? Can you see what happens in your application (via an APM?) to pinpoint where the degraded performance is coming from?
Yes, we use Appsignal as an APM. Here’s an example:
Could the database be the bottleneck here? The APM indicates that resolving one particular field alone takes 2 seconds (which should not be the case).
I’m currently using a Postgres DB (instance type db.t3.micro) with 20 GiB of storage.
I’ll upgrade it to db.t3.medium just to see if there’s a difference.
Okay, the upgrade to db.t3.medium did not do anything, and it does seem to be the database: it’s a very simple `UPDATE` query that takes 10 seconds.
Have you ever encountered anything like this?
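To double-check the APM’s numbers, a tiny timing wrapper around the ORM call can confirm the 10-second figure independently. This is only a sketch; `runUpdate` in the usage comment is a hypothetical placeholder for whatever executes the `UPDATE`:

```typescript
// Measure the wall-clock time of any async operation, e.g. a single UPDATE.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = process.hrtime.bigint();
  try {
    return await fn();
  } finally {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${label} took ${ms.toFixed(1)} ms`);
  }
}

// Usage (hypothetical placeholder for the real ORM call):
// await timed('user update', () => runUpdate());
```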
Hi @pantajoe, I have a couple of questions to understand better where your performance issue could come from:
- Does your database and app run on the same AWS region? (It’s yes if you used the Qovery interface to deploy your RDS database)
- Can you confirm that your app connects to the database using the `*_INTERNAL` environment variable? (It uses the private network instead of the public network.)
- Can you show your complete `knex` configuration and verify that nothing in it could lead to those performance penalties?
- Can you confirm that the `id` field is indexed? What’s the size of your table?
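For reference, a typical `knex` setup for Postgres looks like the sketch below (the values are illustrative, not the poster’s actual configuration). The `pool` settings in particular are worth checking, since poor connection reuse often shows up as queries that are slow from the app but fast when run directly against the database:

```typescript
// Illustrative knex configuration for Postgres (not the actual config).
import knex from 'knex';

const db = knex({
  client: 'pg',
  connection: process.env.DATABASE_URL, // should point at the *_INTERNAL host
  pool: {
    min: 2,  // keep a few warm connections open
    max: 10, // cap concurrent connections per pod
  },
});
```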
- Yes, it does. We used the Qovery interface to deploy the db.
- Yes, I aliased the `*_INTERNAL` env var as the `DATABASE_URL` that we use.
- We use mikro-orm as our ORM, which uses `knex` under the hood. But as you can see from our APM, the delay is not caused by `knex` itself, but by the query execution on the database side.
- We only have 68 users on our staging app, and the `id` column is indexed.
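As an extra sanity check that the alias really points at the private endpoint, the URL can be parsed at startup. This is only a sketch; the hostname pattern is an assumption and should be adjusted to the actual environment:

```typescript
// Warn at startup if DATABASE_URL does not look like a private endpoint.
// The hostname pattern below is an assumption -- adjust it to your setup.
function looksInternal(databaseUrl: string): boolean {
  const host = new URL(databaseUrl).hostname;
  // Public RDS endpoints end in .rds.amazonaws.com; a private alias
  // (e.g. a cluster-internal DNS name) usually does not.
  return !host.endsWith('.rds.amazonaws.com');
}

if (process.env.DATABASE_URL && !looksInternal(process.env.DATABASE_URL)) {
  console.warn('DATABASE_URL may point at the public DB endpoint');
}
```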
I’m going to destroy the database and re-create it just to verify that it’s not just a problem with this RDS instance.
Can you check the connection pool from your app to your DB instance, to see whether your app re-uses connections the right way?
Yes, there are only 6 active connections (4 of the 5 apps have 1 pod each, and the other app has 2 pods).
Thanks again @rophilogene for your time!
Here’s a quick update:
I reduced the vCPU units and RAM to 0.512 vCPU and 768 MB RAM, and the app is still reasonably fast. I also configured the app to keep at least 4 connections open in the connection pool to the DB.
So I would deduce that establishing a connection from a service to the Postgres DB is somewhat expensive, i.e., it takes longer than expected. This is in line with what I observed in the other services as well as in the lifecycle job I configured: in our previous setup, the database migration command (with no migrations to run) took about 3-5 seconds; it now takes around 10-15 seconds.
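The “at least 4 connections” idea above maps to the pool options in the MikroORM config, which are forwarded to the underlying knex/tarn pool. A sketch of the relevant part, with illustrative values:

```typescript
// Sketch of the relevant MikroORM options (illustrative values only).
// `pool` is forwarded to the underlying knex/tarn pool, so keeping
// `min` above zero avoids paying the connection-setup cost per request.
import { defineConfig } from '@mikro-orm/postgresql';

export default defineConfig({
  clientUrl: process.env.DATABASE_URL,
  pool: {
    min: 4,  // keep at least 4 warm connections open
    max: 10,
  },
});
```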
Hello again! Unfortunately, this issue still persists. Have you had time to look into it by any chance? Or do you have any idea what the issue might be?