Sudden connection timeout on our managed MongoDB instance

Issues information

Your issue

Although we have not gone live yet, our prod env has started giving the following timeout error when trying to connect to the managed MongoDB instance. Any ideas why this could be? It worked previously, and I don't think any changes have been made to prod. This issue would bring our entire application down if we were live in production. I am not sure whether it is a Qovery issue or whether we accidentally changed something without realising.

11 May, 18:39:55.639 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | └ MongoClient(host=['z9f5978fc-mongodb.zab943a2a.rustrocks.me:27017'], document_class=dict, tz_aware=False, connect=False, ssl=Tru...
11 May, 18:39:55.639 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | File "/usr/local/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1921, in _get_server_session
11 May, 18:39:55.639 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | return self._topology.get_server_session()
11 May, 18:39:55.639 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | File "/usr/local/lib/python3.10/site-packages/pymongo/topology.py", line 520, in get_server_session
11 May, 18:39:55.639 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | session_timeout = self._check_session_support()
11 May, 18:39:55.639 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | └ <Topology <TopologyDescription id: 626037ada9b9df407795933a, topology_type: Single, servers: [<ServerDescription ('z9f5978fc-mon...
11 May, 18:39:55.640 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | File "/usr/local/lib/python3.10/site-packages/pymongo/topology.py", line 499, in _check_session_support
11 May, 18:39:55.640 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | self._select_servers_loop(
11 May, 18:39:55.640 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | File "/usr/local/lib/python3.10/site-packages/pymongo/topology.py", line 218, in _select_servers_loop
11 May, 18:39:55.640 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | raise ServerSelectionTimeoutError(
11 May, 18:39:55.640 | app-zf2f130e2-5b88677d96-fcc4h | 7b01c8 | pymongo.errors.ServerSelectionTimeoutError: z9f5978fc-mongodb.zab943a2a.rustrocks.me:27017: timed out, Timeout: 30s, Topology Description: <TopologyDescription id: 626037ada9b9df407795933a, topology_type: Single, servers: [<ServerDescription ('z9f5978fc-mongodb.zab943a2a.rustrocks.me', 27017) server_type: Unknown, rtt: None, error=NetworkTimeout('z9f5978fc-mongodb.zab943a2a.rustrocks.me:27017: timed out')>]>

Dockerfile content (if any)

FROM python:3.10.2-buster

COPY requirements.txt requirements.txt

RUN python -m pip install --upgrade pip

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD python main.py

Thanks,
Kevin

Even stranger, on some rare occasions the connection works fine and database operations succeed; however, most of the time a request involving a database read/write hits this timeout. This leads me to believe the issue is not on our end.

I have also tried redeploying all apps and the database in the environment and still face the same issue.
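
To rule out the app itself, I'm thinking of running a standalone connectivity probe from inside the prod environment. A minimal sketch, assuming the connection string is injected via an environment variable (the variable name is a placeholder; the fallback host is just the one from the traceback):

```python
# Minimal connectivity probe. MONGODB_URI is a placeholder for however the
# connection string is injected; a short serverSelectionTimeoutMS surfaces the
# failure quickly instead of waiting out the default 30 s.
import os

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

uri = os.environ.get(
    "MONGODB_URI",
    "mongodb://z9f5978fc-mongodb.zab943a2a.rustrocks.me:27017",
)
client = MongoClient(uri, ssl=True, serverSelectionTimeoutMS=5000)

try:
    # "ping" forces server selection, i.e. the same code path that raises
    # ServerSelectionTimeoutError in the traceback above.
    client.admin.command("ping")
    print("connection OK")
except ServerSelectionTimeoutError as exc:
    print(f"server selection timed out: {exc}")
```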

Hello @KevinFinvault,
I'm taking a look at this and will come back to you as soon as I have more information.


The database setup on Qovery's side looks good; I am wondering if it might be a performance issue.

Could you take a look at the monitoring tools available in AWS? Maybe this would help: Monitoring with Performance Insights - Amazon DocumentDB

Also, I found out that PyMongo is not fork-safe and might create too many connections, causing the database to time out: The Art of Graceful Reloading — uWSGI 2.0 documentation
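
If your app does run under a pre-fork server such as uWSGI, the pattern that doc recommends looks roughly like this. This is only a sketch, since I don't know your actual entrypoint; the URI and pool size are placeholders:

```python
# Sketch of the uWSGI "create the client after fork" pattern: each worker gets
# its own MongoClient (and connection pool) instead of inheriting sockets from
# the master process. URI and maxPoolSize are placeholders.
from pymongo import MongoClient
from uwsgidecorators import postfork  # only importable when running under uWSGI

client = None  # set per worker below


@postfork
def init_mongo():
    global client
    client = MongoClient(
        "mongodb://z9f5978fc-mongodb.zab943a2a.rustrocks.me:27017",
        maxPoolSize=20,  # keep workers * pool size under the instance's connection limit
    )
```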

Hi @bilel,

Thanks for taking the time to look into this and to respond.

We are actually using Motor, which uses PyMongo underneath but differs from plain PyMongo in a few ways, notably in how connections are created and in that multithreading and forking are not supported.
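
For context, our setup is roughly the following (a simplified sketch rather than our exact code; the env var name and collection names are illustrative): one client created at startup and reused for all requests.

```python
# Simplified sketch of our Motor setup (illustrative, not the exact code):
# a single AsyncIOMotorClient created at application startup and shared.
import os

from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient(os.environ["MONGODB_URI"])  # env var name illustrative
db = client["app"]


async def get_user(user_id: str):
    # All queries go through the one shared client.
    return await db.users.find_one({"_id": user_id})
```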

Secondly, we have been using this setup for ~5 months now and have never seen this issue arise once. Our dev env is still running without issue. If this were a problem with Motor/PyMongo, I think we would have seen it intermittently from the start, rather than it suddenly happening consistently after several months.

I took a look at the monitoring dashboard on AWS the other day, but didn’t notice anything strange. Let me take a deeper look now.

@bilel I can’t see anything strange in the monitoring or Performance Insights pages. Anything in particular I should look for?

@bilel Hmm, interestingly, I noticed that each time I make a query to the database, the CPU Utilization Wait metric either drops or spikes (I'm not sure which yet). Any idea what this means?

Your dev environment works with a MongoDB container instance, whereas your prod env runs a managed MongoDB instance.

@bilel it's possible that the managed MongoDB instance is too small and can't handle the same number of connections as the dev one. AWS limits the number of connections depending on the instance type of the managed database.

@KevinFinvault can you show us the network connections graph for your managed MongoDB instance (Service: AWS DocumentDB)? Thank you.
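
If it's easier than screenshotting the console, the same data can be pulled from CloudWatch. A sketch, with the region and cluster identifier as placeholders:

```python
# Sketch: fetch the DatabaseConnections metric for the DocumentDB cluster from
# CloudWatch over the last 4 weeks. Region and DBClusterIdentifier are placeholders.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-3")  # placeholder region

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/DocDB",
    MetricName="DatabaseConnections",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "z9f5978fc-mongodb"}],  # placeholder
    StartTime=datetime.now(timezone.utc) - timedelta(days=28),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Maximum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```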

Hi @rophilogene here is the connections graph for the last 4 weeks. Seems to have been mostly at 0 for the last week or so, even when I attempt to perform a database operation, but the spike to 100 a while back seems particularly weird given that we aren’t really using our production env actively yet.

What you mentioned about the managed instance not being able to handle as many connections may be the issue, since I believe Motor creates a new connection for each database operation.
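
If the connection limit does turn out to be the culprit, one thing we could try on our side is capping the client's pool explicitly. A sketch with illustrative values:

```python
# Sketch: cap the Motor client's connection pool so this process can never open
# more connections than the managed instance allows. PyMongo/Motor default
# maxPoolSize is 100 per client; the values below are illustrative.
from motor.motor_asyncio import AsyncIOMotorClient

client = AsyncIOMotorClient(
    "mongodb://z9f5978fc-mongodb.zab943a2a.rustrocks.me:27017",
    maxPoolSize=20,            # hard cap on concurrent connections from this process
    minPoolSize=0,
    maxIdleTimeMS=60_000,      # release idle connections after a minute
    serverSelectionTimeoutMS=10_000,
)
```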

@benjaminch is going to take a look, but the number of connections is a common issue with PostgreSQL databases as well; that's why PgBouncer is so successful.

Note: to make PostgreSQL / RDS able to handle more connections, you just need to use a bigger instance than the one currently used (if that is indeed the problem, of course).

Hey @KevinFinvault,

Do you have any idea of the number of connections to expect once production is live?
As @rophilogene mentioned, increasing managed Mongo size might be an option.
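
A rough back-of-the-envelope way to estimate it, assuming one client per app replica (the numbers are placeholders):

```python
# Back-of-the-envelope estimate of peak connections (placeholder numbers):
replicas = 3           # app instances running in prod
max_pool_size = 100    # PyMongo/Motor default maxPoolSize per client
peak_connections = replicas * max_pool_size
print(peak_connections)  # 300 -> compare against the DocumentDB instance's connection limit
```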

Found this doc on Mongo’s side, maybe we can find some clues: Maximizing MongoDB Performance on AWS | MongoDB Blog

Let me know how I can help.

Cheers