Bug #2094
closed
Added by Matt Gold about 12 years ago.
Updated about 12 years ago.
Description
From a user email:
At about 5:30 pm on Monday I received this error message when trying to access material on the Commons (and the Commons directly):
Error establishing a database connection
I had previously been on the Commons site.
This is in line with chance outages that I've experienced occasionally, as well, but this is our first user report of the same. André, can you look at the logs and let us know whether you notice anything?
I received an alert for this:
***** Nagios 2.12 *****
Notification Type: PROBLEM
Service: mySQL
Host: commons.gc.cuny.edu
Address: commons.gc.cuny.edu
State: CRITICAL
Date/Time: Mon Sept 3 17:33:45 EDT 2012
Additional Info:
Too many connections
I would be very surprised if this wasn't an offshoot of http://redmine.gc.cuny.edu/issues/1962. Solving this will more than likely necessitate solving that as well. Let me see what I can dig up.
Hey André - Total side question. What do you use for service monitoring? I have a project where I'd like to receive similar emails of mysqld outages.
Boone Gorges wrote:
Hey André - Total side question. What do you use for service monitoring? I have a project where I'd like to receive similar emails of mysqld outages.
Nagios all the way, man : )
I can share my configs and help you get set up.
Great tip - I haven't heard of Nagios!
If you can easily send your config files, it'd give me a huge leg up. Thanks for offering :)
Let me see what I can dig up.
okay -- thanks, André
Boone Gorges wrote:
Great tip - I haven't heard of Nagios!
Dude, seriously, Nagios is so awesome. Working with it is one of the more satisfying, non-sucky parts of the day.
If you can easily send your config files, it'd give me a huge leg up. Thanks for offering :)
Done. Let me know if I can help you implement it and get it running.
- Status changed from Assigned to Resolved
I looked at the historical data for the database service, and it seems that while we're averaging well under the default limit of 151 connections, there are sporadic spikes that hit that ceiling and more than likely cause these brief connection failures. We should keep working on two fronts: optimizing the application's database utilization and continuing to tune MySQL for performance. We should probably track both of these efforts on http://redmine.gc.cuny.edu/issues/1962.
As an immediate measure, I increased the connection limit to 500, which should prevent the failed connections while we dig deeper into the optimization and tuning. Gonna mark this one resolved for now, but let's re-open it if 500 proves to be insufficient or if anything unexpected causes the problem to persist.
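For anyone who wants to repeat the check on the historical data, here is a rough Python sketch of the comparison described above: it summarizes a series of connection-count samples against a given limit. The function name `summarize_connections` and the sample values are made up for illustration; they are not taken from the Commons logs or from the Nagios setup.

```python
from statistics import mean


def summarize_connections(samples, limit):
    """Summarize connection-count samples against a connection limit.

    Hypothetical helper for illustration only: `samples` is a list of
    observed concurrent-connection counts, `limit` is the configured ceiling.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(samples)
    return {
        "average": mean(samples),
        "peak": peak,
        # Number of samples where the ceiling was reached
        "at_limit": sum(1 for s in samples if s >= limit),
        # Remaining room between the worst observed spike and the limit
        "headroom": limit - peak,
    }


# Hypothetical samples: mostly low, with two spikes capped at the old limit.
samples = [40, 55, 38, 151, 47, 60, 151, 42]

old = summarize_connections(samples, limit=151)
new = summarize_connections(samples, limit=500)

print("limit 151:", old)
print("limit 500:", new)
```

One caveat: samples recorded under the old limit are capped at 151, so the real peak demand could be higher than the data shows. That is worth keeping in mind when judging whether 500 leaves enough room.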
Okay -- sounds like a good plan. Many thanks, André!