Bug #2094

closed

User reports site outage

Added by Matt Gold over 11 years ago. Updated over 11 years ago.

Status:
Resolved
Priority name:
Normal
Assignee:
-
Category name:
Server
Target version:
Start date:
2012-09-03
Due date:
% Done:

0%

Estimated time:
Deployment actions:

Description

From a user email:

At about 5.30 pm on Monday received this error message when trying to access material on the Commons (and the Commons directly):

Error establishing a database connection

I had previously been on the Commons site.

This is in line with the brief, seemingly random outages that I've occasionally experienced myself, but this is our first user report of one. André, can you look at the logs and let us know whether you notice anything?

Actions #1

Updated by local admin over 11 years ago

I received an alert for this:

***** Nagios 2.12 *****

Notification Type: PROBLEM

Service: mySQL
Host: commons.gc.cuny.edu
Address: commons.gc.cuny.edu
State: CRITICAL

Date/Time: Mon Sept 3 17:33:45 EDT 2012

Additional Info:

Too many connections

I would be very surprised if this wasn't an offshoot of http://redmine.gc.cuny.edu/issues/1962. Solving this will more than likely necessitate solving that as well. Let me see what I can dig up.
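For anyone following along, here is a minimal sketch of how a Nagios-style check can map connection usage onto alert states. It is not the check that produced the alert above; the function name and the 80%/95% thresholds are assumptions chosen for illustration. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_connections(current, max_connections, warn_ratio=0.80, crit_ratio=0.95):
    """Return (exit_code, message) for a connection-usage check.

    warn_ratio and crit_ratio are illustrative thresholds (assumptions),
    not values taken from this ticket.
    """
    if max_connections <= 0 or current < 0:
        return UNKNOWN, "UNKNOWN - invalid input"
    ratio = current / max_connections
    usage = f"{current}/{max_connections} connections in use"
    if ratio >= crit_ratio:
        return CRITICAL, f"CRITICAL - {usage}"
    if ratio >= warn_ratio:
        return WARNING, f"WARNING - {usage}"
    return OK, f"OK - {usage}"
```

A real plugin would print the message and exit with the returned code, for example `code, msg = check_connections(151, 151)` followed by `print(msg)` and `sys.exit(code)`.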

Actions #2

Updated by Boone Gorges over 11 years ago

Hey André - Total side question. What do you use for service monitoring? I have a project where I'd like to receive similar emails of mysqld outages.

Actions #3

Updated by local admin over 11 years ago

Boone Gorges wrote:

Hey André - Total side question. What do you use for service monitoring? I have a project where I'd like to receive similar emails of mysqld outages.

Nagios all the way, man : )

I can share my configs and help you get set up.

Actions #4

Updated by Boone Gorges over 11 years ago

Great tip - I haven't heard of Nagios!

If you can easily send your config files, it'd give me a huge leg up. Thanks for offering :)

Actions #5

Updated by Matt Gold over 11 years ago

Let me see what I can dig up.

okay -- thanks, André

Actions #6

Updated by local admin over 11 years ago

Boone Gorges wrote:

Great tip - I haven't heard of Nagios!

Dude, seriously, Nagios is so awesome. Working with it is one of the more satisfying, non-sucky parts of the day.

If you can easily send your config files, it'd give me a huge leg up. Thanks for offering :)

Done. Let me know if I can help you implement it and get it running.

Actions #7

Updated by local admin over 11 years ago

  • Status changed from Assigned to Resolved

I looked at the historical data for the database service, and it seems that while we're averaging well under the default limit of 151 connections, there are sporadic spikes that hit that ceiling and more than likely cause these brief connection failures. We should continue to optimize things on two fronts: optimizing database utilization by the application and continuing to tune MySQL for optimal performance. We should probably track both of these efforts on http://redmine.gc.cuny.edu/issues/1962.

As an immediate fix, I increased the limit on the number of connections to 500, which should prevent the failed connections while we dig deeper into the optimization/tuning. Gonna mark this one resolved for now, but let's re-open it if 500 proves to be insufficient or if anything unexpected causes the problem to persist.
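To illustrate the reasoning, here is a rough sketch of how sampled connection counts can be compared against the old and new limits. The sample values and the `summarize_connections` helper are hypothetical and are not the actual historical data. On a live server, the current limit can be read with `SHOW VARIABLES LIKE 'max_connections';` and raised at runtime with `SET GLOBAL max_connections = 500;`, though the ticket doesn't record which method was used here.

```python
from statistics import mean


def summarize_connections(samples, limit):
    """Summarize connection-count samples against a max_connections limit."""
    if not samples:
        raise ValueError("samples must not be empty")
    peak = max(samples)
    return {
        "average": mean(samples),                            # typical load
        "peak": peak,                                        # worst observed sample
        "at_limit": sum(1 for s in samples if s >= limit),   # samples at the ceiling
        "headroom": limit - peak,                            # spare connections at peak
    }


# Hypothetical samples: mostly low, with occasional spikes to the old ceiling.
samples = [40, 55, 38, 151, 47, 60, 151, 42]

before = summarize_connections(samples, limit=151)
after = summarize_connections(samples, limit=500)

print(before)
print(after)
```

With these made-up numbers the average sits far below 151, yet two samples reach the ceiling. Against a limit of 500 the same spikes leave plenty of headroom, which is the rationale for the stopgap.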

Actions #8

Updated by Matt Gold over 11 years ago

Okay -- sounds like a good plan. Many thanks, André!
