Bug #13429: Multiple forum notifications - CUNY Academic Commons - CUNY Graduate Center - Project Tracking System

Actions

Copy link

Bug #13429

closed

Multiple forum notifications

Added by Matt Gold about 5 years ago. Updated about 5 years ago.

Status:

Resolved

Priority name:

Normal

Assignee:

Raymond Hoh

Category name:

Email Notifications

Target version:

Not tracked

Start date:

2020-10-05

Due date:

% Done:

Estimated time:

Deployment actions:

Description

Hi Ray -- I received four separate email notifications for this post - https://commons.gc.cuny.edu/groups/ma-in-digital-humanities-cuny-graduate-center/forum/topic/2020-2021-ma-ms-student-survey/ . The poster, Micki Kaufman, says that she received three. can you please see? If I can provide other info, please let me know. I do know that this was posted to one group and cc'ed to another via the web interface

Related issues

Actions

Copy link

Updated by Matt Gold about 5 years ago

I just ran into this issue myself with this post -- https://commons.gc.cuny.edu/groups/ma-in-digital-humanities-cuny-graduate-center/forum/topic/cfp-ach-digital-humanities-mentorship-working-group/

I cross-posted to four groups and received four notifications. So I think that whatever bit of code was preventing people from receiving only a max of one notification for a cross-posted topic is not working.

Actions

Copy link

Updated by Raymond Hoh about 5 years ago

Hi Matt,

For the latest instance, I can see that four emails were sent to your address between 9:58:21am ET and 9:58:24am ET:

Oct  6 09:58:21 lw2a postfix/smtp[72886]: 541AB204CFC6: to=<MATTs_EMAIL_ADDRESS>, relay=relay.gc.cuny.edu[172.29.28.156]:25, delay=0.42, delays=0.01/0/0.01/0.4, dsn=2.6.0, status=sent (250 2.6.0 <bTlIkI4VBGodQ4QD9VYusBcEZXOQkt8RDPnxEO2kle0@commons.gc.cuny.edu> [InternalId=148743307397682, Hostname=wd5b.gc.cuny.edu] Queued mail for delivery)
Oct  6 09:58:21 lw2a postfix/smtp[72891]: 6B77A204CFC9: to=<MATTs_EMAIL_ADDRESS>, relay=relay.gc.cuny.edu[172.29.28.156]:25, delay=0.44, delays=0.01/0.01/0/0.42, dsn=2.6.0, status=sent (250 2.6.0 <OJlxpz8mvtEIDJYu6fF5PPOCmKW56FXg7y324YNF8@commons.gc.cuny.edu> [InternalId=148743307397684, Hostname=wd5b.gc.cuny.edu] Queued mail for delivery)
Oct  6 09:58:24 lw2a postfix/smtp[72891]: A7FF2204CFC9: to=<MATTs_EMAIL_ADDRESS>, relay=relay.gc.cuny.edu[172.29.28.157]:25, delay=0.36, delays=0.01/0/0.01/0.34, dsn=2.6.0, status=sent (250 2.6.0 <Gd1rTbpYBjMfcjYZ1jieXHcTWkIws5PcQLCIIoyc@commons.gc.cuny.edu> [InternalId=148743307397698, Hostname=wd5b.gc.cuny.edu] Queued mail for delivery)
Oct  6 09:58:24 lw2a postfix/smtp[72886]: A715F204CFC6: to=<MATTs_EMAIL_ADDRESS>, relay=relay.gc.cuny.edu[172.29.28.156]:25, delay=0.36, delays=0.01/0/0/0.35, dsn=2.6.0, status=sent (250 2.6.0 <LAKUCGYUDj8TSPC3weydsEuCoNpwx7fs1MqjUbe6Qg@commons.gc.cuny.edu> [InternalId=148635933215235, Hostname=wd5c.gc.cuny.edu] Queued mail for delivery)

Can you confirm that those are the problematic emails? Can you also forward or attach those emails here? I just want to make sure that each email was a duplicate and not unique in some form.

We have a cron job (bpges_health_check) to check if there are pending group emails that need to be send out.

I just checked the Cavalcade log, which handles processing our jobs and it looks like the bpges_health_check job was duplicated many times over:

SELECT * FROM wp_cavalcade_jobs WHERE site = 1 AND hook = "bpges_health_check"'
+--------+------+--------------------+--------+---------------------+---------------------+----------+--------------+---------+
| id     | site | hook               | args   | start               | nextrun             | interval | schedule     | status  |
+--------+------+--------------------+--------+---------------------+---------------------+----------+--------------+---------+
| 566138 |    1 | bpges_health_check | a:0:{} | 2020-07-24 13:58:12 | 2020-07-27 14:23:12 |      300 | five_minutes | failed  |
| 566199 |    1 | bpges_health_check | a:0:{} | 2020-07-24 17:18:12 | 2020-07-27 14:23:12 |      300 | five_minutes | failed  |
| 577409 |    1 | bpges_health_check | a:0:{} | 2020-07-30 23:48:12 | 2020-07-31 03:03:12 |      300 | five_minutes | failed  |
| 579086 |    1 | bpges_health_check | a:0:{} | 2020-07-31 03:43:12 | 2020-07-31 03:43:12 |      300 | five_minutes | failed  |
| 580845 |    1 | bpges_health_check | a:0:{} | 2020-07-31 10:18:12 | 2020-08-01 03:03:12 |      300 | five_minutes | failed  |
| 580946 |    1 | bpges_health_check | a:0:{} | 2020-07-31 16:58:12 | 2020-08-01 03:03:12 |      300 | five_minutes | failed  |
| 580958 |    1 | bpges_health_check | a:0:{} | 2020-07-31 17:43:12 | 2020-08-01 03:03:12 |      300 | five_minutes | failed  |
| 583830 |    1 | bpges_health_check | a:0:{} | 2020-08-01 05:13:12 | 2020-09-02 22:18:12 |      300 | five_minutes | failed  |
| 585088 |    1 | bpges_health_check | a:0:{} | 2020-08-07 17:03:12 | 2020-09-02 22:18:12 |      300 | five_minutes | failed  |
| 585795 |    1 | bpges_health_check | a:0:{} | 2020-08-14 17:13:12 | 2020-09-02 22:18:12 |      300 | five_minutes | failed  |
| 586790 |    1 | bpges_health_check | a:0:{} | 2020-08-21 17:03:12 | 2020-09-02 22:18:12 |      300 | five_minutes | failed  |
| 589144 |    1 | bpges_health_check | a:0:{} | 2020-08-28 16:58:12 | 2020-09-02 22:18:12 |      300 | five_minutes | failed  |
| 557349 |    1 | bpges_health_check | a:0:{} | 2020-05-30 05:28:12 | 2020-06-30 11:53:12 |      300 | five_minutes | running |
| 591645 |    1 | bpges_health_check | a:0:{} | 2020-09-04 13:48:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 591696 |    1 | bpges_health_check | a:0:{} | 2020-09-04 16:48:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 593156 |    1 | bpges_health_check | a:0:{} | 2020-09-11 13:53:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 593220 |    1 | bpges_health_check | a:0:{} | 2020-09-11 16:58:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 593227 |    1 | bpges_health_check | a:0:{} | 2020-09-11 17:43:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 593336 |    1 | bpges_health_check | a:0:{} | 2020-09-12 00:23:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 593405 |    1 | bpges_health_check | a:0:{} | 2020-09-12 05:18:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 593437 |    1 | bpges_health_check | a:0:{} | 2020-09-12 08:28:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 594677 |    1 | bpges_health_check | a:0:{} | 2020-09-18 13:48:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 594717 |    1 | bpges_health_check | a:0:{} | 2020-09-18 17:18:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 594724 |    1 | bpges_health_check | a:0:{} | 2020-09-18 17:38:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 594801 |    1 | bpges_health_check | a:0:{} | 2020-09-19 00:33:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 594830 |    1 | bpges_health_check | a:0:{} | 2020-09-19 04:23:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 594838 |    1 | bpges_health_check | a:0:{} | 2020-09-19 05:53:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 615006 |    1 | bpges_health_check | a:0:{} | 2020-09-25 17:23:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 615483 |    1 | bpges_health_check | a:0:{} | 2020-09-25 17:43:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
| 622078 |    1 | bpges_health_check | a:0:{} | 2020-10-02 16:58:12 | 2020-10-06 17:08:12 |      300 | five_minutes | waiting |
+--------+------+--------------------+--------+---------------------+---------------------+----------+--------------+---------+

I've removed all the duplicate jobs and that should hopefully restore things back to normal.

Matt, can you report back when you receive another group email and whether you received duplicates or not?

Actions

Copy link

Updated by Matt Gold about 5 years ago

Sure. Do you want originals of the emails I received earlier today?

Actions

Copy link

Updated by Raymond Hoh about 5 years ago

Thanks Matt. Originals would be great!

Actions

Copy link

Updated by Boone Gorges about 5 years ago

Ray, I got a report at #13441 that BPGES emails are not coming through correctly. I checked the database and found that there are many BPGES items from the last ~24 hours that should have been sent immediately but have not been. select * from wp_bpges_queued_items where type = 'immediate'; I'm currently trying to force them to go out by triggering the health check manually via wp cron event run but they're not going. I'm investigating right now from the BPGES end, but I wanted to raise the issue here in case you may remember doing something that could cause this kind of problem.

Actions

Copy link

Updated by Boone Gorges about 5 years ago

Related to Support #13441: ongoing user problems with daily digests added

Actions

Copy link

Updated by Raymond Hoh about 5 years ago

I'm currently trying to force them to go out by triggering the health check manually via wp cron event run but they're not going.

The health check only sends out emails from the last 30 minutes:
https://github.com/cuny-academic-commons/cac/commit/9ff7d591e97a1698d49d438b95b5aa56d0f9381d#diff-86561795f3a30df22c4d2bba1d5c837b

Anything older will not get sent out.

FWIW, I did receive a group email from Colin yesterday at 9:50pm ET. Not sure what is causing the backlog. The health check should have caught these instances. Perhaps there is too much mail being sent out within 30 minutes? See also #13430.

We might need to widen the timeframe in the health check to an hour or two.

Actions

Copy link

Updated by Raymond Hoh about 5 years ago

I'm currently trying to force them to go out by triggering the health check manually via wp cron event run but they're not going

Edit - I misread your sentence. I thought you were trying to trigger the immediate emails without using the health check. You can skip what I wrote below!

Correct me if I'm wrong, Boone, but I believe immediate emails are not queued up using cron.

It uses a wp_remote_post() call:

And repeats the wp_remote_post() call until the immediate queue is done:
https://github.com/boonebgorges/buddypress-group-email-subscription/blob/383131d15a26d19b3c65f5c89c566e1e97a4b0ab/classes/class-bpges-async-request-send-queue.php#L134-L137

Actions

Copy link

Updated by Boone Gorges about 5 years ago

The health check only sends out emails from the last 30 minutes:

Not quite. It will only send emails that were sent between 24 hours and 30 minutes ago. It will NOT send anything older than a day, or anything more recent than 30 minutes.

Correct me if I'm wrong, Boone, but I believe immediate emails are not queued up using cron.

This is true. But occasionally the queue that you've described fails - timeout, etc - which leaves items in the queue that are unsent. This is what the bpges_health_check is for: every 5 minutes, it checks to see if there are items that should've gone out, but haven't.

I discovered that requests from the health check were failing in at least some cases because of some mysterious problem that was causing the self-pings (admin-ajax.php) not to complete properly when called via cron. I can't figure out what's going on here - there's no record in Apache. But it looks like I'll be able to work my way around it by eliminating the remote request, and instead processing the unsent emails directly, in the health-check request. I've got this change in place on the production site, and I'll commit it if it looks like it's working over the next few days.

Actions

Copy link

#10

Updated by Boone Gorges about 5 years ago

I've made some mods to the BPGES health check and moved it into its own plugin. I think it will now be more reliable, since it doesn't rely on HTTP self-requests. https://github.com/cuny-academic-commons/cac/commit/aaa04d13e93d2f31dca0de6f4a59e8b26cee7b41 This is deployed to the production site.

Actions

Copy link

#11

Updated by Matt Gold about 5 years ago

Hi Ray and Boone -- I'm not sure whether this is related, but in this thread, it sounds like the original poster is saying that a post by email he sent a while ago was only just now posted -- https://commons.gc.cuny.edu/groups/gis-working-group/forum/topic/having-issues-opening-gis/ . Is this something that is/will be addressed by the fixes you've made, or is this potentially a new issue?

Actions

Copy link

#12

Updated by Raymond Hoh about 5 years ago

Hi Matt, your gis-working-group report is related to #13430. I've posted my reply there.

The fixes in this ticket are not related to that. I'm thinking #13430 is an issue with how the GC mail server is configured to handle vast amounts of outgoing mail.

Actions

Copy link

#13

Updated by Boone Gorges about 5 years ago

Ray, could you please have a look at this ticket and the related ones from the last week to see where things stand? With regard to this specific ticket, I think that the changes to the BPGES health check tool go some way toward resolving the issue, but it could be that I'm missing something :)

Actions

Copy link

#14

Updated by Raymond Hoh about 5 years ago

I think the main issue in this ticket regarding multiple forum posts was resolved when I removed all the duplicate Cavalcade jobs.

I was looking at the Cavalcade issues list and wordpress.org uses a daily cron job to look for duplicate jobs and removes them if they exist: https://github.com/humanmade/Cavalcade/issues/89#issuecomment-630577767

Perhaps that is something we might want to implement as well.

Actions

Copy link

#15