Bug #1362
Google Malware Warning (closed)
Added by Matt Gold about 13 years ago. Updated almost 13 years ago.
Description
Just doing a bit of googling and saw a link to the Commons in the results with a Google-appended warning that read "This site may be compromised." Screenshot attached. Curiously, other Commons results didn't have the same warning under the links.
Clicking the "compromised" link leads to this page.
Not sure whether this is worth pursuing, but thought I'd report it to see what you thought.
Files
Screen_Shot_2011-11-17_at_9.52.20_PM.png (35.1 KB) | Matt Gold, 2011-11-17 09:55 PM
Web_crawl_errors_HTTP_commons_gc_cuny_edu_20111121T132831Z.csv (1.17 KB) | Crawl errors | local admin, 2011-11-21 08:55 AM
commons.gc.cuny.edu_Content_Analysis_Duplicate_title_tags_20111121T134457Z.csv (196 KB) | Duplicate titles | local admin, 2011-11-21 08:55 AM
Updated by local admin about 13 years ago
I see what the issue is. I used Google's "Safe Browsing" diagnostic tool to get more info, and it says there are no issues found on the Commons itself:
http://www.google.com/safebrowsing/diagnostic?site=commons.gc.cuny.edu
Diagnostic page for commons.gc.cuny.edu What is the current listing status for commons.gc.cuny.edu? This site is not currently listed as suspicious. What happened when Google visited this site? Of the 7 pages we tested on the site over the past 90 days, 0 page(s) resulted in malicious software being downloaded and installed without user consent. The last time Google visited this site was on 2011-11-17, and suspicious content was never found on this site within the past 90 days. This site was hosted on 1 network(s) including AS31822 (CITY). Has this site acted as an intermediary resulting in further distribution of malware? Over the past 90 days, commons.gc.cuny.edu did not appear to function as an intermediary for the infection of any sites. Has this site hosted malware? No, this site has not hosted malicious software over the past 90 days.
But we're getting flagged for being part of CUNY!
Diagnostic page for AS31822 (CITY) What happened when Google visited sites hosted on this network? Of the 24 site(s) we tested on this network over the past 90 days, 1 site(s), including, for example, cuny.edu/, served content that resulted in malicious software being downloaded and installed without user consent. The last time Google tested a site on this network was on 2011-11-17, and the last time suspicious content was found was on 2011-11-17. Has this network hosted sites acting as intermediaries for further malware distribution? Over the past 90 days, this network has not hosted any sites that appeared to function as intermediaries for the infection of any other sites. Has this network hosted sites that have distributed malware? No, this network has not hosted any sites that have distributed malicious software over the past 90 days.
Not sure that much can be done about this. But hey, I'll see if there's anything we can do to help out.
Updated by local admin about 13 years ago
Let me sign up for Google's Webmaster Tools and see if there's any more information... I'll have to upload a file to the site to verify that I'm an admin.
Updated by local admin about 13 years ago
I have been scouring the Commons website for anything whatsoever that could be out of order in this regard.
Here's one potential issue I've found. Are anonymous comments allowed?
I've found a bunch of exploitative SEO links on comments: e.g. http://apicciano.commons.gc.cuny.edu/2010/05/11/furloughs-coming-to-cuny/
These may conceivably be considered harmful.
Updated by Matt Gold about 13 years ago
Hi André,
I'll remove those comments, but spam comments are NOT malware and should not be flagged as such by Google.
Can you raise this issue with Central CUNY? At our subcommittee meeting, Mikhail and Joe said that they had received a related warning that went out last week, and that the Commons wasn't mentioned in it as an infected site. I think that this warning is in error, as you've said. Joe said that the last time this happened to Macaulay, you helped get in touch with Google to ask them to rescan and clear the MHC website.
Can you do the same for the Commons? I've just had someone from the Modern Language Association contact me to say that it looked like our site had been hacked... I'd appreciate any help you could give us here.
Updated by local admin about 13 years ago
Hey Matt,
So, I never said comment spam was malware, so I'm not sure where you're getting that from; in fact, Google never reported the Commons website as having anything to do with malware.
The malware notice is actually a completely different one (which is what Macaulay got back then), while this is a new warning that Google came up with this year in their quest for the greater good and universal peace. It's supposed to warn about sites that may have been modified without the owner's consent.
I've reported the issue to the Graduate Center's senior-most security officer, and we're evaluating the situation internally.
Also, I actually did submit a website recheck request already, but I'm told those can take "weeks" to be evaluated... sigh. At least they admit that their system "may not be perfect"...
While I do believe the warning may have been issued in error, I'm not assuming that is the case. So the Commons team should help double-check: look for and delete any live comment spam, search the site for spam strings (viagra, etc.), and keep an eye out for anything spammy-looking, even if subtle.
What's this report you mention discussing at the meeting, by the way?
Updated by local admin about 13 years ago
- Priority name changed from Normal to High
Yeah, even if it's not the cause of the warning, unchecked comment spam is definitely an issue:
Run a Google search on:
site:commons.gc.cuny.edu (viagra|levitra)
...and see what happens!
The issue here is that unwitting visitors may click on these links and be directed to sites that contain malware or other badness.
I strongly recommend we address this issue ASAP.
Updated by local admin about 13 years ago
It's pretty bad...
site:commons.gc.cuny.edu (ejaculation)
Updated by local admin about 13 years ago
Guys, let's please try to clean up the spam urgently: although Google says the site recheck "may take weeks", I'm concerned it may take place very soon (Murphy's Law) and we'll blow our chance at regaining our upstanding netizen reputation.
Having had a chance to look into it more deeply, it does seem that this, rather than the CUNY central issue I first suspected, is actually the root cause of the warning.
Updated by Boone Gorges about 13 years ago
Hi André. Thanks for your vigilance.
I did a bit of investigation myself. It turns out that the issue turned up by your Google searches is caused by a very small number of spam comments, plus a series of unlikely glitches in our system.
- First, a small number of spam comments were successfully posted to blogs on the site. (I only found two distinct comments, though there may be more.)
- By default, BuddyPress does not keep track of comments that are left by non-logged-in visitors (as is the case with these spam comments). However, we run a plugin that includes these comments in our activity stream. As a result, the spam comments on the blogs have parallel BP activity items.
- Normally, such a problem would be isolated to a single place - namely, the sitewide activity feed (since there's no member page where the spam activity item would appear). However, there is a piece of legacy code in BuddyPress that improperly interprets the afilter URL parameter in such a way that, when you pass the afilter param, all activity items are shown - even when you're looking at a single user's page. I just opened a BP ticket for this issue. http://buddypress.trac.wordpress.org/ticket/3754
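To make the interaction above concrete, here is a toy model in Python. It is not BuddyPress code (BuddyPress is PHP), and ActivityItem, visible_items_buggy and visible_items_fixed are hypothetical names; the buggy function only reproduces the behavior described in the list, where passing afilter drops the per-user scope so an anonymous spam item shows up on every member page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActivityItem:
    item_id: int
    user_id: Optional[int]  # None stands in for a non-logged-in commenter
    kind: str               # activity type, e.g. "new_blog_comment"


def visible_items_buggy(items: List[ActivityItem], page_user_id: int,
                        afilter: Optional[str] = None) -> List[ActivityItem]:
    # Hypothetical model of the reported behavior: once afilter is passed,
    # the per-user scope is dropped and every activity item comes back.
    if afilter is not None:
        return list(items)
    return [i for i in items if i.user_id == page_user_id]


def visible_items_fixed(items: List[ActivityItem], page_user_id: int,
                        afilter: Optional[str] = None) -> List[ActivityItem]:
    # Always apply the user scope first; afilter can only narrow it further.
    scoped = [i for i in items if i.user_id == page_user_id]
    if afilter is not None:
        scoped = [i for i in scoped if i.kind == afilter]
    return scoped


items = [
    ActivityItem(1, 7, "new_blog_post"),
    ActivityItem(2, None, "new_blog_comment"),  # anonymous spam comment
]
```

With the buggy version, every member page plus an afilter value yields its own copy of the spam item, which is how two comments could turn into a large number of indexed pages.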
For now, here's what I've done:
1) I removed the code that made afilter work. That means that next time Google indexes, they may find a single instance of a given spam comment, but they won't find a zillion of each one like they do now.
2) I deleted the two activity items that were causing the issues. I did a bunch of direct queries to the database to check for offending items, and these were the only two I found.
This should be sufficient to take care of the problem when Google reindexes us. Even if I've missed a comment or two, the fix in (1) will mean that the problems will be isolated instead of widespread from the point of view of Google's bots. I'll work with the BP team to talk about the problem that underlies (1).
André, does this sound like it covers all the bases? I suppose all we can do is wait to be reindexed?
Updated by local admin about 13 years ago
Makes perfect sense, Boone.
The keywords that got me hits so far were levitra, penis, ejaculation and payday (a straight flush for a perfect weekend? ;)
This list put me on the right track, but I haven't checked all possibilities yet.
Another angle very much worth looking into is comments like the ones on the blog post I linked above. You will notice that they don't contain any "suspicious words" (Orwell, anyone? I love my job), but they are clearly exploitative and may reflect negatively on our profile. They will be harder to track down, but should definitely be eliminated as well, one way or another.
Should we consider prohibiting anonymous, non-vetted comments? Like, say, requiring commenters to authenticate with a trusted third-party ID service?
Updated by Boone Gorges about 13 years ago
(a straight flush for a perfect weekend? ;)
Maybe in my single days :)
I deleted two more activity items on your 'payday' tip. Thanks for that.
Agreed that we need to get rid of the less obvious spam, though I'm not quite sure how to do that, or what kind of policy to establish. I'd like Matt to weigh in before I go messing around in the comments section of other people's blogs.
Should we consider prohibiting anonymous, non-vetted comments? Like, say, requiring commenters to authenticate with a trusted third-party ID service?
I think that requiring third-party authentication will set the bar too high for participation. However, we could consider activating some additional spam protection across the network - maybe a captcha, or some sort of honeypot, that would at least cut down on the problem. We could also force the WP setting that says that the admin must manually approve at least one comment from a given email address before future comments can appear unvetted, though to be honest, I don't want to put too much responsibility on the bloggers, because some of them are not very good at recognizing spam when they see it, so they just approve it.
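A rough Python sketch of how the two lighter-weight measures mentioned above could combine: a honeypot field plus the rule behind WordPress's "Comment author must have a previously approved comment" discussion setting. This is an illustration, not WordPress's implementation; CommentGate, decide and mark_approved are hypothetical names, and the honeypot is assumed to be an extra form field hidden from humans.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CommentGate:
    # Lower-cased emails of authors who already have an approved comment.
    approved_authors: Set[str] = field(default_factory=set)

    def decide(self, author_email: str, honeypot_field: str) -> str:
        # Honeypot: a form field hidden from humans (e.g. with CSS);
        # simple bots tend to fill in every field they see.
        if honeypot_field.strip():
            return "spam"
        # First-comment moderation: known authors go straight through,
        # everyone else waits for the blogger to approve them.
        if author_email.strip().lower() in self.approved_authors:
            return "approve"
        return "hold"

    def mark_approved(self, author_email: str) -> None:
        self.approved_authors.add(author_email.strip().lower())
```

Note that this still puts the first approval on the blogger, which is exactly the weak point raised above; the honeypot is the part that needs no human judgment.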
Updated by Matt Gold about 13 years ago
Thanks to both of you for your work on this. For my part, I went through Tony P's blog comments and marked all spam comments as spam. So, if/when Google reindexes us, I'm hoping that this issue will be solved.
Updated by local admin about 13 years ago
- File Web_crawl_errors_HTTP_commons_gc_cuny_edu_20111121T132831Z.csv added
- File commons.gc.cuny.edu_Content_Analysis_Duplicate_title_tags_20111121T134457Z.csv added
For my part, I went through Tony P's blog comments and marked all spam comments as spam.
That's great. How were you able to tell them apart, just common sense? I wonder if there's a way for us to script or automate this somehow in case other blogs are similarly afflicted.
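One possible starting point for automating this, as a minimal Python sketch. It assumes comment text can be exported as (id, text) pairs, for example from a database query (the export step is not shown). flag_comments, SPAM_KEYWORDS and max_links are hypothetical; the keyword list is just the one collected in this ticket, and the link-count threshold is an assumed heuristic for the subtler SEO spam that avoids "suspicious words". Anything flagged should still be reviewed by a person before deletion.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Keywords that turned up spam in this ticket; extend as new ones appear.
SPAM_KEYWORDS = ["viagra", "levitra", "penis", "ejaculation", "payday"]
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def keyword_pattern(keywords: Sequence[str]):
    # \b keeps matches to whole words; IGNORECASE also catches "LEVITRA" etc.
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def flag_comments(comments: Iterable[Tuple[int, str]],
                  keywords: Sequence[str] = SPAM_KEYWORDS,
                  max_links: int = 2) -> List[Tuple[int, List[str], int]]:
    """Return (comment_id, matched_keywords, link_count) for suspicious comments."""
    pattern = keyword_pattern(keywords)
    flagged = []
    for comment_id, text in comments:
        hits = sorted({m.group(1).lower() for m in pattern.finditer(text)})
        links = len(LINK_RE.findall(text))
        # Keyword hits catch the obvious spam; a high link count is a rough
        # signal for SEO spam that avoids the usual words.
        if hits or links > max_links:
            flagged.append((comment_id, hits, links))
    return flagged
```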
So, if/when Google reindexes us
I'm pretty sure it's "when" on this one. Here's the actual message I received:
We've received a request from a site owner to reconsider how we index the following site: http://commons.gc.cuny.edu/ We'll review the site. If we find that it's no longer in violation of our Webmaster Guidelines, we'll reconsider our indexing of the site. Please allow several weeks for the reconsideration request. We do review all requests, but unfortunately we can't reply individually to each request.
I'm hoping that this issue will be solved.
The afilter fix seems to be helping. Crawling those pages now returns a 403 error instead of indexing the bad comments (.csv file attached).
The bot is also reporting some issues with "duplicate title tags", which I suspect are just a side effect of the same problem, but please take a look at the attached report. There are ways to tailor more narrowly how the bot handles URL parameters, so we may want to explore that. What do you think, Boone?
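Before changing the crawler's parameter handling, one way to check whether the duplicate titles really come from afilter is to group the reported URLs with that parameter stripped. This is a local Python sketch, not the Webmaster Tools feature itself; it assumes the URLs from the attached report have been read into a list, and canonical_url, duplicate_groups and IGNORED_PARAMS are hypothetical names. It treats afilter as not changing page content, which is the premise of the fix above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed not to change page content (afilter, per this ticket).
IGNORED_PARAMS = {"afilter"}


def canonical_url(url: str, ignored=IGNORED_PARAMS) -> str:
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in ignored]
    # Drop the ignored parameters and the #fragment; keep everything else.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def duplicate_groups(urls: Iterable[str]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    # Only canonical URLs reached through more than one variant are duplicates.
    return {k: v for k, v in groups.items() if len(v) > 1}
```

If most duplicate-title URLs collapse into a few canonical groups, that supports the side-effect theory.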
Thanks to both of you for your work on this.
No problem. Glad to help.
I'm happy to continue coordinating security issues like this for the Commons, but I'd like to get credited on the website with something like "Security Badass Ninja" ;)
Updated by Matt Gold about 13 years ago
Yeah, just common sense. I'm going to write to Tony (again) to try and help him diagnose such spam on his own.
One good thing to look out for: I just bought an institutional license to Akismet (http://akismet.com/), Automattic's spam-prevention program. The great thing is that I worked things out with the company so that the license is good for ALL WordPress installations on CUNY servers. So, I'm hoping that it will not only cut down on spam on our site, but on sister installations, too.
I'd like to get credited on the website with something like "Security Badass Ninja"
You got it!
Updated by local admin almost 13 years ago
- Status changed from Assigned to Reporter Feedback
Looks like we're clean! http://www.google.com/search?q=commons.gc.cuny.edu
Updated by Boone Gorges almost 13 years ago
Whee! Thanks for staying on top of it, André. Matt, close this ticket if you're satisfied.
Let's be sure to continue the discussion about individual blog spam prevention.
Updated by Matt Gold almost 13 years ago
- Status changed from Reporter Feedback to Resolved
That's fantastic news! Thanks so much for all of your work on this, André, and thanks to you too, Boone, for your work in eradicating spam from the site. Excellent!!