Feature #3662
open
Duplicate Content/SEO/Google issues
Added by Matt Gold about 10 years ago.
Updated over 9 years ago.
Description
Hi All,
Those of you in the CAT group will have seen Bruce Rosenbloom's report on Google's changed search algorithms. It looks like they have serious consequences for the Commons and its users -- go to Google, type "CUNY Academic Commons", and the CAC doesn't even come up on the first page of results.
I checked in with Chris Caruso on this, and he suggested that the issue could be WordPress Multisite and duplicate content. You can see some of his thoughts on a previous SEO-related ticket here
I think that we need to try to start addressing some of these issues, as they will affect users who blog on the site or who otherwise hope to use it to raise their profiles by sharing their work on it.
Files
Here's a PowerPoint that may have some useful advice around duplicate content on WP installs.
Thanks, Matt. I'm not sure how much of this is duplicate content, etc - Google does not make it clear. Do you have any specific information about Google's recent algorithm changes?
I just set up Google Webmaster Tools for the Commons, and it appears that we have a fairly large amount of uncrawlable links. I'm going to go through and start correcting them where possible. I'll also do something to prevent the crawler from going through our activity pagination, which I'm guessing from the GWT report has something to do with duplicate content reports.
As for the issue of duplicate content on subsites, I'm not sure there's much we can/should do. I'd be pretty wary of, say, changing everyone's themes to remove full post content from the home page and from category/tag/author archives. We can do some more experimentation with robots.txt across these sites.
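One way to experiment with robots.txt rules before deploying them is to test candidate Disallow patterns against sample URLs. The following is a minimal sketch, not the Commons' actual configuration. It assumes Google-style wildcard matching (`*` for any run of characters, a trailing `$` for end of URL), and it assumes that activity pagination is exposed through an `acpage` query parameter. The rule and the sample paths are hypothetical, and Allow rules and longest-match precedence are deliberately left out.

```python
import re
from typing import Iterable, Pattern


def rule_to_regex(pattern: str) -> Pattern[str]:
    """Translate one Disallow pattern into an anchored regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path_and_query: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)


# Hypothetical rule: keep crawlers out of paginated activity streams.
CANDIDATE_RULES = ["/*acpage="]

if __name__ == "__main__":
    for sample in ["/activity/", "/activity/?acpage=2", "/members/example/?acpage=3"]:
        print(sample, "blocked" if is_blocked(sample, CANDIDATE_RULES) else "allowed")
```

Checking patterns this way makes it easier to see whether a rule also catches pages that should stay crawlable before it goes live on every subsite.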
Ray, do you have any thoughts about BuddyPress-specific stuff?
Thanks so much, Boone -- this sounds like a great start.
- Target version set to 1.7.4
I've been playing with this for the last few hours, and I have an update. It turns out that Google basically stopped being able to index most pages on the Commons starting around 11/1/2014. I believe this is the direct cause of the depressed ranking in search results. GWT is telling me that it's been getting lots of 401 Unauthorized errors. I've looked through every line of code that was deployed on 11/1, but I can't find anything obvious that would cause this kind of issue, so I'm going to guess that it's a coincidence that it happened on the same day as a release. Through some experimentation, I found that disabling the blocks in .htaccess that enable HTTP authentication on cdev allowed Googlebot to reach the Commons. So, for the time being, I've commented out those lines on the production site. I'll wait a day or two to see if this kickstarts the indexing again, and if so, I'll figure out what can be done to make these changes permanent.
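A quick way to see what status code the server returns to a crawler-like request is to send a request with Googlebot's user-agent string. This is a rough sketch only: the URL is a placeholder, and spoofing the user agent does not reproduce rules keyed on Google's IP ranges, so GWT's own fetch report remains the authoritative check.

```python
import urllib.error
import urllib.request

# Published Googlebot desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def describe_status(code: int) -> str:
    """Give a short reading of an HTTP status code from a crawlability angle."""
    if code == 401:
        return "401 Unauthorized: HTTP authentication is being requested"
    if code == 403:
        return "403 Forbidden: access is refused outright"
    if 200 <= code < 300:
        return f"{code}: reachable"
    return f"{code}: needs a closer look"


def fetch_status(url: str, user_agent: str = GOOGLEBOT_UA, timeout: float = 10.0) -> int:
    """Send a HEAD request with the given user agent and return the status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent}, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still what we want to report.
        return err.code


if __name__ == "__main__":
    # Placeholder URL: substitute a page that GWT reports as failing.
    print(describe_status(fetch_status("https://example.org/")))
```

Running it before and after commenting out the .htaccess authentication blocks would show whether the 401 responses disappear.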
Boone,
I know this is a Ruby/Rails gem, but we've had a lot of success using it to dynamically generate a sitemap that is sent to Google/Bing from a cron job once a week:
https://github.com/kjvarga/sitemap_generator
It lets us tell search engines what they should or should not be crawling. I assume there must be something similar for WP. I seem to remember Yoast's SEO plugin doing something like this:
https://yoast.com/wordpress/plugins/seo/
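To illustrate what such a generated sitemap contains, here is a minimal sketch in Python rather than the Ruby gem or a WP plugin. It follows the sitemaps.org 0.9 format (`urlset`, `url`, `loc`, `lastmod`). The URLs and dates are hypothetical, and neither the weekly cron scheduling nor the submission to search engines is shown.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, Optional[date]]]) -> str:
    """Build a sitemap document from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # lastmod uses the W3C date format, e.g. 2014-11-01.
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages; a real job would pull these from the database.
    print(build_sitemap([
        ("https://example.org/", date(2014, 11, 1)),
        ("https://example.org/members/example-user/", None),
    ]))
```

Only the pages that should be crawled go into the list, which is how a sitemap complements robots.txt exclusions.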
> Ray, do you have any thoughts about BuddyPress-specific stuff?
All of our BP user profile pages use the same <title> tag (e.g. "USER's Profile | CUNY Academic Commons"). This is bad for SEO. (BP 2.1 doesn't suffer from this bug; the trunk version does though.)
I'm not sure whereabouts in the CAC codebase that the USER's Profile <title> override is occurring. Spent about 15 minutes trying to pinpoint where this is done, but couldn't find where.
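The shared-title problem described above can be confirmed from the outside by grouping fetched pages by their <title> text. This is a small sketch under stated assumptions: the page HTML is passed in as strings (fetching is not shown), and the member URLs in the example are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html: str) -> str:
    """Return the page title with whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.title_parts).split())


def find_duplicate_titles(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each title used by more than one URL to the URLs that share it."""
    by_title: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages.items():
        by_title[extract_title(html)].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


if __name__ == "__main__":
    shared = "<html><head><title>USER's Profile | CUNY Academic Commons</title></head></html>"
    print(find_duplicate_titles({
        "/members/alice/": shared,  # hypothetical member URLs
        "/members/bob/": shared,
    }))
```

A report like this also gives a simple before/after check once the title override is fixed.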
I've just had a chance to read through the PDF document.
It would be easy enough to add in noindex meta tags on all archive pages (tag, date, author pages) and paginated pages throughout the Commons. The PDF recommends leaving the traditional category pages alone.
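The noindex rule proposed above can be written down as a small decision function. This is only a language-neutral sketch in Python, not the eventual WordPress patch: the page kinds and the `PageContext` type are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# Archive types that would receive noindex under the proposal.
NOINDEX_ARCHIVES = {"tag", "date", "author"}


@dataclass(frozen=True)
class PageContext:
    """Hypothetical description of the page being rendered."""
    kind: str             # e.g. "single", "home", "category", "tag", "date", "author"
    page_number: int = 1  # 1 for the first page of any paginated view


def robots_meta(ctx: PageContext) -> str:
    """Return the robots meta tag for this page, or an empty string if none."""
    if ctx.kind in NOINDEX_ARCHIVES or ctx.page_number > 1:
        return '<meta name="robots" content="noindex">'
    # First pages of category archives and ordinary content stay indexable.
    return ""


if __name__ == "__main__":
    for ctx in (PageContext("tag"), PageContext("category"), PageContext("category", 2)):
        print(ctx, "->", robots_meta(ctx) or "(indexable)")
```

Note that a paginated category page still gets noindex here, since the proposal covers paginated pages throughout the site while leaving the main category archives alone.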
- Assignee changed from Boone Gorges to Raymond Hoh
- Target version changed from 1.7.4 to 1.8
After much debugging, it looks like we've narrowed down the immediate problem to an incorrect server configuration, related to the recent rollout of a Varnish proxy. Google now appears to be in the process of reindexing the site, and we're back on top of our own Google search :) https://www.google.com/?gws_rd=ssl#q=cuny+academic+commons
I'm going to put this ticket into the 1.8 release so we can look further into some of the improvements Ray suggests.
> I'm not sure whereabouts in the CAC codebase that the USER's Profile <title> override is occurring. Spent about 15 minutes trying to pinpoint where this is done, but couldn't find where.
Sorry about this - I think it's wp-content/themes/bp-nelo/cacap/home.php.
> It would be easy enough to add in noindex meta tags on all archive pages (tag, date, author pages) and paginated pages throughout the Commons. The PDF recommends leaving the traditional category pages alone.
I don't have a problem with doing this. Ray, can you write up a patch?
- Target version changed from 1.8 to Future release
I'm removing this from the 1.8 milestone, as I think it's dependent on some BP ticket or other. Ray, do you think we should close this ticket altogether?