Bug #8944

Hypothesis Plugin not working on media library PDFs

Added by Laurie Hurson almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority name: Normal
Assignee: -
Category name: Teaching
Target version:
Start date: 2017-11-28
Due date:
% Done: 0%
Estimated time:

Description

The Hypothesis plugin (https://wordpress.org/plugins/hypothesis/) allows Commons users to annotate pages, posts, and PDFs on a Commons site. Several faculty members who will be using the Commons for teaching next spring have expressed interest in using this plugin to annotate PDFs on their course sites.

On each page or post, a menu should pop out to allow for highlights and annotations. In the backend, admins can choose to have the plugin run on PDFs in the media library. However, this feature does not appear to work on PDFs uploaded to a Commons site's media library. I have been testing this on https://coursetest.commons.gc.cuny.edu/

I am not sure whether this is a Commons firewall issue, but the plugin doesn't seem to work for PDFs on public or private sites.

hypoth error.png (173 KB) Laurie Hurson, 2017-12-18 02:59 PM

Related issues

Related to CUNY Academic Commons - Feature #6332: Allow uploaded files to be marked as private in an ad hoc way (New, 2016-10-17)

Related to CUNY Academic Commons - Support #10986: PDF embedder provoking error (Resolved, 2019-01-22)

History

#1 Updated by Boone Gorges almost 4 years ago

  • Status changed from New to Staged for Production Release
  • Target version set to 1.12.3

Hi Laurie - Thanks for the report.

If I'm understanding your report properly, then the issue is the same as what I described in a bug report to the Hypothesis repo a couple of months ago: https://github.com/hypothesis/wp-hypothesis/pull/27. That PR has been merged into the plugin, but the plugin hasn't had a new release since. So I've merged the latest dev version of the plugin into our codebase, and will ship it as part of the release on the 1st. Once that's in place, links like this one https://coursetest.commons.gc.cuny.edu/?p=83&preview=true will be run through Hypothesis.

https://github.com/cuny-academic-commons/cac/commit/f5db68b9b6625fc1e35c23e15b1684a16ae30799

#2 Updated by Boone Gorges almost 4 years ago

  • Status changed from Staged for Production Release to Resolved

This will be deployed to the production site within the hour, so I'm closing it out.

#3 Updated by Laurie Hurson almost 4 years ago

Hi Boone,

Circling back to this. It looks like the plugin is now working on PDFs for "public" sites on the Commons. However, if the site is private (i.e. only available to Commons users), clicking through to the PDF produces the following error message:

"PDF.js v1.1.24 (build: f6a8110) Message: Unexpected server response (403) while retrieving PDF "https://via.hypothes.is/id_/https://oerfacultyfellows.commons.gc.cuny.edu/files/2017/12/2015-Open-Internet-Order.pdf"."

Screenshot attached

I have been testing on this site here (now public): https://oerfacultyfellows.commons.gc.cuny.edu

It seems like the Commons firewall might be barring access to the PDF because of the site's privacy settings? Is there any way to amend the plugin to fix this error?

Thanks for your help with this.

Laurie

#4 Updated by Boone Gorges almost 4 years ago

  • Related to Feature #6332: Allow uploaded files to be marked as private in an ad hoc way added

#5 Updated by Boone Gorges almost 4 years ago

Hi Laurie -

Thanks for the follow-up.

Items uploaded to Commons sites - images, PDFs, etc - have the same privacy restrictions as the sites themselves. If a site is visible only to logged-in Commons users, then its uploads will require that users be logged into the Commons as well. The Hypothesis server is not, of course, logged into the Commons, which is why it's unable to access the PDF for the purposes of whatever Hypothesis does.
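
The access rule described above can be illustrated with a small sketch. This is a toy model in Python, not the Commons' actual file-protection code; the `Site` and `Request` types and the `upload_status` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    private: bool  # True if the site is visible only to logged-in Commons users


@dataclass
class Request:
    path: str
    logged_in: bool  # True if the requester carries a valid Commons session


def upload_status(site: Site, request: Request) -> int:
    """Return the HTTP status an upload request receives in this toy model.

    Assumption: uploads inherit the privacy setting of the site they belong to.
    """
    if site.private and not request.logged_in:
        return 403  # anonymous requester, private site: refused
    return 200


private_site = Site("private-course", private=True)

# A third-party server fetching the PDF has no Commons session.
third_party_fetch = Request("/files/2017/12/example.pdf", logged_in=False)
print(upload_status(private_site, third_party_fetch))  # 403

# A logged-in Commons user requesting the same file is served.
member_fetch = Request("/files/2017/12/example.pdf", logged_in=True)
print(upload_status(private_site, member_fetch))  # 200
```

In this model the 403 in the PDF.js error is the expected result for any anonymous fetch from a private site, whichever service makes the request.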

In theory, it might be possible to modify our file-privacy protection to allow the Hypothesis server to access PDFs and perhaps other kinds of content, regardless of a site's settings. But this is complicated. For one thing, it's the kind of thing you'd want to allow admins to disable - and it should probably be disabled by default - since many/most private sites are private either because the content is genuinely sensitive, or because it's subject to copyright restrictions that make it illegal to share with a third party like Hypothesis. So this means building some interface with toggles that allow admins to configure the file protections on a per-file or per-site basis. See #6332.

Perhaps more importantly, the various techniques we might use for allowing Hypothesis access - IP range whitelisting, checking the Referer header, etc - are brittle and/or subject to being exploited by unauthorized third parties.
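
To make the concern concrete, here is a minimal sketch of what naive versions of those two checks might look like and why they are weak. Everything here is an assumption for illustration: the network range is a reserved documentation range (not a real Hypothesis address range), the Referer prefix is only a guess at what such a rule might match, and `ip_allowed` and `referer_allowed` are hypothetical helpers, not Commons code.

```python
import ipaddress

# Hypothetical values for illustration only.
# 203.0.113.0/24 is a reserved documentation range, not a real Hypothesis range.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
TRUSTED_REFERER_PREFIX = "https://via.hypothes.is/"


def ip_allowed(remote_addr: str) -> bool:
    """Allow the request only if it comes from a listed network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in ALLOWED_NETWORKS)


def referer_allowed(headers: dict) -> bool:
    """Allow the request if its Referer header starts with the trusted prefix."""
    return headers.get("Referer", "").startswith(TRUSTED_REFERER_PREFIX)


# Exploitable: the Referer header is supplied by the client, so anyone can forge it.
forged_headers = {"Referer": "https://via.hypothes.is/anything"}
print(referer_allowed(forged_headers))  # True

# Brittle: if the service starts sending requests from an address outside the
# hard-coded list, legitimate requests are refused until the list is updated.
print(ip_allowed("203.0.113.10"))  # True
print(ip_allowed("198.51.100.7"))  # False
```

The Referer check accepts any client that sets the header, and the IP check silently breaks whenever the remote service changes addresses.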

As such, I think that for now the only answer is that PDFs that must be scannable by Hypothesis must be stored on non-private sites. In the future, if this becomes a widespread problem or a frequent request, we can revisit whether it's worth the necessary effort and risk that go into building a more robust system.

#6 Updated by Laurie Hurson almost 4 years ago

Hi Boone,

Thanks for looking into this and for explaining the issues with allowing the Hypothesis server to access private sites on the Commons. I will advise faculty accordingly and suggest they use a public site if they plan to use the Hypothesis plugin.

Thanks again & happy new year

#7 Updated by Boone Gorges over 2 years ago
