Bug #11879

Hypothesis comments appearing on multiple, different pdfs across blogs

Added by Laurie Hurson almost 2 years ago. Updated almost 2 years ago.

Status: New
Priority name: Normal
Assignee:
Category name: -
Target version:
Start date: 2019-09-19
Due date:
% Done: 0%
Estimated time:

Description

Hi All,

A professor has reported a very weird Hypothesis issue. I am not sure whether this is a Commons bug or a Hypothesis bug.

This professor uploaded a PDF (book chapters 5-7) for a course in 2017: https://via.hypothes.is/https://spa114fall171.commons.gc.cuny.edu/wp-content/blogs.dir/3395/files/2017/09/La-Frontera-5-6-7.pdf

This semester, the professor will be using the same reading (only chapter 5) so she uploaded a shortened version to the media library for her new course: https://via.hypothes.is/https://span2204.commons.gc.cuny.edu/wp-content/blogs.dir/8302/files/2019/09/Chapter-5.pdf

*to view the readings page on the Span2204 site the pw is Fall2019

The problem: the new Chapter 5 PDF pulls in the old comments from the original chapters 5-7 PDF. The PDFs are on different Commons sites in different media libraries. Both PDFs appear to be stored in the wp-content blogs directory, but they have different file numbers and names.

I don't think this is a plugin issue, since she was using the via.hypothes.is link, not the plugin, to add the annotation layer to these PDFs. Moreover, when I installed the plugin on the Span2204 site, the PDFs could not be read because the via.hypothes.is prefix was added to the PDF URL twice.
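
To illustrate the doubled prefix described above, here is a minimal sketch of building a via link so that the prefix appears exactly once. The helper name `to_via_url` is hypothetical, and this is not how the Hypothesis plugin works internally (that is not known from this ticket); it only shows the intended shape of the URL.

```python
# Prefix used by the via proxy links quoted in this ticket.
VIA_PREFIX = "https://via.hypothes.is/"


def to_via_url(url: str) -> str:
    """Return `url` wrapped in exactly one via prefix (hypothetical helper)."""
    target = url
    # Strip any prefixes already present, so a link that was wrapped
    # once (or twice) is not wrapped again.
    while target.startswith(VIA_PREFIX):
        target = target[len(VIA_PREFIX):]
    return VIA_PREFIX + target
```

Passing either the bare PDF URL or an already-prefixed link returns the same single-prefix link.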

History

#1 Updated by Laurie Hurson almost 2 years ago

I also recreated the issue here so it may not be user dependent (if that was a possibility...)

https://via.hypothes.is/https://classtestbmcc.commons.gc.cuny.edu/files/2019/09/Chapter-51.pdf

#2 Updated by Boone Gorges almost 2 years ago

  • Assignee set to Laurie Hurson
  • Target version set to Not tracked

This has to be an issue on the Hypothesis site. They probably have a mechanism for detecting duplicates and combining them, a mechanism that does some fuzzy URL matching that these Commons sites have triggered. I'd suggest you reach out to their support team (I'd do it myself but I don't understand the intended behavior very well). Let me know if they need any technical information.

#3 Updated by Laurie Hurson almost 2 years ago

Thanks Boone. I figured this might be a Hypothesis issue, thanks for confirming. I will reach out to them and circle back with any relevant info.

#4 Updated by Laurie Hurson almost 2 years ago

The Hypothesis issue was document "fingerprinting". If docs have the same fingerprint, the annotations will appear across all docs with that fingerprint. I am working with the professor who experienced this issue to create new doc fingerprints (Solution #2 below). This will be helpful to know in the future, as more faculty may use the same PDF across multiple courses.

More below from the Hypothesis folks.

--

Every PDF has what's called a "fingerprint", and Hypothesis uses this fingerprint to tell PDFs apart. This means that if I emailed you a PDF and we both saved different copies to our hard drives, we'd still see each other's annotations in our copy of the PDF.

Looking at both the longer and shorter PDFs you sent me, it seems that they both use the same fingerprint, so Hypothesis treats them as the same document. You can read how to check PDF fingerprints here. https://web.hypothes.is/help/how-i-do-i-check-and-make-sure-my-pdfs-have-different-fingerprints/

I have two solutions that I think will help: using Hypothesis groups and making a new PDF with a different fingerprint. I'll walk you through both, but of the two, the former is the better option.

1) Using groups
The easiest way to use groups would be to install Hypothesis' LMS app at your institution. Every course will get its own group, and the "Public" group option disappears, so students won't mistakenly put their annotations into the wrong group.

Outside of the LMS app, your professor can also create a private group for their class and then invite their students into it. They can make as many groups as they like; the best practice would probably be to make one for each class separately (so Fall '19, Spring '20, etc.). When students visit the PDF using the links in your email, they need to make sure they post in the appropriate group.

2) Change the PDF fingerprint when creating a new copy
Here's an article on changing a PDF's fingerprint. Note that this is a workaround; users have reported that this seems to work once or twice, and then subsequent PDFs will no longer get the new fingerprints. I have been able to create as many new fingerprints as I need using Preview on a Mac (instead of Google Chrome, as directed by the linked article); I have yet to find a consistent solution for Windows.
https://web.hypothes.is/help/how-to-save-copy-of-pdf-with-a-different-fingerprint/

#5 Updated by Boone Gorges almost 2 years ago

Thanks for sharing this, Laurie! Good to know about the fingerprinting.

Private groups and LMS integration won't work for the Commons at this time, as I think our recent call with them made clear. Might be something we pursue in the future.

For the time being, once we've determined which technique works reliably for re-fingerprinting PDFs, we may want to add it to any Hypothesis documentation we might maintain.
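
When testing re-fingerprinting techniques, a quick local check may help. The sketch below is a rough aid under stated assumptions, not Hypothesis's own method: it assumes the fingerprint corresponds to the first element of the `/ID` array in the PDF trailer (this ticket does not confirm that), and it only handles IDs written as hex strings. The function names `read_pdf_id` and `same_fingerprint` are hypothetical.

```python
import re
from typing import Optional

# Matches an entry such as: /ID [<9a3f...> <51c2...>]
# Assumption: only hex-string IDs are handled; IDs written as
# literal strings "(...)" are not matched by this pattern.
_ID_PATTERN = re.compile(
    rb"/ID\s*\[\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*\]"
)


def read_pdf_id(data: bytes) -> Optional[str]:
    """Return the first /ID element as lowercase hex, or None if absent.

    If the file contains several /ID entries (for example after an
    incremental save), the last one in the file is used.
    """
    matches = _ID_PATTERN.findall(data)
    if not matches:
        return None
    first, _second = matches[-1]
    return first.decode("ascii").lower()


def same_fingerprint(a: bytes, b: bytes) -> bool:
    """True only if both PDFs have an /ID and the first elements match."""
    id_a = read_pdf_id(a)
    id_b = read_pdf_id(b)
    return id_a is not None and id_a == id_b
```

To compare two files, read each with `pathlib.Path(...).read_bytes()` and pass the results to `same_fingerprint`. If it still returns `True` after re-saving, the copy probably kept the old identifier. The check described in the Hypothesis help article remains the authoritative one.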
