On Hacking and Unpacking My (Zotero) Library

Many of my readers in the humanities already know about Zotero, the free open-source citation manager that works within Firefox and scares the hell out of Endnote’s makers. If you are a student or professor and haven’t tried Zotero, then you are missing out on an essential tool. I use it daily, both for my research and in my teaching. [Full disclosure: I am not an entirely impartial evangelist for Zotero, as its developers are colleagues at George Mason University, in the incomparable Center for History and New Media.]

The latest version of Zotero allows you to “publish” your library, so that anybody can see your collection of sources (and your notes about those sources, if you choose). In my case, I’ve not only published my library on the zotero.org site, I’ve updated the main sidebar on this very blog with a news feed of my “Recently Zoteroed” books and articles. As I gather and annotate sources for my teaching and research, the newest additions will always appear here, with links back to the full bibliographic information in the online version of my library.

How did I do this?

Why did I do this?

What follows is an attempt to answer these two questions. Before I address the how-to, though, I’ll explain the why-to: why I’m making the sources I use for my teaching and research public in the first place.

Sharing my Library in Theory

Like many scholars in the humanities (I imagine), I initially had qualms about sharing my library online — checking that little box in my Zotero privacy settings that would “make all items in your library viewable by anyone.” Emphasizing the gravity of the decision, zotero.org adds this warning: “Be very sure you want to do this.”

I do want to do this, I do, I do.

But why? We are accustomed, in the humanities, to being very secretive about our research. Oh sure, we go to conferences and share not-yet-published work. But these conference papers, even if they’re finished the morning of the presentation with penciled-in edits, are still addressed to an audience, meant to be shared. But imagine publishing your research notes and only the notes, shorn of context or rhetoric or (especially or) the sense of a conclusion we like to build into our papers. Imagine sharing only your Works Cited. Or, imagine sharing the loosest, most chaotic collection of sources, expanded way beyond the shallows of Works Cited, past the nebulous Works Consulted, deep into the fathomless Works Out There.

A paranoid academic (and most of us are paranoid) might worry that by sharing our pre-publication sources, whether they’re primary or secondary sources, we are exposing our research before its time. My sense is that we like to keep our collection of sources private as long as possible, holding them close to our chest as if we were gamblers in the great poker game of academia. And in this game, our colleagues are not colleagues, but opponents sitting across the table from us, bluffing perhaps, or maybe holding a royal flush. Proprietary software like Endnote, which by default encloses research libraries within a walled garden, reinforces this notion, that the engine of scholarship is competition rather than collaboration.

Or, to switch metaphors, sharing our sources in advance of the final product is like sharing the blueprints to a house we haven’t yet built — a house we may not even have the money to build, and meanwhile you just know there’s somebody out there, more clever or less scrupulous or just damn faster, who can take those blueprints and erect an edifice that should have been ours while we’re still at town hall getting zoning permits. We’ve all had that experience of reading a journal article or — damn it! — a mother effing blog in which the author tackles clearly, succinctly and without pause some deep research concern that we’ve been pondering for years, waiting for it to blossom into a Beautiful Idea in our writing before going public with it. And POOF! somebody else says it first, and says it better.

Keeping our sources private is the talisman against such deadly blows to our research, akin to some superstitious taboo against revealing first names. We academics are true believers in occult knowledge.

To put it in the starkest terms possible: before I published my library I was concerned that someone might take a look at my sources and somehow reverse engineer my research.

Are we in the humanities really that ridiculous and self-important? Let’s face it, I’m an English professor. It’s not as if I’m working on the Manhattan Project. My teaching and research add only infinitesimally, incrementally, to the storehouse of human knowledge. I don’t mean to belittle what scholars in the humanities do à la Mark Bauerlein. On the contrary, I think that what we do, striving to understand human experience in a chaotic world, is so crucial that we need to share what we learn, every step along the way. Only then do all the lonely hours we spend tracing sources, reading, and writing make sense.

Looked at prosaically, public Zotero libraries may be the equivalent of a give-a-penny, take-a-penny bowl at a local store. This convenience alone would be useful, but the creators of Zotero are much more inspired than that. They know that sharing a library is crowdsourcing a library. The more people who know what we’re researching before we’re done with the research, the better. Better for the researchers, better for the research. Collaboration begins at the source, literally. And as more researchers share their libraries, we’re going to achieve what the visionaries in the Center for History and New Media call the Zotero Commons, a collective, networked repository of shareable, annotatable material that will facilitate collaboration and the discovery of hidden connections across disciplines, fields, genres, and periods.

And that is why I’m sharing my library.

Sharing my Library in Practice

Now, how am I sharing it? I’ve taken what seems to be an unnecessarily complicated route in order to incorporate my library into my blog. There is an easy way to do what I’ve done: Zotero has native RSS feeds for users’ collections, and all you need is to subscribe to that feed using a widget on your blog. In my case I could have used the default WordPress RSS sidebar widget. But I didn’t. I wound up working with both Dapper and Yahoo Pipes, and here’s why.

I didn’t like how the RSS feed built into zotero.org included everything I added: duplicate citations, snapshots that I later categorized as something else, and PDFs unattached to metadata (even if I retrieved that metadata later). In short, the default RSS stream looked messy in WordPress (though it looks great in Google Reader).

The online mash-up tool Yahoo Pipes is perfect for combining and filtering RSS feeds, and that’s what I wanted to use. I can’t program my way out of a paper bag, but Pipes is simple enough that even I can use it. So why did I also use Dapper, another online tool that lets you do fun things with RSS feeds? Because Pipes for some reason would not accept the Zotero RSS feed as valid. I haven’t been able to confirm this, but I’m guessing it has something to do with Zotero’s API using secure HTTPS rather than plain HTTP. Or maybe it’s because the Zotero feed is actually XML rather than RSS. Again, I’m not a programmer and I’m just fumbling my way around this hack. In any case I ran my Zotero feed through the Dapp Factory, which did accept it.
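
One quick way to test the second guess is to look at the root element of the feed. RSS and Atom are both XML, but they are different formats with different root elements. Here is a minimal Python sketch, assuming the feed's text has already been saved to a string. The function name `feed_kind` and the two tiny sample feeds are made up for illustration, and the sketch says nothing about what Zotero itself serves.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 feeds.
ATOM_NS = "http://www.w3.org/2005/Atom"

# Two tiny invented feeds, just enough to show the difference.
SAMPLE_RSS = '<rss version="2.0"><channel><title>Demo</title></channel></rss>'
SAMPLE_ATOM = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Demo</title></feed>'


def feed_kind(xml_text):
    """Return 'rss', 'atom', or 'unknown' based on the root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return "rss"
    if root.tag == "{" + ATOM_NS + "}feed":
        return "atom"
    return "unknown"


if __name__ == "__main__":
    print(feed_kind(SAMPLE_RSS))
    print(feed_kind(SAMPLE_ATOM))
```

A tool that only understands one of the two formats may reject a perfectly valid feed written in the other.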

Next I dumped the Dapper feed into Yahoo Pipes, using several of Pipes’ operators to filter out the duplicates and attachment file names that were cluttering the RSS feed. Here is a map of my Pipe.

Using Yahoo Pipes to filter a Zotero library

It’s quite simple, and with some experimentation I may improve my hack (for example, I’m toying with Feedburner as a substitute for Dapper, which may preserve more of the original XML, giving Pipes more raw data to manipulate and mash). But even right now in its kludged form, the result is exactly what I set out to do.
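
For readers who would rather script this than wire it up visually, the same kind of filtering can be sketched in Python. This is only an illustration under stated assumptions, not a description of how the Pipe works internally: the sample feed is invented, the function name `clean_items` is hypothetical, and treating any title that ends in `.pdf`, `.htm`, or `.html` as an attachment file name is a guess about what the clutter looks like.

```python
import xml.etree.ElementTree as ET

# Invented sample feed: one real-looking entry, a near-duplicate of it,
# a stray PDF attachment, and a second distinct entry.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Recently Zoteroed</title>
<item><title>A Sample Article</title><link>http://example.org/1</link></item>
<item><title>a sample article </title><link>http://example.org/2</link></item>
<item><title>scan0001.pdf</title><link>http://example.org/3</link></item>
<item><title>Another Sample Book</title><link>http://example.org/4</link></item>
</channel></rss>"""

# Assumption: attachment entries show up with a file name as their title.
ATTACHMENT_SUFFIXES = (".pdf", ".htm", ".html")


def clean_items(rss_text):
    """Return (title, link) pairs with attachments and repeats dropped."""
    channel = ET.fromstring(rss_text).find("channel")
    seen = set()
    kept = []
    for item in channel.findall("item"):
        title = (item.findtext("title") or "").strip()
        key = title.lower()  # compare titles case-insensitively
        if not title or key.endswith(ATTACHMENT_SUFFIXES) or key in seen:
            continue
        seen.add(key)
        kept.append((title, item.findtext("link")))
    return kept


if __name__ == "__main__":
    for title, link in clean_items(SAMPLE_FEED):
        print(title, link)
```

Run against the invented sample, `clean_items(SAMPLE_FEED)` keeps the first article and the book and drops the repeat and the PDF. Matching duplicates on the title alone is crude; a real library would probably need to compare authors or identifiers as well.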

In addition to its simplicity, one of the advantages of Yahoo Pipes is the variety of output formats available. For my blog’s sidebar I have Pipes generate an RSS feed, but I could just as easily create an interactive Flash “badge” with it.

I find the possibilities of a portable, embeddable version of my Zotero library extremely evocative. It’s a kind of artifact from the future that our methodological and pedagogical approaches haven’t caught up with yet. Here is where the theory and practice of a collaborative library have yet to meet — and I want to end my manifesto/guide with a simple appeal: let’s begin thinking about the untapped power of this intersection and what we can do with it, for ourselves, our students, and our scholarship.

  1. none’s avatar

    What beautiful irony.  The first couple links I clicked in your sample feed went to a paywall that stopped me from accessing the articles without a JHU ID number.

  3. Nate Kogan’s avatar

    Thanks for the walk-through. I’ve had trouble with directly feeding the RSS from Zotero into my Wordpress.com blog, but your comment about it being XML makes some sense. I’ll give this Yahoo Pipes work-around a shot. Thanks again!

  5. Joe H’s avatar

    Great post Marc. I love how this asserts a whole different way of being in the scholarly community. I’m going to give Zotero a try.

  7. Jared C.’s avatar

    This is pretty stellar stuff. And, yes, it is impossible to find those evaluations online. I have tried before…

    Originally posted on SAMPLE REALITY

  8. Steve’s avatar

    It’s good to see people from the humanities eager about new technologies. Thanks for the entertaining read, and here’s to hoping syndication technologies like this result in positive pedagogical shifts and a more open stance towards knowledge.

    Originally posted on SAMPLE REALITY

  10. AramZS’s avatar

    This is very admirable. I agree, Mason’s system of ratings lacks the transparency and accessibility to be used in any meaningful way by students. RateMyProfessor can be good for finding only the extremely good and bad professors, and sometimes not even that. On the remix front, I’d love to see some more graphic visual representations of the statistics, which is really what Mason should be doing to begin with.
    I hope many more Mason professors follow your example!

    Originally posted on SAMPLE REALITY

  11. Bill’s avatar

    Thanks for sharing this. I just tried to do the same thing and have been able to create my own Zotero feed. Unfortunately, the filter does not seem to be working as yours has, and I am still getting .pdfs and duplicates. I’ll keep plugging away and see what I can come up with. Thanks, again!

  12. Chris Rusbridge’s avatar

    I’ve been worrying a bit about aspects of this also. The first thing to remember is that there is a well developed repository movement; if you are careful about the rights you grant to publishers, and read the small print, you can deposit a version of your paper in your local repository for free access (sometimes after an embargo period). Then, on making the repository more useful, see the series of posts starting at http://digitalcuration.blogspot.com/2008/07/negative-click-positive-value-research.html, and ending with http://digitalcuration.blogspot.com/2008/08/comments-on-negative-click-research.html. On whether source code repositories could be useful for these purposes, http://digitalcuration.blogspot.com/2009/04/libraries-of-future-sourceforge-as.html reports some interesting discussions. But if you make progress on your “source control and backup system for manuscripts”, I’d love to hear…

    Originally posted on darcusblog

  13. darcusb’s avatar

    Thanks Chris. Good to see you trying to push repositories forward, as I really haven’t found that model very compelling.

    When I get around to it, for example, my publications will be hosted on my own site, as easily accessible XHTML (I did have code that integrated commenting on the articles via disqus, but had to pull it because of bugs in their software w/XHTML).

    On the SCM, I’m just using darcs, where I keep a sort of master repository on my main site, and then keep local copies wherever I work. The distributed nature of darcs means I have effectively full backups on every machine I am working on. I may open this up to web viewing later, but am not going to worry about it ATM.

    As you may guess from the above, I think SF is a horrible model for this. But then I don’t much like the idea of centralization to begin with. It seems preferable to be able to simply copy/fork a distributed repository to a central location if/when the need arises. Of course, for this to be valuable to people other than technically-savvy geeks, you’d have to put some work into a nice UI; maybe something along the lines of this.

    Originally posted on darcusblog

  14. zach whalen’s avatar

    Hi Mark and everyone else in this thread. I thought you all might be interested in my take on this, using Drupal: http://www.zachwhalen.net/blog/09/aug/recently-zoteroed-drupal-approach

    I don’t mean to suggest that using Pipes is a bad idea; it’s just that what you’re using it for presented a good opportunity for me to demonstrate some similar functionality just using Drupal and its all-powerful Views module.

    1. Mark Sample’s avatar

      Zach, very cool! I’m going to have to get down and dirty with Drupal at some point in the near future. In the meantime, I think I’m going to subscribe to your Zotero feed!

  15. Ashley B.’s avatar

    I am thrilled by the possibility of sharing libraries and citations online. While someone can’t reverse engineer your research, isn’t it great to have ALL your background reading there informing people who read your work, are interested in the topic, etc.?

    I’d love to make my Zotero library part of my own website, not as a feed but as a dynamic interface. I know that this isn’t impossible, but it does seem like it’s going to require a bit more work and getting down and dirty with some code… Any tips would be very much appreciated!
