Tips For Internet-based Scholarship

I write painfully little these days, but I save huge numbers of online references for later use–I'm the Internet equivalent of a paranoid backwoods survivalist, except that I'm compulsively hoarding stuff for the academic articles I'll write one day (I keep telling myself) as opposed to preparing for race war or some pulp fiction Armageddon.

Here's how I archive online references:

I use the free Evernote and its Firefox plug-in for saving individual quotes from articles I'm reading (along with tags to organize them). It automatically saves the text you've highlighted along with the source link in your local database (which can be synced on other PCs).
I save the URLs of useful pages and resources in Delicious (formerly known by the annoying name and URL del.icio.us), which also integrates into Firefox (and Internet Explorer, if you must subject yourself to its mediocrity). I tag these by category, as well.
Many of you are probably already doing these 2 things, but are you protecting your records against the scourge of Link Rot (i.e., the phenomenon of pages being deleted or moved, which makes your bookmarks useless)?
Saving a copy locally is a partial solution–and is probably a good idea for key references in any case–but you can also archive it online. Archive the page for free using WebCitation.org. For example, I archived this Economist column on the Dutch far-right demagogue Geert Wilders at http://www.webcitation.org/5teauEzoh. Then save the citation URL in the comments field of your record in Delicious. (Note: You can add a link to your Bookmarks Toolbar that allows you to archive the page you're on with a click.)
I also use WorldCat and RefWorks a lot, but I won't get into those here. (The latter isn't free, but if you're a student your school may have a license that gives you free access.)

There's no guarantee WebCitation.org won't eventually go out of business, but that's true even of services run by major corporations (remember the once-iconic free host GeoCities). It's an extra layer of protection, and if you want to cite a page in a publication but discover it's disappeared from the web, you'll be very grateful you took a moment to archive it. My guess is that this site (or one like it) will eventually get industry support–as has the WayBack Machine–given what an important need it meets.

And, needless to say, back stuff up. Now. When it comes to free services, one policy change or database failure and the data you've stored could be gone.

Btw, another advantage of WebCitation.org is that it saves a copy of the page that its author can't scrub after the fact. That's very handy when documenting the rants of nutjobs and hate-mongers in the Blogosphere.

Update:

Evernote & Linux. Last I checked, Evernote is only available for PC and Mac. It's one of the few indispensable programs that doesn't have a Linux version or analogue (in fact, I've seen Linux enthusiasts mention it as the only reason one might not ditch Windows entirely). (As a consolation prize, a Linux user can save links via a stripped-down web clip feature on the Evernote website, but that's painfully awkward compared to the near-seamless browser integration available for Evernote on PC or Mac.)

Bulk imports in WebCite. I haven't explored it yet, but a truly sublime feature of WebCitation.org is "combing". You can upload a document, extract a list of all the links it contains, and then choose those you wish to archive. So, you could maintain a list of links on a subject as you do research and then submit them all in one fell swoop (as opposed to archiving each page individually). If you have a bibliography that contains some links, all you have to do is upload it and click a few checkboxes. A huge time-saver.

As if that weren't neat enough, you can even comb web pages. Plug in the URL of an article that links to a lot of places and it'll present you with an exhaustive list of all its links, with checkboxes for archiving each.
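I don't know how WebCite does its combing internally, but the basic idea (pull every link out of a page so you can pick the ones worth keeping) is easy to sketch in Python using only the standard library. The names LinkComber and comb_links are my own inventions for illustration, not anything WebCite provides.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkComber(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links like "/a.pdf" into full URLs.
                absolute = urljoin(self.base_url, value)
                # Skip links we have already seen, keeping first-seen order.
                if absolute not in self.links:
                    self.links.append(absolute)


def comb_links(html_text, base_url):
    """Return the unique links found in html_text, in document order."""
    comber = LinkComber(base_url)
    comber.feed(html_text)
    return comber.links


if __name__ == "__main__":
    sample = '<p><a href="/a.pdf">A</a> <a href="http://www.domain.com/b.html">B</a></p>'
    for link in comb_links(sample, "http://www.domain.com/index.html"):
        print(link)
```

Feed it the saved HTML of an article and you get back the list you'd otherwise be ticking checkboxes for; what you do with the list afterwards is up to you.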

Downloading files in bulk. While we're at it, if you find yourself downloading large volumes of files, the Firefox extension DownThemAll (DTA) is also extremely helpful. You can fine-tune downloading in all sorts of ways (e.g., only PDFs, or only files containing "2010"). When dealing with large files it's helpful in another way, as you can pause downloads (an unreliable option in the built-in Firefox download feature).
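If you'd rather prune a list of links yourself before handing it to a downloader, the same kind of filtering ("only PDFs", "only files containing 2010") can be sketched in a few lines of Python. To be clear, this is not DTA's filter engine, just an illustration of the idea; filter_links and its parameters are hypothetical names I made up.

```python
from urllib.parse import urlparse


def filter_links(links, extension=None, contains=None):
    """Keep links whose path ends with `extension` and whose URL contains `contains`.

    Either test is skipped when its argument is left as None.
    """
    kept = []
    for link in links:
        # Compare against the path only, so "?download=1" doesn't hide a ".pdf".
        path = urlparse(link).path.lower()
        if extension and not path.endswith(extension.lower()):
            continue
        if contains and contains not in link:
            continue
        kept.append(link)
    return kept


if __name__ == "__main__":
    links = [
        "http://www.domain.com/report2010.pdf",
        "http://www.domain.com/report2009.pdf",
        "http://www.domain.com/notes2010.html",
    ]
    for link in filter_links(links, extension=".pdf", contains="2010"):
        print(link)
```

The extension check is case-insensitive; the "contains" check is a plain case-sensitive substring match, which is good enough for years and filename fragments.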

And if you have a huge number of links to download and are really clever, you could save them all in a big MS Word document, save it as HTML, open the file in Firefox (using File–>Open–>etc.), and then download them all with a click or two.
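If Word isn't handy, the "page full of links" can also be produced with a short script. Here's a minimal Python sketch, assuming you already have your links in a list; links_to_html and the page title are hypothetical names of my own choosing.

```python
from html import escape


def links_to_html(links, title="Links to download"):
    """Return a bare-bones HTML page with one clickable list item per link."""
    items = "\n".join(
        # Escape each link so characters like & don't break the markup.
        '<li><a href="{0}">{0}</a></li>'.format(escape(link, quote=True))
        for link in links
    )
    return (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        "<title>{}</title></head>\n<body>\n<ul>\n{}\n</ul>\n</body>\n</html>\n"
    ).format(escape(title), items)


if __name__ == "__main__":
    page = links_to_html([
        "http://www.domain.com/article11.pdf",
        "http://www.domain.com/article12.pdf",
    ])
    print(page)
```

Save the printed output to a file (links.html, say), open that file in Firefox, and you're at the same point as with the Word-to-HTML route.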

Scripting the download of a huge number of sequentially-named files
Another scenario: Let's say you have 500 links you want to download whose filenames are identical except for a number. For example:

www.domain.com/article11.pdf
www.domain.com/article12.pdf
www.domain.com/article100.pdf
www.domain.com/article501.pdf
www.domain.com/article999.pdf
…

You could, God forbid, do that manually, finding each link and downloading it (hopefully in one step–i.e., right-clicking and choosing "Start saving link with DTA oneclick" instead of opening it in Adobe Reader, waiting for it to load and then choosing File–>Save a copy).

You could certainly improve the process by removing all the mousing around using the Word-to-HTML process laid out above, but that would still involve a lot of laborious and error-prone typing.

Your life would be much easier if you could just tell DTA to try to save everything between www.domain.com/article1.pdf and www.domain.com/article1000.pdf and let it do the work for you. Well, you can!

The most efficient process would be to create those links in Notepad++, using a macro to increment the number, and then paste them into Word. (Notepad++ is a free program which, if you didn't know, is infinitely superior to the Windows default text editor, Notepad, and which can easily be set up to run in its redneck cousin's place.) But that would require explaining how to use its macro feature, and it depends on a program that most people don't have installed (it's free, but you need admin rights on your PC to install it).

So I'll give an Excel solution instead.

In Excel, create a 3-column spreadsheet like this:
www.domain.com/article 1 .pdf
www.domain.com/article 2 .pdf
Select all 6 cells and, keeping them selected, drag the little box in the lower-right corner down until you reach the top of the desired range (e.g., 10000).
www.domain.com/article    1    .pdf
www.domain.com/article    2    .pdf
www.domain.com/article    3    .pdf
…
www.domain.com/article    9999    .pdf
www.domain.com/article    10000    .pdf
Now, select those 3 columns and copy them into a blank Word file as unformatted text (Edit–>Paste Special–>Unformatted Text).
Use the enhanced replace function to remove the tab characters from the Word file (Edit–>Replace–>[replace ^t with nothing]). This will combine the 3 segments into a single link on each line.
www.domain.com/article1.pdf
www.domain.com/article2.pdf
www.domain.com/article3.pdf
…
www.domain.com/article9999.pdf
www.domain.com/article10000.pdf
Optionally, press CTRL+A and click the bullet icon to add bullets and make the list easier to read (not that DTA will care).
Follow the previous instructions to save the file as HTML, open it in Firefox, and download everything with DTA. Merry Christmas!
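For anyone comfortable running a script, the three Excel columns and the tab-stripping step boil down to a single loop. This is only a sketch: sequential_urls is a name I made up, and I've assumed an http:// prefix because the example links above don't include one.

```python
def sequential_urls(prefix, start, stop, suffix):
    """Return prefix + n + suffix for every n from start to stop, inclusive."""
    return ["{}{}{}".format(prefix, n, suffix) for n in range(start, stop + 1)]


if __name__ == "__main__":
    # Same three pieces as the spreadsheet columns: prefix, number, suffix.
    urls = sequential_urls("http://www.domain.com/article", 1, 10000, ".pdf")
    print(len(urls), "links generated")
    print(urls[0])
    print(urls[-1])
```

As with the Excel version, plenty of the generated links may not exist on the server; the downloader just fails on those and moves on.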

Once you get the hang of DTA, you'll wonder how you ever could've wasted so much time, manually clicking on link after link like some unevolved single-celled lifeform.

Hope this helps somebody get organized. (Physician, heal thyself!)

P.S. If anyone is wondering what happened to my blogroll, it's temporarily down while I reorganize it. I was using Bloglines to manage those links, but that seemed to be slowing down page loads ridiculously.
