- I scraped nearly 1 TB of HTML-ish data from the Internet Archive version of a website that was recently taken offline (using this fascinating tool), and the data is noisy and trashy. I intend to normalize it, clean it, de-dupe it, and transfer it into an industrial-strength database. But in the meantime, I’m thinking of parking it in EagleFiler, so I can grab urgent data here and there as needs arise.
I don’t have nearly enough space on my HD, so I’m thinking of storing my EF library on an external drive. But I have a distant memory of doing that once before, getting poor results, and eventually restoring it to my HD. Is that a false memory? Is EF perfectly OK running from an external drive?
- I just noticed the preference for web page format, which seems (?) to default to web archive. Should I leave this alone? In most contexts, I’d prefer PDF, but I’d imagine that for EF’s purposes, web archive or HTML might work best.