Oftentimes I end up saving a web page in multiple formats… does anyone have suggestions on their workflow for easily clipping content? What I want:
- It needs to be as easy as a bookmark - I don’t want to “think” about which format. So, picking a format, or saving in multiple formats each time, isn’t working out. Yet I am afraid to commit to one format. I don’t want to go a year and then realize, shucks, I shoulda stuck with PDF, or shoulda gone webarchive all the time.
- I would like to preserve the structure - i.e., text import with images is usually messy in other apps when I’ve tried it. Some JavaScript ends up showing as text, all layout is lost, etc.
- I want it to be fully text-searchable - this means image screenshots are out, since they’d then need to be run through OCR. Tried DevonThink for this reason, but it was slow to OCR, and not close to 100% accurate even on screenshots of simple fonts!
- I want to be sure it works without the Internet connected, and without the original site available.
- If it could clean out any banner ads, great, haha.
WebArchive seemed to be the obvious choice - I was okay with the lack of compatibility beyond Safari. Figured I could also do a batch convert to PDF on some or all of the archives later. However, I noticed sometimes they are slow to load! I thought that WebArchive would pull all images in things like image slideshows that use JavaScript, but now I think that is not the case.
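For what it's worth, here is a rough way I could check what a given archive actually captured, rather than guessing. It is only a sketch: it assumes a .webarchive is a property list with the usual `WebMainResource` and `WebSubresources` keys, and the function name `summarize_webarchive` is just something made up for illustration. It counts the saved subresources by MIME type, so a slideshow page with only a couple of `image/*` entries would suggest the JavaScript-loaded images were not pulled in.

```python
import plistlib
from collections import Counter


def summarize_webarchive(data: bytes) -> dict:
    """Summarize the resources stored in a .webarchive (given as raw bytes).

    Assumes the archive is a property list with 'WebMainResource' and
    'WebSubresources' entries; missing keys are treated as empty.
    """
    archive = plistlib.loads(data)
    main = archive.get("WebMainResource", {})
    subs = archive.get("WebSubresources", [])

    # Tally subresources (images, scripts, stylesheets, ...) by MIME type.
    mime_counts = Counter(
        res.get("WebResourceMIMEType", "unknown") for res in subs
    )
    image_count = sum(
        n for mime, n in mime_counts.items() if mime.startswith("image/")
    )

    return {
        "url": main.get("WebResourceURL"),
        "main_bytes": len(main.get("WebResourceData", b"")),
        "subresources": len(subs),
        "images": image_count,
        "by_mime": dict(mime_counts),
    }
```

To use it on a real file, read the bytes first, e.g. `summarize_webarchive(open("page.webarchive", "rb").read())` (the file name is a placeholder), and compare the image count with what you see in the browser.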
So, if it won’t truly pull everything that could appear on a modern web page, I’d rather just use PDF capture - seems cleaner, lighter, more portable. I was surprised that the PDFKit capture that EagleFiler uses seems to get a very nice layout. Not sure if this is limited to sites that have a stylesheet for printing, though? I am guessing that is the case.
I also have a nice PDF capture tool in Safari, forget who makes it, that scrolls the whole page WITHOUT reloading - that gets a very accurate snapshot of what you’re seeing, of course. But then it’s not searchable.
Also, if sticking with PDF - continuous, or split? Do I give up the benefit of cleaner page breaks if I want split later, using another tool (or even re-capturing within EagleFiler from the continuous version’s record)? I figure continuous will look more like a web page, with less space wasted when browsing on any computing device, etc.; just not as good for reading via e-reader apps, I believe (I think then one wants pages sized to fit the aspect ratio of the e-reader, right?). This use case is probably going to be very rare though (reading on an e-ink screen with slow refresh rate), so I guess I don’t mind having to make a conscious decision on a different capture for content like that, or converting it afterwards, as and when needed.
Thanks for any suggestions! I’ve also tried Pocket, Instapaper, and Raindrop.io.
Raindrop.io had me excited as it does offline saving on their servers and has a Dropbox sync option - but I found out it is only syncing an HTML bookmark file of all bookmarks. Also, just saving a bookmark takes several seconds, which is very annoying. I realize saving a WebArchive can take time too, but I think I can even navigate away from the page and it will still complete. I don’t need to wait while it captures in EagleFiler, I mean.
Pocket and Instapaper are okay - Instapaper is just too simple, only good for things you intend to read for a while, I’d say, not for general bookmark stuff. Pocket seems good, but the app is barebones on Mac (can’t even put the left column in Dark Mode, only the rest). I might keep using Pocket, but it doesn’t feel like a bookmark tool either. It is quick, though. But it can’t capture logged-in sites either, since it captures from their server, from what I understand. And there is no way to store offline content on your computer, only on their server, so again, no ability to browse saved copies on your computer with no Internet (haven’t actually tested that, but I am pretty sure the Pocket app isn’t caching everything in your account locally).