Greetings to the group, and to Michael. I’ve been testing out EF for a little bit… and like many I am impressed all around. My library is 2.8GB, containing just over 10,000 records, but so far EF is handling all the data admirably. The RAM footprint is under 100MB and search is amazingly fast. Having this identical library in DevonThink caused that program to consume 350MB at boot and over 750MB if I tried to use any of its AI features (emphasis on ‘try’ because there were many beach balls to be had). So color me impressed… I think I’ll be sticking around.
One question has come up during my experimenting with EF, however, and I hope you’ll indulge me in a discussion of it. One of the biggest assets of EF is that it stores everything in a normal folder structure, ensuring that everything can be read long into the future. In much the same vein, for the past several years I have always saved webpages in plain HTML format rather than Safari’s Webarchive because the latter is, so far as I know, a proprietary format only readable by WebKit-enabled applications. As such, I find capturing in plain HTML preferable for a number of reasons:
Data longevity. Although WebKit is an open-source project, it is hard to assume that it will be around forever, or that the latest OS incarnation of WebKit will always be able to read old Webarchives. However, so long as webpages are based on HTML, the OS rendering kit will always have to be able to parse .html files, so I see the latter as a much more secure format.
Web access from PCs. Another concern is that, should I ever want to put my EF directory structure in the cloud for web access, there’s a good chance that I would be accessing it on a PC without Safari.
Size. A typical news article on a complex site can be over 600KB in Webarchive format, but only 70KB in HTML. That is more than an 8x difference in size. I know hard drives will always get cheaper and bigger, but I think it’s advantageous to keep the library as lean as possible.
I realize that it would be simple enough to just use Safari’s save dialog to save as Plain HTML into the To Import folder, but I think that the Capture and Capture with Options features (along with the additional metadata they gather) are even more useful. I suppose what I am asking is… is it possible in the future for EF to capture as Plain HTML instead of a WebArchive? If such an option would be prohibitively difficult (or just not supported by the WebKit framework), I understand, but I thought I would suggest it, as a .html + metadata Capture would be ideal for my library.
Once again, many kudos to you, Michael, for making a data organization app that not only works, but works well.