C-Command Software Forum

Bug: importing local web archives

EF 1.1: Web archives imported from the Finder are not indexed by EF (not even their titles). Also, the contents do not appear in the Record Viewer.

Works fine for me. Is it failing with every web archive you’re importing?

Thanks for your reply, sjk. Now that you mention it, some web archives do import properly. Here is an example of one which does not:

tinyapps.org/1.zip

(Sorry it is not a hyperlink - I’ve got Apache set to disallow direct linking of files to cut down on bandwidth. Please copy and paste it into the address bar.)

Seems something’s unusual with your sample webarchive file that’s causing EagleFiler not to properly recognize it. I recreated the symptoms you reported with EF and similarly with DEVONthink Pro, although DTP was able to open it in a separate window. No errors seen in app or system logs.

I located what appears to be the original document at this URL:

http://www.informit.com/articles/printerfriendly.asp?p=605499&rl=1

EF (and DTP) displayed it properly after capturing directly from Safari (Version 2.0.4 (419.3) on 10.4.8) and when imported from a saved webarchive file. They appear identical by visual comparison in Safari, however …

Your file is 23602 bytes:

-rw-r--r-- 1 sjk sjk 23602 Dec 15 12:20 How to Configure OpenVPN.webarchive

(btw, I removed a trailing newline character from its filename)

The one I saved has 25256:

-rw-r--r-- 1 sjk sjk 25256 Dec 15 20:27 OpenVPN.webarchive

I don’t have a place to conveniently upload mine but hopefully that info is a helpful start for further debugging.

Thanks for your reply and efforts, sjk!

Seems something’s unusual with your sample webarchive file that’s causing EagleFiler not to properly recognize it. I recreated the symptoms you reported with EF and similarly with DEVONthink Pro, although DTP was able to open it in a separate window.

EF will open it in Safari if I double click, but will not index or display the web archive inline. Since DTP seems to have trouble with it as well, I guess it’s not an EF issue.

(btw, I removed a trailing newline character from its filename)

Sorry about that - I noticed it, too. Removing it did not seem to help, though.

I don’t have a place to conveniently upload mine but hopefully that info is a helpful start for further debugging.

It is a huge help - thank you.

Gratefully,

Miles

This Web archive seems to link to local resources:

file:///Users/user/Desktop/desktop%20items/move%20to%20os%20x/openvpn_howto.htm

rather than the ones at informit.com. How was it created?

At first glance, it looks to me like there’s a bug in WebKit that causes it to stall and not load the page if one of the resources is unavailable. I will try to find a workaround.
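For anyone who wants to check their own files for the same symptom, here is a rough sketch in Python. It assumes the .webarchive is a property list with the usual `WebMainResource`, `WebSubresources`, and `WebSubframeArchives` keys, each resource carrying a `WebResourceURL`. The function names are made up for illustration and this is not how EagleFiler itself inspects archives.

```python
import plistlib
from urllib.parse import urlparse


def resource_urls(archive):
    """Collect the WebResourceURL of every resource in a parsed web archive."""
    urls = []
    main = archive.get("WebMainResource")
    if main:
        urls.append(main.get("WebResourceURL", ""))
    for sub in archive.get("WebSubresources", []):
        urls.append(sub.get("WebResourceURL", ""))
    # Subframes are nested archives with the same layout, so recurse into them.
    for frame in archive.get("WebSubframeArchives", []):
        urls.extend(resource_urls(frame))
    return urls


def local_urls(archive):
    """Return only the resource URLs that use the file:// scheme."""
    return [u for u in resource_urls(archive) if urlparse(u).scheme == "file"]


def load_webarchive(path):
    """Parse a .webarchive file (binary or XML plist) into a dict."""
    with open(path, "rb") as f:
        return plistlib.load(f)
```

Calling `local_urls(load_webarchive("How to Configure OpenVPN.webarchive"))` should list any file:// entries; a non-empty result suggests the archive was built from local files rather than from the live site.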

This Web archive seems to link to local resources rather than the ones at informit.com. How was it created?

IIRC, this file was saved in Firefox on another machine, transferred to the new machine, loaded in Safari, saved as a web archive via Safari’s “Save as” dialog, and finally dragged into EF. The last two steps were performed since the page would not import into EF from Safari (or Firefox) via drag and drop.

Here is another web archive that behaves the same way: tinyapps.org/2.zip

I believe it was imported using the same convoluted process described above. Since this process is far from common, and not really EF’s problem anyway, I certainly understand if you just want to let it go.

Aha, so it’s a Web archive created from local files rather than from the actual Web site? Yeah that’s uncommon, although I’d still like to properly support this type of Web archive.

The only time I really need to be able to import web archives from the Finder is when using MyPage to get rid of ads and other useless junk before importing into EF.

It’s interesting that DEVONthink won’t display it in a WebKit pane (e.g. in Vertical Split View) but renders it fine opened in a separate window.

Yep, that could be useful for trimming fat from pages before saving as RTF(D) and Webarchive documents. Thanks, TA.O! Glad I wandered into this thread. :)

The only time I really need to be able to import web archives from the Finder is when using MyPage to get rid of ads and other useless junk before importing into EF.

Yep, that could be useful for trimming fat from pages before saving as RTF(D) and Webarchive documents. Thanks, TA.O!

My pleasure! Glad I could return the favor! Thanks again for your help and time.

This looks really nice. However looking at the description I can’t find out how to export the page when I’m done editing. I’m sure I’m missing something obvious here.

brab - After you’ve finished editing the page, press Esc (or Fn Esc on some Macs), then File > Save as > Web archive. Then import that web archive into EF. The only problem at the moment is that EF will not display or index the page, though it will launch from EF with a double click on the title.

I see, thanks for the explanation.

EagleFiler 1.1.3 includes various workarounds for WebKit bugs that caused parts of certain Web archives to not be displayed or indexed.

Awesome!
Thank you, Michael! I have tested importing several locally saved Web Archives, and they are now indexed properly. Many thanks!!