webarchives> media/authentication

jon.5j5 · October 30, 2006, 8:23pm

Hi Michael,

I found the c-command site via an update about SpamSieve 2.5 at MacNN.com; they have a nice news feed that I subscribe to. EagleFiler seemed interesting as well, so I downloaded them both.

I have two questions from my initial go around with EagleFiler:

Are the media from the page included inside the webarchive and consequently always available? Could I email that single file to another Mac user for viewing?
-> My impression is that Safari’s default webarchive format only saves the html, so images (and other media) are lost when that information is removed from the server.
I keep a daily journal from a website that requires authentication. It appears that when I use EagleFiler’s capture keyboard shortcut, I get the login page instead. My guess is that your program re-queries the page and must do that outside of Safari. Did I get that right, and if so, do you have any plans to fix the authentication issue?

I like what you’ve done to organize EagleFiler and make it work, but my personal preference is to leave that information in its native format/location/application. It does seem to me that you’ve improved Safari’s webarchive. Personally, I switched to iCab, which archives both the HTML and images into a .zip format. (For what it’s worth, both of these formats seem limited to their creator applications for viewing. I guess I could always automate a print-to-PDF routine for portability…)

Thanks,

-jon

Michael_Tsai · October 31, 2006, 6:55am

Hi Jon,

The idea of Web archives is that everything to do with the page is stored in the archive, so that you can e-mail it or read it without an Internet connection. This includes HTML files (more than one if there are frames), images, stylesheets, JavaScripts, etc. You can open a Web archive file in Property List Editor to see the stuff that’s being stored.
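
A minimal sketch of looking inside that structure from Python instead of Property List Editor. It assumes the commonly seen Web archive keys `WebMainResource` and `WebSubresources`; those key names, the helper `summarize_webarchive`, and the made-up sample archive are illustrative assumptions, not something taken from EagleFiler itself.

```python
import plistlib


def summarize_webarchive(data):
    """Return (URL, MIME type, byte count) for each resource in a Web archive."""
    # plistlib detects binary vs. XML property lists automatically.
    archive = plistlib.loads(data)
    # Assumed layout: one main resource plus an optional list of subresources.
    resources = [archive["WebMainResource"]] + archive.get("WebSubresources", [])
    return [
        (
            r.get("WebResourceURL", ""),
            r.get("WebResourceMIMEType", ""),
            len(r.get("WebResourceData", b"")),
        )
        for r in resources
    ]


# A tiny stand-in archive built in memory so the sketch runs without a real file.
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceURL": "http://example.com/",
            "WebResourceMIMEType": "text/html",
            "WebResourceData": b"<html></html>",
        },
        "WebSubresources": [
            {
                "WebResourceURL": "http://example.com/logo.png",
                "WebResourceMIMEType": "image/png",
                "WebResourceData": b"\x89PNG",
            }
        ],
    },
    fmt=plistlib.FMT_BINARY,
)

for url, mime, size in summarize_webarchive(sample):
    print(url, mime, size)
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. Pages with frames may nest further archives, which this sketch does not walk.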

Without more information, I can’t be sure about the specifics of that page. In general, EagleFiler will use any stored login information or cookies to load the page (yes, it re-downloads it). There are some sites for which this won’t work (and probably can’t work). In that case, you would need to save the Web archive from Safari and import it into EagleFiler, or use the “Save PDF to EagleFiler” command in Safari’s Print dialog.

Well, the idea behind EagleFiler is to store the information in its native format. It uses standard Safari Web archives, which I think are the closest thing we have to a standard in this area (they’re readable by any app that uses WebKit, and WebKit is open source). I used to save pages with iCab in Zip format, and it’s true you can get the pieces out, but only iCab knows how to draw the whole page from the Zip. But I agree with you that PDF is the best if you want maximum portability, and that’s why there’s the “Save PDF to EagleFiler” command.

jon.5j5 · October 31, 2006, 9:17am

Thanks
Hello Michael,

I knew that, as a developer, you could shed some light on the subject. I didn't realize that we can open web archives with the property list editor, but it makes sense.

With all of the open source initiatives, is there really a standard for 'web archives'? I just tried this with Firefox and it creates an archive with a separate folder to store the additional resources. I like the idea of a single 'file' instead. Microsoft's Internet Explorer uses a single file that is, to the best of my understanding, basically a MIME message, which seems like a good idea as well. I suppose that eventually the browser community will move to some type of XML format, which probably brings us back to some kind of property list [conversion](http://developer.apple.com/documentation/Darwin/Reference/ManPages/man5/plist.5.html). In looking at the man page, it would appear that Darwin had this in mind from the start.
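
To make the "single file that is basically a MIME message" idea concrete, here is a rough sketch using Python's standard `email` package. It bundles an HTML page and one image into a `multipart/related` message. The function name `build_single_file_archive`, the `logo` content ID, and the placeholder image bytes are made up for illustration, and the output is not claimed to match what Internet Explorer actually writes.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_single_file_archive(html, image):
    """Pack an HTML page and one image into a single MIME message."""
    msg = EmailMessage()
    msg["Subject"] = "Saved page"
    # The HTML becomes the root part of the message.
    msg.set_content(html, subtype="html")
    # Attaching a related part converts the message to multipart/related.
    msg.add_related(image, maintype="image", subtype="png", cid="<logo>")
    return msg.as_bytes()


archive = build_single_file_archive(
    '<html><body><img src="cid:logo"></body></html>',
    b"\x89PNG\r\n\x1a\n",  # placeholder bytes, not a complete PNG
)

# Reading the file back shows the page and its image as separate parts.
parsed = message_from_bytes(archive, policy=policy.default)
for part in parsed.walk():
    print(part.get_content_type())
```

Everything travels in one file, and the HTML refers to the image by content ID rather than by a path into a separate folder.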

Thanks for helping me think about this.

Best Regards,

-jon

webarchiver · May 5, 2007, 9:42am

More on authentication

I can’t seem to get a web archive to work with ANY site that requires authentication – trying currently on gmail, facebook, nytimes, etc. I’ve tried in Safari and Firefox, making sure to be logged in on both within the browser, with similar results. I’m using the capture key from the browser itself. Any other tips on how to get this to work? Any chance there are Safari settings I need to change?

Thanks very much.

Michael_Tsai · May 5, 2007, 11:05am

I don’t think this will work regardless of your Safari settings. You’ll have the same problem with any other application that uses Web Kit, e.g. NetNewsWire. In the case of Facebook (I haven’t tried the other two today), I think the login only lasts for the current session—even quitting and re-launching Safari will log you out—so there’s no way to share the login with a different application. For sites such as these, it’s best to save the page as a Web archive from Safari and then import that file into EagleFiler. (Or “Save PDF to EagleFiler” if you’re using Firefox, which doesn’t support Web archives.)
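
A small sketch of why a second application sees the login page: if the site recognizes the session only by a cookie that lives in the browser's memory, a fresh request from another program arrives without it. This is a simplified local simulation, not how Facebook or EagleFiler actually work; the `JournalHandler` server, the `SESSION` token, and the `fetch` helper are all hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION = "abc123"  # hypothetical session token held only by the "browser"


class JournalHandler(BaseHTTPRequestHandler):
    """Serve the journal only to requests that carry the session cookie."""

    def do_GET(self):
        logged_in = f"session={SESSION}" in self.headers.get("Cookie", "")
        body = b"journal entry" if logged_in else b"login page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(url, cookie=None):
    """Download a URL, optionally sending a Cookie header."""
    request = urllib.request.Request(url)
    if cookie:
        request.add_header("Cookie", cookie)
    # An empty ProxyHandler keeps the request on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.read().decode()


server = HTTPServer(("127.0.0.1", 0), JournalHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/journal"

browser_view = fetch(url, cookie=f"session={SESSION}")  # has the session cookie
other_app_view = fetch(url)  # a separate program re-downloading the page

server.shutdown()
server.server_close()
print(browser_view, "|", other_app_view)
```

Saving from the browser sidesteps the problem because the browser already holds the rendered page and does not need to request it again.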

jon.5j5 · May 5, 2007, 11:20am

I just use the print to pdf from the print dialog and name the page manually.
I don’t really know all of the ins/outs of this issue, but I too find it frustrating. I’ve moved on and just use the print-to-PDF function manually and later search the contents of my archives with Spotlight.

An additional issue I have is that the PDF is broken up into separate pages. I know there’s a utility for web designers that fixes this problem by capturing the whole page as displayed in the browser, without breaking it up in a PDF, but I can’t remember the name.

-jon

Michael_Tsai · May 5, 2007, 1:20pm

This is one of the reasons that I recommend using Web archives. I think in general it’s best to store documents in as close to the original format as possible. This provides more flexibility going forward. Thus, EagleFiler directly reads Web archives, mailboxes, Word documents, etc.