Web archives & PDF Scale

I’ve saved a bunch of web pages over the years in Webarchive format. Generally, that’s been pretty successful. Some websites, like the New York Times and the Washington Post, require signing in again when trying to view these saved pages. (Normally, I’m automatically signed in when visiting their web sites.) But, I understand why EagleFiler can’t save log-in credentials, and it’s no big deal to click a couple of buttons to log in again.

BUT… I discovered just today that the few NYT pages that I’ve Webarchived since sometime in October plain won’t open. The page just flashes for less than a second and then I get sent to a blank page. Perhaps it’s a photo of a polar bear sunning herself in a blizzard - dunno.

So, it seems like saving as a Webarchive may no longer be as viable, at least for some sites. Exporting as a PDF still seems to work fine.

Anybody have any ideas?

(The PDF option is pretty good, still. For a long time I did exactly that, even though the files are a bit larger than Webarchives. The hitch is that the scale of the displayed PDF in Preview is almost full screen by default, which is annoying. Yeah, I can easily rescale the view down, but it is annoying. I’m looking for a good batch pdf rescaler before resorting to building one as an AppleScript.)

There are ways to import private Web pages such that you get a complete copy stored in EagleFiler and won’t need to log in again.

It’s not clear what might be happening there, but if you imported using the capture key rather than using one of the methods linked above, the article content is probably not saved in the Web archive, so you’ve essentially got a bookmark saved. You could use the Open Source URL command to open the live page and then import it properly.

Web archives of private pages should work if you create them using Safari (while logged in) or using the EagleFiler: Import service hotkey.
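For anyone who wants to check whether a given Web archive really holds the article or is closer to a bookmark, here is a minimal sketch in Python. It assumes the .webarchive file is a property list with the usual `WebMainResource` and `WebSubresources` keys; the function name `summarize_webarchive` is made up for this example, and a small `html_bytes` value is only a hint, not proof, that the content is missing.

```python
import plistlib
import re


def summarize_webarchive(data: bytes) -> dict:
    """Return basic facts about the main resource of a .webarchive file.

    `data` is the raw file contents, e.g. Path("page.webarchive").read_bytes().
    """
    archive = plistlib.loads(data)           # handles binary and XML plists
    main = archive["WebMainResource"]        # the saved top-level HTML page
    html = main.get("WebResourceData", b"")  # raw bytes of that page
    return {
        "url": main.get("WebResourceURL", ""),
        "mime_type": main.get("WebResourceMIMEType", ""),
        "html_bytes": len(html),
        # Count opening <script> tags in the main page only.
        "script_tags": len(re.findall(rb"<script\b", html, re.IGNORECASE)),
        "subresources": len(archive.get("WebSubresources", [])),
    }
```

Comparing the numbers for an archive that opens fine with one that goes blank may show whether the difference is in the saved content or in the scripts that run when Safari opens it.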

I’m not quite sure what you mean here. Is there an example that you could share? The width of the PDF should be the width of a page, which is normally smaller than the screen. Both EagleFiler and Preview have ways to control the display scale. You can set it once and have it apply to all documents, so I would not expect this to be annoying.

After much additional experimentation, here is what I found:

Whenever I used the Safari “Save As” function or the equivalent EagleFiler keys, the web page did save as a Webarchive, as expected.

When viewed in the EagleFiler Record Viewer window, even private web pages display properly. As expected.

When I double-click on a record is when the problems show. Safari balks in some cases and asks for sign in information in other cases. This is also true when I choose to Open With Safari.

The behavior with regard to this on at least one web site - the New York Times - has changed in the last month. Webarchives saved from before then ask me to sign in when opened with Safari. Newer Webarchives choke. This is true even when EagleFiler is not open and I directly open the Webarchive file through Finder.

But, when I use Open Source URL, all is well. Just as when I originally read the article I chose to save. No asking for additional sign in credentials or any of that. That is something I just tried within the last hour and found to be successful. So, EagleFiler ends up saving the day.

As for the display of PDF at almost full screen in Preview, that appears to be entirely due to the way Safari saves a web page as a PDF. Even directly opening a Safari-saved PDF in Preview - EagleFiler closed - gives a giant view. That can be fixed with an easy trackpad action, but it seems wrong. No changing of Preview display scale seems to remedy that. I found others on the web having similar problems. It may not be universal, though. If I missed something here, I’m happy to be pointed to as an idiot as long as there is a solution.

The attached full screen shot displays what I mean. To be clear, this is absolutely not an EagleFiler issue in any way. Something is just not set right. The default Preview display of the default Safari pdf is just bigger than life.
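One way to test the guess that the size comes from the PDF itself is to read the page dimensions out of the file. The sketch below is rough: it scans the raw bytes for `/MediaBox` entries with a regular expression, so it will miss pages whose dictionaries sit inside compressed object streams. Both function names are invented for this example. PDF units are points (72 per inch), so a US Letter page is 612 x 792.

```python
import re

# Matches entries such as: /MediaBox [0 0 612 792]
MEDIABOX_RE = re.compile(
    rb"/MediaBox\s*\[\s*([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s*\]"
)


def page_sizes_in_points(pdf_bytes: bytes) -> list[tuple[float, float]]:
    """Return (width, height) for every /MediaBox found in the raw PDF bytes."""
    sizes = []
    for match in MEDIABOX_RE.finditer(pdf_bytes):
        x0, y0, x1, y1 = (float(group) for group in match.groups())
        sizes.append((x1 - x0, y1 - y0))
    return sizes


def zoom_to_fit_width(page_width_pt: float, target_width_pt: float) -> float:
    """Zoom factor that makes a page of the given width display at the target width."""
    return target_width_pt / page_width_pt
```

If the reported width is much larger than 612 points, a viewer showing the page at actual size will look oversized, and `zoom_to_fit_width(width, 612)` gives the factor a batch rescaler would need to apply.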

I don’t think this is related to EagleFiler but rather to the Web archive file itself (which was both created and viewed using Safari). You may find that the files work better in Safari if you turn off JavaScript.

I think the better fix is to save proper Web archives, which don’t need login/JavaScript, using the EagleFiler: Import service hotkey.

This opens a live page rather than showing the contents of the Web archive file.

It looks to me like you have Preview set to Show as Continuous Scroll. You may prefer to set it to Single Page:

I tried using the EagleFiler: Import service. Same effect.

The Webarchive looks perfect in the EagleFiler Record Viewer. It opens for about a second in Safari when I double click the record in EagleFiler or use any of the Open in Safari commands in EagleFiler. Of course, Open Source URL works, because it goes directly to the original source.

I’ve convinced myself that this is a Safari problem, since with EagleFiler closed, opening the Webarchive file in Safari does the same thing.

I tried disabling every extension I have in Safari - no change.

As a kind of sanity check, I copied the NYT web page that I saved as a Webarchive to a different computer. This one has very few applications on it and no EagleFiler. When I try to open it there in Safari, I get the same effect. I have emailed you a copy of this file.

Baffling, eh?

As for Preview, yes I had Show as Continuous Scroll. Changing to Single Page had zero effect.

Baffling, eh?

I tried a Safe Boot and the same things happened there, too.

One more thing…

I just tried Skim as a pdf reader. Zero scaling issues when looking at saved pdfs, unlike Preview.

Your Web archive works for me in Safari with JavaScript turned off. I think it works in EagleFiler because EagleFiler knows certain domains to automatically disable JavaScript for.

But the Web archive that you sent me seems to be a full-page Web archive saved from Safari. I still think that if you create a Web archive from a selection using the EagleFiler: Import service it will not have this problem, even in Safari.

It only takes effect when opening a file, and Preview may remember the previous state for previously opened files. You can press Command-2 to change the current document to Single Page mode.

I did use EagleFiler: Import service to save that page. I just tried it again with a different NYT article and got the same result. When the Webarchive was opened with Safari, I got one second of a view then the window went all white.

However, turning off JavaScript does resolve the problem. I wonder what problems leaving JavaScript off will bring, though. Such as being able to post this comment…
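An alternative to turning JavaScript off for all of Safari, not something suggested in this thread and not tested against these sites, would be to make a copy of the archive with the scripts removed from the saved page. The sketch below assumes the standard `WebMainResource` / `WebResourceData` layout of a .webarchive property list. The name `strip_inline_scripts` is hypothetical, and a regular expression is a blunt tool for HTML, so work on a copy and keep the original.

```python
import plistlib
import re

# Matches <script ...> ... </script>, including external-script tags.
SCRIPT_RE = re.compile(rb"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_inline_scripts(archive_bytes: bytes) -> bytes:
    """Return a new .webarchive with <script> elements removed from the main page.

    Only the top-level HTML is changed; subresources and subframes are untouched.
    """
    archive = plistlib.loads(archive_bytes)
    main = archive["WebMainResource"]
    main["WebResourceData"] = SCRIPT_RE.sub(b"", main["WebResourceData"])
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
```

Usage would be along the lines of reading the file with `Path("article.webarchive").read_bytes()`, passing it through the function, and writing the result to a new file name. Whether that is enough to stop the page from going blank is an open question.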

In the Record Viewer, that Webarchive does get cut off, for reasons you’ve explained previously: EagleFiler can’t work around the privacy restrictions of the web pages.

Thanks for the information about Preview. I wasn’t cognizant of the Command-2 approach. But, that just gives me a choice of an oversize view (Command-1) or a view of the entire page at once (Command-2). Skim provides a view that is very similar to what Safari shows. Just personal preference - I never even heard of Skim until today. Sometimes third-party software works better than the Apple-supplied software for some functions. 🙂

I did some more experimenting.

Full page Webarchiving with the EagleFiler: Import service does preserve the formatting and look of the original web page. But, in order to display private web pages, I do need to turn off JavaScript.

Using Command-A to select the entire web page content with EagleFiler: Import causes the title of the page to be lost (“SKIP TO CONTENT” is the new name) in the pages I tried. The formatting changes some, too. But, viewing in Safari is ok without turning off JavaScript. I may be doing this improperly, though.

Exporting to PDF maintains the original format, minus some of the stuff that’s superfluous, and looks just the same as the original web page when viewed in Skim.

All three approaches maintain the embedded links to other articles.

Sorry to put you through all of this. Some of it is my fault for not really understanding some of the subtleties of the various applications. Some I plain can’t explain.

OK, that’s odd because the contents of the Web archive don’t look to me like they were saved in that way. Do you have better luck if you only select part of the page (e.g. the main article text) before using the import service?

Or you could turn off the ImportTextAsWebArchive esoteric preference and then the service will import the selected text as RTF instead of a Web archive, which should avoid the JavaScript issues completely.

Yes, it’s probably not tenable for general Safari use.

You can also resize the window in Continuous Scroll mode to pick whichever size you want.

Yeah, I like Skim, too.

Well, this is what I was suggesting originally, but you said it didn’t work. Maybe you were using the service to import the URL before rather than selecting the page content?

The title will be based on the first line of text selected. You can of course edit it later. Or I like to select the text starting with the title and that way I don’t get the navigation chrome, just the content.

I tried selecting text (Command-A) as well as selecting the URL from the top of Safari. Both had their pluses and minuses.

I want to emphasize that this was only an issue with the New York Times and The Washington Post web sites. Every other web site seemed to work as expected, using either approach.

So, I finally decided to just Export as PDF any web pages from those sites that I wish to save. Since Skim can automatically resize any PDF, that gets around the main objection I had with saving to PDF. Yeah, I’m being picky. I can’t think of any downside to using PDF for this. Besides, you never know when Apple might genuinely deprecate their Webarchive functions. PDF will probably be around for a while.

I may give RTF a try, just to see.

Thank you again.

Clarke