Multiple ways to import: what's best?

Rbohn · January 11, 2007, 12:47pm

I am a new user, and still trying to figure out best ways to get data into EF. For example, I often import web pages and want to omit the ads and extraneous material. There seem to be at least 6 ways to import web pages. What are the differences, and which is best for what purpose? I’m hoping someone has or will put together a table.

1. Hit the capture key (F1) when the web page is active. This imports as a Web Archive, but it’s not too smart.
2. Save the web page using Save As in Safari; then drag the file onto the drop pad. This allows you to import the page source, and skip extraneous material in web archive format. But you end up with 2 copies of the file to deal with.
3. Highlight the URL. Then in the menu, use Safari/Services/EagleFiler/Import URL. Not sure what format this produces.
4. Highlight the exact text you want. Then use the menu command Safari/Services/EagleFiler/Import Text.
5. Highlight the exact text. Then drag and drop it onto the drop pad. This imports pictures, but not HTML such as hot links.
6. Use Safari/Print, then Print to EagleFiler. This brings it in as a PDF.

So far, the SMARTEST method seems to be #5. It is quite intelligent about assigning the Title and “From” codes. It remembers the URL even without explicitly being told, and automatically sticks it at the bottom of the document - slick. And most important, I can quickly edit the text in the document window, e.g., changing the color of important text and getting rid of ads. (Oddly enough, method 5 is smarter than method 4.)

Michael_Tsai · January 11, 2007, 1:29pm

It really depends on what you’re trying to do.

#1 and #3 are two ways of doing the same thing. Both will import the entire page as a Web archive and store the page’s title, From, and source URL. I’m not sure why you say this is “not too smart.” Web archives are not editable, although you can use the Convert For Editing command to make an RTFD copy of the Web archive that is.

#2 is almost the same as #1 if you save in Web Archive format. (The difference is that this saves the page as it is displayed in Safari, whereas #1 and #3 download a fresh copy.) If you save as Page Source, then this will be somewhat different as you won’t get any of the images, stylesheets, or frames, just the main HTML file. I don’t understand why you say that you end up with “2 copies of the file to deal with.”

#5 is better than #4 in that you’ll get more metadata. This is because the OS provides EagleFiler with more data when you use drag and drop vs. using the system service. Also, contrary to what you say, #5 (and #4) should preserve most of the HTML formatting and the links.

#6 gets you a PDF. Unfortunately, due to the way printing works, EagleFiler won’t know the URL for the PDF (though it is stored in-band in the footer).

If you know you’re going to want to edit the page, and especially if you only want to save part of it, then #5 is probably best. I tend to use #1 myself.
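
As a rough summary of the six methods discussed so far, here is a small Python lookup table. It is only a sketch of what this thread says, not anything EagleFiler provides: the names `ImportMethod`, `METHODS`, and `candidates` are made up for illustration, `None` marks properties the thread does not state, and treating imported selections as editable rich text is an assumption based on the posts about RTFD.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImportMethod:
    number: int                # method number used in this thread
    how: str                   # what the user does
    result: str                # format that ends up in the library
    scope: str                 # "page" or "selection"
    editable: Optional[bool]   # None = not stated in the thread
    knows_url: Optional[bool]  # None = not stated in the thread


# Hypothetical summary table; values are a reading of the posts above.
METHODS = [
    ImportMethod(1, "capture key (F1)", "Web archive", "page", False, True),
    ImportMethod(2, "Save As in Safari, drag file to drop pad",
                 "Web archive or page source", "page", None, None),
    ImportMethod(3, "Services > Import URL", "Web archive", "page", False, True),
    ImportMethod(4, "Services > Import Text", "rich text", "selection", True, None),
    ImportMethod(5, "drag selection to drop pad", "rich text", "selection", True, True),
    ImportMethod(6, "Print to EagleFiler", "PDF", "page", None, False),
]


def candidates(scope: str, need_url: bool = False,
               need_editable: bool = False) -> List[int]:
    """Return method numbers that match; unknown (None) never counts as a yes."""
    picks = []
    for m in METHODS:
        if m.scope != scope:
            continue
        if need_url and m.knows_url is not True:
            continue
        if need_editable and m.editable is not True:
            continue
        picks.append(m.number)
    return picks
```

For example, `candidates("selection", need_url=True)` narrows the list to method 5, and `candidates("page", need_url=True)` leaves methods 1 and 3, which matches the advice above.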

brab · January 12, 2007, 1:30am

I did not know about #5, it’s good to know.

For temporary storage (i.e. stuff I just want to read later), I use #1. If I want to preserve some information for later, I then convert for editing (which works wonderfully) and only keep the relevant information. I’d suggest you take a look at this option, it’s quite powerful.

Rbohn · January 13, 2007, 1:29pm

I discovered method #5 by accident, basically. I was not aware of “convert for editing,” thanks for suggesting it. I will have to experiment with the file formats to see exactly what it does, but it seems quite useful. I guess it becomes method #7.

More generally, I’d like to see similar information for importing from other sources, such as PDFs, Word files, Excel files, mail messages, embedded graphics on web pages, and so forth. The import facilities have obviously received a lot of careful development, and there should be something better than trial and error for learning to take full advantage of them.

Michael_Tsai · January 13, 2007, 5:03pm

“Save PDF to EagleFiler” is very predictable; it will always import a PDF of what you “printed.”

For PDF and Word files you’ll get all the metadata (title, author) and images if you import the whole file (via drag and drop or the capture key). If you import a selection of text (using drag and drop or the Import Text service) you’ll just get what was selected, with no metadata, saved as RTFD.

For Excel files (and, generally, any type that EagleFiler can import but doesn’t know how to read), you won’t get the title or author. You can either import the whole file (using drag and drop or the capture key) in which case the file will be copied into the library, or you can import some text (drag and drop or the Import Text service) in which case EagleFiler will create a new RTFD file in the library.

For mail messages, in order for EagleFiler to treat them as mail you’ll need to use the capture key or drag and drop the messages or mailboxes. You can also drag and drop selections of text, but in that case you’ll end up with a new RTFD file in the library.

Dragging an embedded graphic from a Web page is just like dragging a graphic file from the Finder.
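
Read as a decision rule, the cases above might be sketched as follows. This is just a paraphrase of the post in Python, with hypothetical names (`WHOLE_ITEM_OUTCOME`, `import_outcome`) and made-up source labels; it does not call EagleFiler in any way.

```python
# Outcome of importing a whole item (drag and drop or the capture key),
# paraphrased from the post above. The keys are illustrative labels only.
WHOLE_ITEM_OUTCOME = {
    "pdf": "file imported with title, author, and images",
    "word": "file imported with title, author, and images",
    "excel": "file copied into the library, without title or author",
    "mail": "messages or mailboxes imported and treated as mail",
    "web_graphic": "imported like a graphic file dragged from the Finder",
}

# Outcome of importing only selected text (drag and drop or Import Text).
SELECTION_OUTCOME = "new RTFD file containing only the selected text"


def import_outcome(source: str, whole_item: bool = True) -> str:
    """Describe what ends up in the library for a given source."""
    if source not in WHOLE_ITEM_OUTCOME:
        raise ValueError(f"source not covered in this thread: {source}")
    if whole_item:
        return WHOLE_ITEM_OUTCOME[source]
    if source == "web_graphic":
        raise ValueError("a text selection does not apply to a graphic")
    return SELECTION_OUTCOME
```

So `import_outcome("mail")` describes a real mail import, while `import_outcome("mail", whole_item=False)` falls back to the plain RTFD case.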

Michael_Tsai · January 14, 2007, 10:12am

One more point of clarification: there is a difference between dragging text from Safari onto the Dock icon vs. dragging it onto the Drop Pad, source list, or records list. If you drop it onto the Dock icon, EagleFiler will not be able to determine the URL of the page that the text came from.

Michael_Tsai · October 28, 2008, 11:27am

With EagleFiler 1.4.1, the URL and Web formatting are preserved when using the Import service from a Web view and when dragging text onto EagleFiler’s Dock icon.

towb · February 18, 2009, 8:26am

Does the result of importing a web page as RTFD differ in any way from importing as webarchive and using “convert for editing”?

Michael_Tsai · February 18, 2009, 8:33am

The RTF “Document Properties” (editable in TextEdit) will be slightly different. The content should be the same.

towb · March 6, 2009, 8:08am

Sometimes only “convert for editing” will have the ALT text instead of images, reproducible with http://www.projectrho.com/rocket/rocket3ao.html

Michael_Tsai · March 6, 2009, 8:56am

I’m not seeing that behavior here. Are you saying that Convert For Editing produces different results than importing the URL as RTFD? Which version of Safari are you using?

towb · March 6, 2009, 11:27am

Yes, it removes images.

“Which version of Safari are you using?”

Of course, I forgot. The Safari 4 beta. But I think I encountered this once before with 3.

Michael_Tsai · March 6, 2009, 12:22pm

I’m seeing this consistently with Safari 4 beta and never with Safari 3. I’ll try to find a workaround, but for now I think your best bet is to revert to the stable version of Safari.

Michael_Tsai · March 24, 2009, 12:09pm

EagleFiler 1.4.5 works around this bug in Safari 4 (beta).