PDF+Text

Jack_Nolan · May 25, 2009, 7:13am

I know this topic has come up before, but I can’t find whether it was ever resolved. Because I have a large number of PDFs and because OCR takes so much time, I frequently scan documents into EF and defer the OCR until a later date or time.

So, I’m looking for a way to quickly identify PDF files in EagleFiler that need to be OCR’d. DTPro does a great job at this and at OCR (the only reason I still use both on my Mac).

The ideal way would be if EF differentiated between PDF and PDF+Text in the Kind field like DTPro does. I could then create a smart folder to quickly identify all of my un-OCR’d PDFs.

A workaround might be a script?

Thanks in advance for the help/suggestions.

Jack

Slade_Mahoney · May 25, 2009, 8:38am

This isn’t exactly what you’re asking, but depending on your situation, an alternate approach might work.

Have you filed those scans already? If not, they should all be in the “Unfiled” Smart Group. Have you tagged them already? If not, they should be in the “Untagged” Smart Group.

For future ones, perhaps you could put them in a special folder, or give them a special tag when you do the scan, and change it when you run the OCR.

Michael_Tsai · May 25, 2009, 9:11am

Yes, if the contents property of a PDF record is empty, it probably needs to be OCR’d. But I would instead approach this problem from a workflow point of view, using a folder or tag to keep track of which PDFs need processing.

Jack_Nolan · May 25, 2009, 12:55pm

Contents?
I must be slow today. I’m having trouble finding the contents property.

I went to set up a smart folder with “contents = nothing” and could not find the contents field.
Tried making it visible in a column, but there were no columns named “contents”.
Could not find contents under the search area.
Opened up the Inspector and could not find the contents field there.
Found “Show Contents” on the View menu, but it was grayed out for all of my PDFs (both PDF and PDF+Text files).

Out of desperation, I even looked in the manual! What am I doing wrong?

Thanks,

jack

Michael_Tsai · May 25, 2009, 1:21pm

There is no “contents” in the user interface. You said “A workaround might be a script?” and the answer is yes: you can write an AppleScript that looks at the contents property.
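
A minimal sketch of such a script follows. It is not from the thread, and it rests on assumptions: the terms `selected records`, `browser window`, `file`, and `contents` are taken to match EagleFiler's scripting dictionary, so check them in Script Editor (File > Open Dictionary) before relying on it. The variable names are illustrative. It lists the selected PDFs whose contents come back empty, which, as noted above, only means they probably need OCR.

```applescript
-- Sketch only: list selected PDFs that appear to have no extracted text.
-- Assumed dictionary terms: selected records, browser window, file, contents.
tell application "EagleFiler"
	set _needsOCR to {}
	set _records to selected records of browser window 1
	repeat with i from 1 to (count of _records)
		-- Index into the list so that _record is the record itself rather
		-- than a "repeat with ... in" loop reference.
		set _record to item i of _records
		set _path to POSIX path of (get file of _record)
		-- AppleScript text comparison ignores case by default, so ".PDF" also matches.
		if _path ends with ".pdf" then
			-- Assumption: contents is the text EagleFiler extracted from the file.
			set _text to contents of _record
			if _text is missing value or _text is "" then
				set end of _needsOCR to _path
			end if
		end if
	end repeat
	-- Script Editor shows this list of file paths in its result pane.
	return _needsOCR
end tell
```

To use it, select the PDFs to check (for example, everything in one folder) and run the script from Script Editor. The same loop could assign a tag instead of collecting paths, which would fit the folder-or-tag workflow suggested earlier in the thread.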