C-Command Software Forum

Searchable PDFs

Question from a trial user:

I have many PDFs of books that I use in my research. Is there a way to get them into EF so that they are searchable?

Thanks

If the PDFs contain text, EagleFiler can search them. If the PDFs only contain images, you would need to run them through OCR software, which adds a text layer that EagleFiler and other PDF software can read.

ABBYY FineReader and Acrobat can OCR PDFs into searchable PDFs (with a text overlay).

Another option is PDFpen.

OCR
I am always confused by the topic of OCR. My Brother printer/scanner appears to come with built-in OCR software. By using the “Brother Control Center” I can select an OCR option and get searchable text.

At the same time, doesn’t Image Capture, which comes on the iMac, provide OCR capability? I think I’ve used that as an alternative to the Brother stuff.

What is gained by using the other programs mentioned in this thread?

Are there plans to integrate an OCR engine in EF?

If not, could someone share a script that automatically runs OCR on PDFs with PDFpen on import?

Thanks!

Image Capture does not have built-in OCR, but if you have other OCR software you can probably set up Image Capture to invoke it after scanning.

That’s something I’m considering. Sorry, that’s all I want to say for now.

That sounds like a great idea for a script! I’ll see what I can do.

I’ve just written a script to do this, but there seem to be two bugs in PDFpen that prevent it from working. I’ve reported them to SmileOnMyMac, and I’ll update this thread when we have a resolution.

Great news, thanks!

I guess that no update to this thread means there has been no news on this front?

Correct. The last PDFpen update was on December 16, just one day after I reported this bug. Hopefully they’ll get to it in the next update.

The PDFpen developer sent me a workaround for the problem, so I’ve posted the OCR With PDFpen script.

Great news, thanks!

I have a tiny suggestion: maybe the script could add the tag “ocred” or something similar to help find PDFs that have not yet been OCRed.
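
To illustrate the tagging idea, here is a minimal Python sketch. It is not EagleFiler's scripting interface or part of the OCR With PDFpen script: the records are plain dictionaries standing in for library entries, and the names `mark_ocred` and `not_yet_ocred` are hypothetical.

```python
def mark_ocred(record, tag="ocred"):
    """Add the marker tag to a record after OCR has run on it."""
    record.setdefault("tags", set()).add(tag)


def not_yet_ocred(records, tag="ocred"):
    """Return the names of records that do not carry the marker tag."""
    return [r["name"] for r in records if tag not in r.get("tags", set())]


# Hypothetical stand-ins for two PDFs in a library.
library = [
    {"name": "receipt.pdf", "tags": {"receipts"}},
    {"name": "book.pdf"},
]
mark_ocred(library[0])          # pretend OCR just finished on the first file
pending = not_yet_ocred(library)
```

With the tag in place, anything returned by `not_yet_ocred` is still waiting for OCR.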

Another solution would be to use the OCR to EagleFiler script that Michael has. Make a target folder, add that script to the folder as a folder action, and have your scanner deposit the PDFs it makes in that folder; the rest just happens. I know it works because I’ve been using it to scan my receipts into EagleFiler. My backlog went away in short order with that script, which is here.
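
For anyone curious how a drop-folder workflow like this fits together, here is a rough Python sketch of the general pattern. It is not the script mentioned above and does not use folder actions, PDFpen, or EagleFiler: the `ocr` argument is a hypothetical callback standing in for whatever OCR tool you use, and the folder names are placeholders.

```python
import shutil
from pathlib import Path


def process_folder(watch_dir, done_dir, ocr):
    """Run `ocr` on every PDF in watch_dir, then move it to done_dir.

    `ocr` is a hypothetical callback that takes a Path; it stands in for
    a real OCR step. Returns the new paths of the files that were handled.
    """
    done = Path(done_dir)
    done.mkdir(parents=True, exist_ok=True)

    # Only pick up PDFs; ignore anything else the scanner may leave behind.
    pdfs = sorted(
        p for p in Path(watch_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

    handled = []
    for pdf in pdfs:
        ocr(pdf)                          # add the text layer (assumed step)
        target = done / pdf.name
        shutil.move(str(pdf), str(target))  # moved files are not picked up again
        handled.append(target)
    return handled
```

Calling `process_folder` on a timer, or whenever the folder changes, gives roughly the "scan and forget" behavior described above. Moving each file out of the watched folder after processing is what keeps it from being handled twice.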

Thank you, thank you, thank you for the OCR With PDFpen script! This was really the last feature that made me vacillate between EagleFiler and DEVONthink. Now, for me, EagleFiler is a clear winner!

There is a 20% discount on PDFpen and PDFpen Pro available until 02/28/10:

http://www.smileonmymac.com/mpu/