OCR script with ABBYY FineReader

shipahoy · October 3, 2011, 4:07pm

Hello,

Just received a new ScanSnap scanner and would prefer to use my EagleFiler application instead of Evernote. I was wondering if anybody out there has written an AppleScript for ABBYY FineReader that imports scans automatically into an EagleFiler library?

I've read the FAQ and seen the PDFpen code.

I’m really a newbie when it comes to AppleScript. Any help would be greatly appreciated.

Thanks in advance

shipahoy · October 5, 2011, 8:57am

Hi there. Still no response, so I’m guessing nobody has written a script yet.

So here is what I have accomplished so far and what I am trying to do.
I want to make scanned PDFs from my S1300 searchable in EagleFiler.
In other words, add a text layer to the PDF so that I can search the scanned text. I’m using the supplied ABBYY FineReader. However, when I scan a document into the import folder, the text is not searchable. Here are my settings:

In the ScanSnap Manager “Settings” field, I have “Scan to Searchable PDF” selected as the application. On the “Save” tab I have “Documents/EagleFiler Library/To Import (xxx)” selected. My thought is that EagleFiler will automatically import these scanned documents when it’s restarted.

Is there something I’m doing wrong? Any help would be greatly appreciated!

Michael_Tsai · October 5, 2011, 9:15am

EagleFiler checks the “To Import” folder often, not just when it’s restarted. I suppose there are two possibilities:

EagleFiler is importing the file after it’s saved in that folder but before ABBYY has added the text layer.
ABBYY is not adding the text layer.

So why don’t you try scanning while the library isn’t open in EagleFiler and then look at the PDFs? That way you’ll be able to see whether ABBYY is doing its job.
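A rough way to do that check from Terminal is sketched below. It assumes that a PDF with a text layer embeds font resources (so the bytes “Font” appear somewhere in the file) while an image-only scan does not. That is a heuristic, not a PDF parser: a PDF that keeps its objects in compressed streams can hide the word. The function names are made up for illustration.

```shell
#!/bin/bash
# Heuristic (assumption): an OCR'd PDF contains font resources, so the
# literal bytes "Font" appear in the file; an image-only scan usually has none.

# has_text_layer FILE -> exit status 0 if FILE looks searchable
has_text_layer() {
	# -a: read the binary PDF as text; -q: no output, exit status only
	grep -aq 'Font' "$1"
}

# report_folder DIR -> prints one line per PDF in DIR
report_folder() {
	local pdf
	for pdf in "$1"/*.pdf; do
		# If DIR has no PDFs the glob stays literal; skip it
		[ -e "$pdf" ] || continue
		if has_text_layer "$pdf"; then
			printf '%s: searchable\n' "$(basename "$pdf")"
		else
			printf '%s: image only\n' "$(basename "$pdf")"
		fi
	done
}
```

For example, `report_folder ~/Desktop/Scans` (a hypothetical folder) would list each scan and whether it appears to have been OCR'd.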

shipahoy · October 5, 2011, 10:00am

Thanks for the response. Much appreciated!

It turns out that when I searched for a bit of text, I had Search within set to FILENAME instead of ANYWHERE.

Michael, I’m sorry about this. It was my mistake.

As for your question re: the differences between FineReader Express, I’m not sure.
The ScanSnap S1300 now ships with FineReader V4.1. Its character recognition is truly amazing. Perhaps as more people buy ScanSnaps, it might be worth your while to write an AppleScript for this. HINT HINT

If there is anything I can do to assist you, I would be more than willing.

Thanks again for your help.

Michael_Tsai · October 5, 2011, 10:11am

If the search is already working (indicating that the PDF contains a text layer), what is it that you need an AppleScript to do?

shipahoy · October 5, 2011, 10:15am

I guess my understanding was that the AppleScript would add the files to the library automatically, whereas right now “scan to folder” adds them to the folder to be imported. They then get added either when EagleFiler is started or whenever it checks the “To Import” folder?

So there’s no real difference, correct?

Michael_Tsai · October 5, 2011, 11:08am

Yeah, not much difference for this particular workflow. Actually, it’s probably better to target a folder inside the “Files” folder rather than the “To Import” folder. This way you have control over when it gets imported, so you avoid the possibility of EagleFiler trying to read the file while ABBYY is working on it.
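If you’d rather keep the automatic import from “To Import”, another way to avoid that race (not what Michael suggests above) is to have the scanner save into a separate staging folder and move each file over only once its size has stopped changing. The sketch below assumes that a file whose size is unchanged across one polling interval is finished, which is a guess rather than a guarantee; the function names and folder paths are hypothetical.

```shell
#!/bin/bash
# Assumption: a file whose size is the same on two successive polls is
# no longer being written by ABBYY.

# file_size FILE -> byte count (wc -c works on both macOS and Linux;
# tr strips the padding spaces macOS adds)
file_size() {
	wc -c < "$1" | tr -d ' '
}

# wait_until_stable FILE [INTERVAL] -> returns once two polls agree
wait_until_stable() {
	local file=$1 interval=${2:-5} before after
	after=$(file_size "$file")
	while :; do
		before=$after
		sleep "$interval"
		after=$(file_size "$file")
		[ "$before" = "$after" ] && return 0
	done
}

# move_when_ready FILE DEST_DIR [INTERVAL] -> waits, then moves FILE into DEST_DIR
move_when_ready() {
	wait_until_stable "$1" "${3:-5}"
	mv -f "$1" "$2"/
}
```

Usage might look like `move_when_ready "$HOME/Scans/staging/scan.pdf" "/path/to/Library/To Import"`, where both paths stand in for your own folders.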

When I have a chance I’ll take a look at FineReader and see if there’s a way to script it to do stuff like this.

ThisIsNotMelTorme · June 13, 2012, 6:50am

A script to use ABBYY FineReader to OCR PDFs from your ScanSnap
In case it helps, I developed a little bash script to automate OCRing a PDF PRIOR to placing it into EagleFiler. I’m sure you can modify it to suit your needs AFTER the file has been imported into EagleFiler. Sadly, ABBYY FineReader (I’m running version 4.1) is not directly scriptable (it does not support AppleScript).


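#!/bin/bash

# Expect the path of a scanned PDF as the first argument
[ -f "$1" ] || { echo "usage: $0 file.pdf" >&2; exit 1; }

# First test to see if the document has already been OCR'd
# (an OCR'd PDF contains font resources; -a reads the binary PDF as text,
# -q keeps grep quiet so only its exit status is used)
if ! grep -aq Font "$1"; then

	# Open ABBYY and start to OCR the PDF
	open -a 'Scan to Searchable PDF.app' "$1"

	# Wait for the completed file to appear in the Finder
	while [ ! -e "${1%.pdf} processed by FineReader.pdf" ]; do
		sleep 5
	done
	# Give FineReader a moment to finish writing the file
	sleep 5

	# Remove " processed by FineReader" from the file's name
	mv -f "${1%.pdf} processed by FineReader.pdf" "$1"

	# Ask System Events to hide ABBYY FineReader using AppleScript
	osascript -e 'tell application "System Events" to set visible of process "Scan to Searchable PDF" to false'

fi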
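One weakness of the wait loop above is that it spins forever if FineReader never produces the “processed by FineReader” file (for example, for a PDF it refuses to OCR). A bounded version might look like this; `wait_for_file` and its default timeout are my own assumptions, not part of the original script.

```shell
#!/bin/bash
# wait_for_file TARGET [TIMEOUT] [INTERVAL]
# Polls every INTERVAL seconds until TARGET exists.
# Returns 0 when it appears, 1 once TIMEOUT seconds have passed without it.
wait_for_file() {
	local target=$1 timeout=${2:-300} interval=${3:-5} waited=0
	while [ ! -e "$target" ]; do
		if [ "$waited" -ge "$timeout" ]; then
			return 1
		fi
		sleep "$interval"
		waited=$((waited + interval))
	done
	return 0
}
```

In the script, the `while` loop could then become `wait_for_file "${1%.pdf} processed by FineReader.pdf" 600 || exit 1`, so a stuck conversion ends with an error instead of hanging.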

If the PDF did not originate from your ScanSnap, ABBYY FineReader will not be able to OCR it; the “Creator” of the PDF needs to be “ScanSnap Manager”. You’ll need to perform an additional trick upfront if you’d like to OCR PDFs that did not originate from your ScanSnap. Let me know if you’d like to know how that’s done.
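Before handing a file to FineReader, a script could check both conditions mentioned in this post: skip files that already look OCR'd, and skip files whose Creator isn’t ScanSnap Manager. The Creator test below is a rough heuristic that assumes the PDF’s Info dictionary holds a literal string beginning with `/Creator (ScanSnap Manager`; it won’t see metadata stored only in XMP or inside compressed object streams. `needs_ocr` is a hypothetical helper name.

```shell
#!/bin/bash
# needs_ocr FILE -> prints the decision; exit status 0 only if FILE
# should be sent to FineReader.
# Both tests are byte-level grep heuristics (assumption), not a PDF parser.
needs_ocr() {
	# Already has fonts, so most likely already has a text layer
	if grep -aq 'Font' "$1"; then
		echo "skip: already has a text layer"
		return 1
	fi
	# FineReader 4.1 only OCRs PDFs whose Creator is ScanSnap Manager
	if ! grep -aq '/Creator *(ScanSnap Manager' "$1"; then
		echo "skip: Creator is not ScanSnap Manager"
		return 1
	fi
	echo "ocr"
	return 0
}
```

A caller would test the exit status, e.g. `if needs_ocr "$1" >/dev/null; then ...; fi`. On macOS, `mdls -name kMDItemCreator file.pdf` should report the same Creator field through Spotlight, which may be more reliable than grepping the file.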

tonycolorado · November 9, 2020, 3:28am

Found this old thread and managed to get this working within EagleFiler for ABBYY FineReader, using a combination of the EagleFiler instructions for PDFpen and the Hazel instructions for FineReader.

This script does the trick for me. Add it to your Scripts folder and run it from within EagleFiler.

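-- Run from EagleFiler's Scripts menu: OCR each selected record in place,
-- refresh its checksum, and remove the "NeedsOCR" tag
on run
	tell application "EagleFiler"
		set _records to selected records of browser window 1
		repeat with _record in _records
			set _file to _record's file
			my ocr(_file)
			-- The file changed on disk, so update EagleFiler's stored checksum
			tell _record to update checksum
			my removeTag(_record, "NeedsOCR")
		end repeat
	end tell
end run

-- Files dropped onto the script saved as an applet: OCR, then import
on open _files
	my ocrAndImport(_files)
end open

-- Folder Action: OCR files added to the watched folder, then import
on adding folder items to _folder after receiving _files
	my ocrAndImport(_files)
end adding folder items to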

on ocrAndImport(_files)
	repeat with _file in _files
		my ocr(_file)
	end repeat
	tell application "EagleFiler"
		import files _files
	end tell
end ocrAndImport

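on ocr(_file)
	-- OCR settings taken from ABBYY's Hazel instructions. As written they are
	-- only assigned here; they take effect only if passed as options to
	-- "export to pdf" (see FineReader's AppleScript dictionary).
	using terms from application "FineReader"
		set langList to {German, English}
		set saveType to single file
		set exportmodepdflayout to "text over the page image"
		set keepPageNumberHeadersAndFootersBoolean to yes
		set keepImageBoolean to yes
		set imageOptionsImageQualityEnum to balanced quality
		set usemrcboolean to no
		set makepdfaboolean to yes
		set pageSizePageSizeEnum to automatic
		set increasePaperSizeToFitContentBoolean to yes
	end using terms from
	tell application "FineReader"
		-- Write the searchable PDF over the original file
		export to pdf _file from file _file
	end tell
	-- Wait until FineReader has finished before quitting it
	-- ("my" is needed so the handler is not sent to FineReader)
	my WaitWhileBusy()
	tell application "FineReader" to quit
end ocr

-- Polls FineReader until it is no longer busy. Assumes FineReader's
-- dictionary provides the "is busy" command used in ABBYY's Hazel sample.
on WaitWhileBusy()
	repeat
		tell application "FineReader" to set _busy to is busy
		if not _busy then exit repeat
		delay 1
	end repeat
end WaitWhileBusy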

on removeTag(_record, _tagName)
	tell application "EagleFiler"
		set _tags to _record's assigned tags
		set _newTags to {}
		repeat with _tag in _tags
			if _tag's name is not _tagName then
				copy _tag to end of _newTags
			end if
		end repeat
		set _record's assigned tags to _newTags
	end tell
end removeTag