How to get info from journal PDFs

Josef · February 1, 2010, 1:29pm

I am trying to import IEEE journal PDF files with names like ‘yyyynnpppp.pdf’ (year, issue number, page).

For one group of files, EF’s Info Inspector shows
[title] : title of the article
[from] : authors of the article
but for another group of files it shows
[title] : yyyynnpppp (the file name itself)
[from] : empty

I guess this difference comes from some difference in the PDF format. If I knew how EF extracts this info from a PDF file, I could import the files (9,700 files, 7 GB) with the correct info using an appropriate script.

Any information or hints about this would be appreciated.

Michael_Tsai · February 1, 2010, 4:44pm

EagleFiler extracts the title from the PDF’s title field (which you can view using Get Info in the Finder, for example).

Josef · February 1, 2010, 6:10pm

Thank you for your kind reply, as always.

Your information is a first step toward getting the same results as Mendeley.

I will try to write a script following your previous suggestion in this forum.

Regards,

Michael_Tsai · February 2, 2010, 6:36am

Also, if the PDF’s title field is empty EagleFiler will use the filename.
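
The fallback described above can be sketched in a few lines of Python. This is only an illustration of the rule, not EagleFiler's actual code: the function name `display_title` is made up, and treating a whitespace-only title as empty and dropping the `.pdf` extension are both assumptions.

```python
from pathlib import Path


def display_title(pdf_title, file_path):
    """Return the PDF title if it is non-empty, otherwise the file name.

    Assumptions (not confirmed in this thread): a whitespace-only title
    counts as empty, and the file name is used without its extension.
    """
    title = (pdf_title or "").strip()
    return title if title else Path(file_path).stem
```

For a file whose title field is empty, `display_title("", "2009120345.pdf")` gives back the name part of the (made-up) file name, which matches the second group of files in the original question.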

Josef · February 2, 2010, 3:07pm

Thank you for the additional advice.

Is there any way to get info from PDF files that contain the standard elements (title, authors, abstract, …) but have file names like yyyynnpppp.pdf?

In practice, do I have to write a script for that purpose?

Michael_Tsai · February 2, 2010, 4:32pm

Well, EagleFiler will extract the title and author automatically and make them available in the user interface as well as via AppleScript. If you want to directly get at the metadata in the PDF file, you could perhaps write a script that uses PDFpen or mdls.
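
As a rough sketch of the mdls route, the Python below asks Spotlight for a single attribute of one file. It assumes macOS, that Spotlight has already indexed the file, and that `mdls` is on the PATH; the helper names `mdls_command` and `read_attribute` are made up for this example. The `-raw` and `-name` options are the ones documented in the mdls man page.

```python
import subprocess


def mdls_command(pdf_path, attribute="kMDItemTitle"):
    """Build the argument list for one mdls query (no shell is involved)."""
    return ["mdls", "-raw", "-name", attribute, pdf_path]


def read_attribute(pdf_path, attribute="kMDItemTitle"):
    """Return the attribute's raw value, or None when mdls reports no value."""
    result = subprocess.run(
        mdls_command(pdf_path, attribute),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mdls itself fails
    )
    value = result.stdout.strip()
    # With -raw, mdls prints "(null)" for an attribute that is not set.
    return None if value in ("", "(null)") else value
```

Calling `read_attribute("/path/to/file.pdf")` on a Mac should return the title string, or `None` for files whose title field is empty. Multi-valued attributes such as the author list come back in a different shape with `-raw`, so this sketch is best kept to single-valued attributes.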

Josef · February 8, 2010, 6:22am

Using the mdls command, I can extract the title and authors automatically from kMDItemTitle and kMDItemAuthors, even for files named yyyynnpppp.pdf.

Thank you for your kind advice; it is truly helpful for beginners with EF and OS X like me.
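
For anyone who wants to script this, here is a minimal sketch of reading those two attributes out of the default (non-raw) mdls text output. The function name `parse_mdls` and the sample text are made up for illustration. The parser assumes the usual `key = value` layout with lists wrapped in parentheses, and it does not decode escaped non-ASCII characters.

```python
# Made-up sample of what `mdls -name kMDItemTitle -name kMDItemAuthors file.pdf`
# typically prints; real titles and authors will differ.
SAMPLE = '''kMDItemAuthors = (
    "A. Author",
    "B. Author"
)
kMDItemTitle   = "An Example Title"
'''


def parse_mdls(text):
    """Parse default mdls output into a dict of str, list of str, or None."""
    result = {}
    lines = iter(text.splitlines())
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        key, value = key.strip(), value.strip()
        if value == "(":
            # Multi-valued attribute: collect items until the closing paren.
            items = []
            for item in lines:
                item = item.strip()
                if item == ")":
                    break
                items.append(item.rstrip(",").strip('"'))
            result[key] = items
        elif value == "(null)":
            result[key] = None  # attribute is not set for this file
        else:
            result[key] = value.strip('"')
    return result
```

With the sample above, `parse_mdls(SAMPLE)["kMDItemAuthors"]` is a list of two author strings, and `parse_mdls(SAMPLE)["kMDItemTitle"]` is the title without its surrounding quotes.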