Indexing Hangs and Hangs

I’m trying out EagleFiler to see if it fits my needs and so far it seems a lot better than anything else I’ve tried for organizing/cataloging my PDFs.

I’m having trouble with it just hanging when indexing the library, though. Last night it hung after around 2,900 records. This morning it’s hung on record 722, twice in a row. After the first time I followed a suggestion in another thread for forcing EF to rebuild the index from scratch, but it hit and stopped at 722 again.

Looking at top, I no longer see EF or any associated processes taking up much processor time.

Looking at Console, the books eftexttool was supposedly working on were different each time. After each cancel of the indexing process, I see a different list of:

2008-12-04 10:31:43,288 - tools.pyc:227 ToolCancelledError: Tool cancelled: u'eftexttool'

I say “supposedly” because eftexttool wasn’t doing anything when I cancelled, but it was earlier, when the number of indexed books was rising.

Is it really supposed to restart indexing from the beginning every single time? I’ve added stuff to EF in batches and every time allowed it to index what I added before moving on to the new batch. Now I see, though, that when it indexes it tries to index the entire library from the beginning, even though it should have had most of it indexed already. I can’t have it re-index the entire library every time I add 1, 10, or even 1000 new items.

That’s an interesting coincidence given that EagleFiler indexes files in a random order.

Do you see the same eftexttool or efindextool process continuing to run during the hang? If so, how about sampling it using Activity Monitor to see what it’s doing?

How do you know that it wasn’t doing anything?

Since the contents of EagleFiler’s library are accessible to other applications, it’s possible for them to be modified without EagleFiler’s knowledge. Therefore, when you open a library EagleFiler scans all the files to make sure the index is up to date. It doesn’t re-index, though: if a file has not been modified since it was added to the index, EagleFiler skips it.

When looking at top, which I have running on my desktop via GeekTool, eftexttool and efindextool didn’t appear among the active processes. If they ever were running without my noticing, it would only have been at 1 or 2 percent of the CPU. When they are active, they are working at 40% or higher and I can hear my fans running faster.

So at best, their activity has slowed to a trickle, but I don’t think they run at all after a certain point.

I’ll close and open the library again to see where it stops and verify again that no associated processes are active after a certain point.

I am, by the way, running 10.5.5 on a new 2.8 GHz Intel Core Duo MacBook Pro with 4 GB of memory.

It shows “Indexing X of Y,” where Y is the full size of the entire library. So, it seems like it’s going through the entire library again.

If the cancellations show up in the log then there must have been active processes, which could be sampled.

Can you think of another verb that would make it more clear what it’s actually doing?

EF itself is still active. I have an open library window and an Activity Window with a stalled progress bar. I can do things with the library, like move files around and perform searches. However, eftexttool (4 instances) shows as “Not Responding” and efindextool doesn’t show up at all.

I can give you something from sampling EF itself, if you want.

Early on, I was seeing:
ATSServer 40-80%
eftexttool (1-4 instances) 5-40%
efindextool 10-70%
EF 1%

Now, it’s stalled at record 424 of 9351. The time remaining keeps increasing, so EF is still monitoring the (lack of) progress.

OK, so I’m sitting here, proofreading the post, and I get the bright idea to force-quit each of the four instances of eftexttool. It works: efindextool reappears and runs at around 80%, then drops, and one instance of eftexttool appears and runs at 40%. Then another appears briefly, they drop, ATSServer appears and runs at 70%, and then I get back to numbers like those above. And the number of indexed records is going up again.

I don’t know if what I did was truly helpful, will corrupt the database, or what. I’m just trying to give you info about what I see in case it helps.

I’m seeing in syslog (also broadcast to my desktop): eftexttool : ATSFontFindFromContainer failed

Multiple times. I don’t know if that is a common error that can be ignored or a problem that builds up until it causes eftexttool to crash. After about five minutes, 2 instances of eftexttool are “Not Responding” and efindextool varies from 5-80%. Occasionally one instance of eftexttool appears for a couple of percent then disappears.

No.

I thought it must be trying to index all 9351 records from the beginning every time because that’s what the Activity Window says it’s doing. You said that it doesn’t re-index. I’ll accept that because you know more about the program than I do, but I can’t think of any other verbs to describe what it’s doing when it says it’s “Indexing records.”

What do you think it’s actually doing? Could it be related to why eftexttool keeps crashing?

When I suggested that you sample using Activity Monitor, I meant that you should use the “Sample Process” command on a hung eftexttool.

Force-quitting eftexttool is safe. Force-quitting efindextool may corrupt the index (but not the database).

That could indicate that you’re missing a font that’s used in the PDF file, or that the PDF file’s internal structure is invalid. Or perhaps that your font cache is corrupted. In any case, it’s not a common or expected error, and it’s being quietly reported in the OS, well below EagleFiler.

Before, you said it was hanging. Is it crashing (i.e. the crash reporter window comes up), too?

EagleFiler works on 4 files at once. It seems that certain of your PDFs are causing eftexttool to hang. Once all 4 workers are hung, EagleFiler will stop making progress.
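To make the stall concrete, here is a rough simulation in Python. It is only a simplified model, not EagleFiler's actual code: the function name and the idea of handing files out one at a time are assumptions for illustration. It shows why the count at which progress stops depends on where the problem files happen to fall in the order.

```python
def files_indexed_before_stall(files, hanging_files, workers=4):
    """Simulate handing files to a fixed pool of workers, one at a time.

    Hypothetical model: a worker that receives a file listed in
    `hanging_files` never returns. The result is the number of files
    completed before every worker is stuck (or the total completed if
    the list runs out first).
    """
    free_workers = workers
    completed = 0
    for name in files:
        if free_workers == 0:
            break  # every worker is hung, so nothing more gets done
        if name in hanging_files:
            free_workers -= 1  # this worker is now stuck for good
        else:
            completed += 1
    return completed
```

With four workers, progress stops as soon as the fourth problem file has been handed out, so a different ordering of the same library gives a different stopping point.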

I know what it’s doing. It is making sure that the index is up to date. It looks at each file. If the file is in the index and has not been modified since it was indexed, EagleFiler moves on to the next file. If the file is not in the index or has been modified, it reads the file’s contents (using eftexttool) and updates the index.
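The check described above can be sketched roughly like this in Python. This is an illustration only, not EagleFiler's real implementation: the `index` dictionary (path mapped to the modification time recorded at indexing) and the function name are assumptions made for the example.

```python
import os


def files_needing_index(paths, index):
    """Return the paths whose contents would have to be read again.

    `index` is a hypothetical dict mapping each path to the modification
    time that was recorded when the file was last indexed.
    """
    stale = []
    for path in paths:
        mtime = os.path.getmtime(path)
        indexed_mtime = index.get(path)
        if indexed_mtime is not None and mtime <= indexed_mtime:
            continue  # in the index and unmodified: move on to the next file
        stale.append(path)  # not indexed, or modified since: read and update
    return stale
```

Every file is still visited, which is why the counter runs over the whole library, but only the files returned here would need their contents read.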

Reindexing would imply that it’s reading every file and putting it in the index; that would be needlessly slow, so it doesn’t do that.

I wasn’t asking what you thought “Indexing” meant. I thought that you understood the above and were suggesting that the Activity window should describe the process as something other than “Indexing,” since to you that implied re-indexing.

Could what be related?

I must be misunderstanding something. When I see a process as red and “Not Responding” in Activity Monitor for very long, I assume it has crashed. No “crash reporter window” comes up for eftexttool.

I’ve attached a sample from one instance of eftexttool.

Well, it’s always very slow and in a similar way. It never goes through groups of files any faster than usual. Perhaps I’ve been unlucky and the first randomly selected few hundred has always been almost entirely unindexed files because it’s always been slow enough to appear to be actively indexing every single file it hits.

Could the reason why it’s so slow every time it starts indexing be related to why eftexttool consistently hangs/crashes after a few minutes?

A hang is when a process stops responding to the mouse or keyboard (or, for processes like eftexttool without user interfaces, shows as “Not Responding” in Activity Monitor).

A crash is when a process performs an illegal operation, and the OS shuts it down immediately (removing it from Activity Monitor) and brings up the crash reporter window.

It looks like the OS is hanging while looking up some font information. Perhaps it would help to use a utility such as TinkerTool System to reset your font caches and then restart your Mac.

If that doesn’t stop the hangs, then the source of the problem may be a bug in the OS (rather than a damaged font cache), in which case it would help if you could send me some of the problem files so that I can try to reproduce the problem here and report it to Apple.

Since the files are scanned in random order, the indexed and unindexed files should be mixed together. Therefore, you shouldn’t see groups that go faster than others. The progress bar should move at roughly constant speed (unless there are hangs). Without a reference point for how long indexing those particular files should take, there’s no way for you to determine how many of them are unindexed.

Yes. If one or more of the workers is hung while reading a file, that means fewer workers will be actively making progress, so the progress will be slower. Secondly, EagleFiler updates the index file in batches (not after every file), so if you stop the indexing in the middle of a batch, some completed work will be discarded, meaning that those files will need to be read again next time.
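Here is a small Python sketch of why cancelling in the middle of a batch loses work. The batch size, the function name, and the `cancel_after` parameter are all made up for illustration; the actual batch size is not stated here.

```python
def index_in_batches(files, batch_size, cancel_after=None):
    """Process files, committing to the index only once per full batch.

    Hypothetical model: returns (committed, discarded), where `discarded`
    holds files that were read but not yet written to the index when the
    run was cancelled after `cancel_after` files.
    """
    committed = []
    pending = []
    for count, name in enumerate(files, start=1):
        pending.append(name)  # file has been read, but not yet saved
        if len(pending) == batch_size:
            committed.extend(pending)  # batch is full: write it to the index
            pending = []
        if cancel_after is not None and count == cancel_after:
            return committed, pending  # unsaved work is thrown away
    committed.extend(pending)  # last partial batch on a normal finish
    return committed, []
```

For example, with a batch size of 4 and a cancel after 6 files, only the first 4 are saved and the other 2 have to be read again on the next run.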

Whatever the cause of the ATS hang, it sounds like you have lots of PDF files that trigger it.

I did that. No improvement.

How do I find that out? I’ve been indexing for two or three days straight now. It’s frankly getting ridiculous.

I’m thinking of giving up and starting over with a new library, adding PDFs in small batches like I did the first time. It didn’t work then either, but I don’t know what else to do at this point. Indexing PDFs shouldn’t be this hard - the MacOS has already done it twice (once with their first location, and again when EF moved them to a new location).

ATS? I don’t see that hanging, only the eftexttool hanging.

And even when it doesn’t hang, it takes for-ever.

As you’ve seen, if the indexing gets stuck and you cancel it, EagleFiler’s log shows which PDF files it was working on. If I can try some of those files here, I’ll have a better idea of where the problem is.

I don’t see why that would help. If the problem is with particular PDF files, you’ll run into exactly the same problem indexing them in another library.

Your sample log showed that eftexttool was waiting for ATS, e.g. it had called ATSGlyphGetName.

I had forgotten about that - I’ve been focused on not cancelling it in an effort to get it to actually finish.

I’ll send you the most recent files that it’s been hanging on. I can always send you more, since it never manages to go more than a few minutes before going red.

Neither do I, but nothing else has worked either.

Strange, then, that ATS never appears to hang and always appears to index the files for Finder.

I believe I’ve found the cause of the indexing hangs, which is related to the way EagleFiler handles the large number of errors that Mac OS X reports when reading certain PDF files. This will be addressed in the next version of EagleFiler. If anyone else is seeing hangs during indexing, please e-mail me.

This is fixed in EagleFiler 1.4.4.