Initial import of 150,000 e-mails

Michael_Tsai I am about to embark on a similar project; first-time EagleFiler user here. Any recommended methods for an initial import of over 150,000 emails? I had some mbox files saved, but I had some trouble with them, and right now I am downloading everything via Apple Mail and thinking of trying the native methods instead…

My intuition is that any method can cause fits if I throw too much at it at once, or no? I am looking at about 15 GB in one Gmail account, and about the same amount in a second one. I am hoping to keep all of this in EagleFiler from now on, and to periodically move things off these servers in the future so they don’t get so big…

I am very open to recommended practices, both for this initial one-time (all-time) data dump, and for how to proceed periodically going forward.

I’m not sure what trouble you had with mbox files, but importing mbox files to EagleFiler is generally the fastest way.

The amount of data (GB) doesn’t matter so long as your Mac has enough disk space.
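As a rough way to check the disk-space point before a large import, here is a minimal Python sketch. It is not part of EagleFiler; the function name `enough_space` and the `headroom` factor of 2 (to leave room for the copied file plus indexes) are assumptions for illustration only.

```python
import os
import shutil


def enough_space(source_path, dest_dir, headroom=2.0):
    """Return True if the volume holding dest_dir has at least
    `headroom` times the size of source_path free.

    The headroom factor is an assumed safety margin, not a figure
    taken from EagleFiler's documentation.
    """
    needed = os.path.getsize(source_path) * headroom
    free = shutil.disk_usage(dest_dir).free
    return free >= needed
```

For example, `enough_space("AllMail.mbox", "/Volumes/Archive")` (both paths hypothetical) compares the mbox file's size against the free space on the destination drive.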

You can also import directly from Mail using the capture key, though depending on your Mail settings and how much space is available on your Mac, Mail may not download all the attachments, and so EagleFiler may warn you that certain messages need to be redownloaded. Also, Mail may not be able to handle a selection of 150K messages all at once, though batches of 50K should definitely work.

There’s more information about importing e-mails here.

Thanks so much! I kept trying different things to import an mbox file with over 110,000 emails in it with little success (I honestly don’t remember my various steps). I finally just now decided to implement your suggestion of breaking it up into chunks via smart mailboxes in Apple Mail. So, I just started with a smart mailbox selecting all emails from 2009 - 2012 (around 19.5K emails). I pressed F1 and presto! EagleFiler is importing. I think I will do chunks of about this size so I can learn and see what’s going on and not overwhelm things. I think I will want to merge after it’s all done so the end result is one Gmail AllMail mbox file in EagleFiler (instead of 5-6 identically named folders). I know you have documentation on that in the manual; I have just never done it until now.
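For planning date-range chunks like the 2009 - 2012 one above, a minimal Python sketch can count the messages per year in an existing mbox file. It uses only the standard-library `mailbox` module and is independent of EagleFiler and Apple Mail; the function name `messages_per_year` is an assumption, and messages with a missing or unparseable `Date` header are counted under `None`.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_year(mbox_path):
    """Count messages in an mbox file, grouped by the year in the Date header."""
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            raw = msg["Date"]
            try:
                # str() guards against non-string header objects.
                year = parsedate_to_datetime(str(raw)).year if raw is not None else None
            except (TypeError, ValueError):
                # Unparseable date: group with the undated messages.
                year = None
            counts[year] += 1
    finally:
        box.close()
    return counts
```

The resulting counts show which year ranges add up to a batch of a comfortable size, for example something near the 20K messages used here.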

Do you mean an actual mbox file from Finder, or were you importing from Mail? I can’t see how a plain file import would fail since all it does is clone the file into EagleFiler’s folder…

Sounds good.

Yeah, merging probably makes sense for organizational purposes.
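EagleFiler has its own merge feature, documented in the manual, and that is the one to use for mailboxes already in a library. Purely as an illustration of what merging mbox files amounts to, here is a minimal Python sketch that concatenates several mbox files into a new one using the standard-library `mailbox` module. The function name `merge_mboxes` is an assumption, and this should only be run on copies outside an EagleFiler library, before importing.

```python
import mailbox


def merge_mboxes(source_paths, dest_path):
    """Append every message from each source mbox to dest_path.

    Returns the number of messages copied. Source files are left untouched.
    """
    dest = mailbox.mbox(dest_path)  # created if it does not exist
    total = 0
    try:
        for path in source_paths:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:
                    dest.add(msg)
                    total += 1
            finally:
                src.close()
        dest.flush()  # write the added messages to disk
    finally:
        dest.close()
    return total
```

This does no de-duplication, so overlapping date ranges in the source files would produce duplicate messages in the result.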

I honestly don’t recall. I have had multiple variables going, and frankly I probably just got impatient, did not appreciate how EagleFiler worked the first time, and killed it. I have been debating workflow, and also realized I was unhappy with the state/hierarchy of the particular mbox files I was tentatively starting the import on. I doubt it was a problem with EagleFiler, but rather user error/impatience/ignorance.

Oh, it was an mbox file at first. I do at least remember that. But again, I just got impatient and killed the process. Sorry to have suggested that EagleFiler misfired somehow. I don’t believe that to be the case.

Update: wondering if you could shed light (I am still in amazement that I can talk to the developer directly!). I just completed importing my first batch of about 20K emails (I selected the first three years of my Gmail Archive folder, 2009-2012). I left it alone, ran some errands, and watched it. I opened the Activity window to watch the indexing detail (it initially estimated around 60 minutes, but I think it took closer to twice that). I watched as it finished up (I had opened a couple of emails during this process, and tried a couple of searches).

When it was done, I closed EagleFiler and reopened it, whereupon it immediately reported that it was indexing again (with an estimate of 5 hours at first; right now it’s at 3 hours and 12 minutes).

Is this normal behavior? Should 20K emails (with attachments here and there), about 2.2 GB worth, take 4-5 hours to import and index? Just checking. I am committed at this point to letting the process work out, but I am a little confused why the indexing process initially showed that it was completely done, only to restart with a much longer estimate after I closed and reopened the application.

I would expect 20K e-mails to index in a few minutes, although it will take longer if there are lots of attached files or you have a spinning hard drive.

It’s normal that when you re-open a library it will do an indexing pass, but normally this just takes seconds to determine that nothing has changed since the last indexing. It only does real work there if you’ve modified the files outside of EagleFiler or if the library was closed before it finished the initial indexing. If you copy/sync the files to another Mac or restore them from backup, that is also seen as modifying the files, as they are new on disk. One of the benefits of using mailbox files is that there is only one file to check per mailbox, not per message. So the indexing scan should be very quick.

So, something sounds off here. The Activity window should give you some information about what is being indexed (which mailbox, how many messages, whether it seems to be making progress). There’s more information on this page about troubleshooting, such as how you can enable diagnostic logging to show precisely which items are being indexed and how long that is taking.

It’s possible I closed it out too soon, although it seemed to have indicated it was finished.

I am doing this on an M2 Mac Mini with just a 256 GB SSD, but I have a 2 TB external (“spinning”) hard drive that I was hoping to house my EagleFiler library on to save space. Apple Mail is on the internal SSD, but the EagleFiler library is (for now) on the external drive.

Do you think I would get drastically better performance if I put the EagleFiler library on the internal drive, or at least an external SSD? (I figured as much, but got a little lazy and used my existing external drive; I bought an SSD, but it is a portable one and I read those were not the best to use…)

I personally have a large mail archive on an external laptop spinning hard drive. It’s fine, just slower for the initial indexing, and you might not want all 150K merged together. I doubt the hard drive is the issue.

I think you need to look more closely at what’s happening: Did it actually finish? If not, did it quickly find the place to resume or seem to start at the beginning and proceed at the same speed as the first time? What does the logging show? If you need help figuring it out, please let me know.