Remove Duplicate messages - timeout

Hi,
I have huge mailboxes that I need to clean up.
For example, one has over 37K emails and is close to 1 GB. I run the Remove Duplicate Messages script (Remove Duplicate Messages - EagleFiler AppleScripts); however, it always times out.
The machine is a MacBook Pro M2 with 96 GB RAM and a 4 TB SSD, so I doubt that the machine is the problem.

Any help is greatly appreciated.
Thank you so much!

How long does it run before timing out? It’s set to run for up to 24 hours, and for 37K messages I would expect it to finish within just a few minutes. So I wonder if it’s timing out in an unexpected place. If you run the script from within Script Editor, you can use View ‣ Show Log, and perhaps then it will show you which command timed out and give us a clue as to what’s happening.

It runs for about 30 minutes or so.

That's all it shows:
error “AppleEvent timed out.” number -1712

Now I am trying to extract 10K messages and run the duplicate remover on them in a separate folder. I am not sure if that will even work.
It takes a long time, but maybe it will work.

You may need to click on Events at the bottom of the window before you click Run.

What do you mean by “extract”?

Events are empty.

Extract: I created a separate folder and am moving the messages over to it. Then I will run the duplicate remover on the new folder.
Do you have a better idea, or maybe a way to do it faster?

Oh, maybe a novel idea: run the script only on “selected” emails? Is that even possible?

Events should not be empty if you open that part of the window before running the script. I just tested this on my Mac. You can also click the clock icon at the right to see the events and replies in the separate Log History window.

Separating the mailboxes into per-message files will be slower. Also, the script in question operates on mailboxes rather than on individual messages. There’s a separate script for removing duplicate files.

No, it operates on the whole mailbox file. But it should normally be plenty fast and not time out, so I think we just need to figure out what’s causing the unexpected timeout.

When I started the process at 9:50 AM, this is what I saw:


tell application “EagleFiler”

get current records of browser window 1

get universal type identifier of library record id 1680 of library document “Untitled.eflibrary”

get file of library record id 1680 of library document “Untitled.eflibrary”

get filename of library record id 1680 of library document “Untitled.eflibrary”

end tell

tell current application

do shell script “mktemp -d -t ‘EFRemoveDuplicateMessages’”

end tell

tell application “EagleFiler”

path to current application

end tell

tell current application

do shell script “cat ‘/Users/dimka/Downloads/Archive_Documents/Eagle_Filer/Untitled/Files/9A1C2037-93A2-4BEF-BDE8-ED663B8F25EE-1/[Gmail]/All Mail’ | perl -p -e ‘s/\r\n/\n/g’ | perl -p -e ‘s/\r/\n/g’ | /Applications/EagleFiler.app/Contents/Frameworks/WashFramework.framework/Versions/A/formail -b -e -q- -Y -D 104857600 ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.FwNy4Hlq/idcache’ -s > ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.FwNy4Hlq/NewMailbox.mbox’ 2> ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.FwNy4Hlq/Log.log’”


@Michael_Tsai
Hmmm… at 10:18 AM it came back and told me that no duplicates were found.
I stopped everything else on the machine, including the USB-C backup drive (I literally turned it off).

Is it possible that it's not working because it's Google mail?

Well, it's good that it didn’t time out. Perhaps there really were no duplicates? It matches by Message-ID, so messages that are similar do not count as duplicates unless they are actually two copies of the same message.
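To illustrate what matching by Message-ID means, here is a minimal Python sketch. It is not the script's or formail's actual implementation; the function name `dedupe_by_message_id` and the sample messages are made up for illustration. It only shows why two messages with the same text but different Message-ID headers are both kept.

```python
from email import message_from_string


def dedupe_by_message_id(raw_messages):
    """Keep the first message seen for each Message-ID header value."""
    seen = set()
    kept = []
    for raw in raw_messages:
        msg_id = message_from_string(raw)["Message-ID"]
        # Messages without a Message-ID cannot be matched, so keep them all.
        if msg_id is not None:
            key = msg_id.strip()
            if key in seen:
                continue  # an exact second copy of an earlier message
            seen.add(key)
        kept.append(raw)
    return kept


# Hypothetical sample data: a and b have identical text but different IDs.
a = "Message-ID: <1@example.com>\nSubject: Hello\n\nSame body\n"
b = "Message-ID: <2@example.com>\nSubject: Hello\n\nSame body\n"

# The second copy of a is dropped; b survives because its ID differs.
print(len(dedupe_by_message_id([a, a, b])))
```

With this kind of matching, a mailbox full of look-alike messages can legitimately report no duplicates if each copy carries its own Message-ID.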

You don’t think those are duplicates?
Is it because they have different .eml numbers?
I only see 5 unique messages.

@Michael_Tsai It still times out for some reason. Here is the event log:

tell application “EagleFiler”

get current records of browser window 1

get universal type identifier of library record id 12811 of library document “Untitled.eflibrary”

get file of library record id 12811 of library document “Untitled.eflibrary”

get filename of library record id 12811 of library document “Untitled.eflibrary”

end tell

tell current application

do shell script “mktemp -d -t ‘EFRemoveDuplicateMessages’”

end tell

tell application “EagleFiler”

path to current application

end tell

tell current application

do shell script “cat ‘/Users/dimka/Downloads/Archive_Documents/Eagle_Filer/Untitled/Files/mbox’ | perl -p -e ‘s/\r\n/\n/g’ | perl -p -e ‘s/\r/\n/g’ | /Applications/EagleFiler.app/Contents/Frameworks/WashFramework.framework/Versions/A/formail -b -e -q- -Y -D 104857600 ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.mug4GBee/idcache’ -s > ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.mug4GBee/NewMailbox.mbox’ 2> ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.mug4GBee/Log.log’”

do shell script “grep -c "^formail: Duplicate key found:" ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.mug4GBee/Log.log’”

do shell script “grep -vEc "^(\s*<|formail: Duplicate key found:)" ‘/var/folders/_1/_kqj_6gd251068rs1pg5f88r0000gn/T/EFRemoveDuplicateMessages.mug4GBee/Log.log’”

end tell

tell application “Script Editor”

display alert “No Duplicates Found” message “There were no duplicate messages in “mbox”.” buttons {“Cancel”, “OK”} cancel button 1

Result:

error “AppleEvent timed out.” number -1712

Those are .eml message files. The script that you are using is for removing duplicates within a single mailbox file. To remove duplicates among a group of selected .eml files you would need this script.

@Michael_Tsai
What about here? I am running the Remove Duplicate Messages script on that mbox, and yet it still tells me there are no duplicates.

@Michael_Tsai
I dug a little deeper into the raw source, and it looks like the Message-IDs of the messages above are different for each line (I have no clue how), but when I look at the rest of each message, they are the same.
So I figure the script needs to disregard the Message-ID and just compare the From, the Subject, and the body text.
Unfortunately, I don’t know the script or Perl well enough to comment out the Message-ID check and add a body text comparison, or alternatively to check the Message-ID and then, in addition, check whether the text is the same.

I don’t think there’s a way to modify the script to work that way. The Message-ID is core to how the formail tool works. Perhaps I can add this sort of duplicates removal as an option in a future version of EagleFiler.
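For anyone who wants to experiment outside of EagleFiler in the meantime, the comparison described above (same From, same Subject, same body text, ignoring Message-ID) can be sketched in Python roughly like this. This is only a sketch under assumptions: the names `content_key` and `dedupe_by_content` are hypothetical, the whitespace normalization is one arbitrary choice, and it has not been tested against real Gmail exports, so try it only on a copy of the data.

```python
import hashlib
from email import message_from_string


def content_key(raw):
    """Build a comparison key from From, Subject, and a hash of the body."""
    msg = message_from_string(raw)
    # Join the payloads of all leaf parts so multipart messages also work.
    parts = [p.get_payload() for p in msg.walk() if not p.is_multipart()]
    # Collapse whitespace so line-ending differences do not matter.
    body = " ".join("\n".join(parts).split())
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    sender = msg.get("From", "").strip().lower()
    subject = msg.get("Subject", "").strip()
    return (sender, subject, digest)


def dedupe_by_content(raw_messages):
    """Keep the first message for each (From, Subject, body) combination."""
    seen = set()
    kept = []
    for raw in raw_messages:
        key = content_key(raw)
        if key not in seen:
            seen.add(key)
            kept.append(raw)
    return kept


# Hypothetical sample data: a and b differ only in Message-ID.
a = "Message-ID: <1@example.com>\nFrom: x@example.com\nSubject: Hi\n\nBody\n"
b = "Message-ID: <2@example.com>\nFrom: x@example.com\nSubject: Hi\n\nBody\n"
c = "Message-ID: <3@example.com>\nFrom: x@example.com\nSubject: Other\n\nBody\n"

print(len(dedupe_by_content([a, b, c])))
```

For a real mailbox file, the standard library's `mailbox.mbox` class can be used to iterate over the messages instead of a list of strings. Note that this approach treats any two messages with matching sender, subject, and text as duplicates, which may be more aggressive than intended for things like recurring notifications.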

That would be awesome. Thank you!