C-Command Software Forum

Let user decide which duplicated files to keep

Hi,

I’m a newbie; I’d heard so much about EagleFiler that I decided to try it out.

I think the duplicate file feature is in itself already a great way to improve the organization of your files.

But what strikes me is that, when importing my project files into EagleFiler, EF decides FOR ME which duplicate files to keep and which ones not to keep!!

What’s more, EF does not inform me where the duplicate files are, or were, before getting rid of them. So I have to search for the files MANUALLY after import and see what to do.

In many other applications, for instance iSync, if the same contact is found, the application asks you which record you want to keep, since it cannot know which record is important to you and which is not.

Well, in the case of EF I know the duplicate files found are identical, but “I” should decide, not EF, whether I want to keep all or only one, and I should decide which one to keep and in which location (folder).

What I imagine is a pop-up window when importing files, informing me that duplicate files have been found and giving me options for what to do with them.

Or maybe I missed something and the required feature is already integrated?

thanks
Mario

If EagleFiler skips a file because it’s a duplicate, it will list the file in the Errors window. You can use the Reveal button to locate the file that was skipped and the “Reveal in Library” button to locate the copy that’s already in the library.

With EagleFiler, one of the files is already in your library and potentially filed and tagged. So I think it makes sense that this one is more important and should be kept.

Then perhaps you’d prefer to use the Allow duplicate files in library option so that everything is imported? You could always remove the duplicates manually later.

Could you describe what kind of files/folders you’re importing such that you want this level of control? It’s not a situation that I’ve come across myself or that anyone has mentioned in the last two years.

They’re exact duplicates
The original poster might be missing that when EagleFiler flags two files as duplicates, they’re exact duplicates—the file content is literally identical. EagleFiler can’t present understandable differences between arbitrary file formats, but it can determine “this is the same”.

Yes, but why does EF have to decide that? Why does it have to take so many steps when it could take just ONE, when I want, e.g., to replace a file which already exists in the library? The way EF does it makes things unnecessarily complicated in case the user wants something else.

With EagleFiler, one of the files is already in your library and potentially filed and tagged. So I think it makes sense that this one is more important and should be kept.

What makes sense is whatever the user wants to keep, provided he, and not EF, can decide which one.

Then perhaps you’d prefer to use the Allow duplicate files in library option so that everything is imported? You could always remove the duplicates manually later.

Yes, I’ve seen that option, but again, it does not address what I’m trying to explain: in one word, the simplicity of choosing in real time, when importing files, and not after.

Could you describe what kind of files/folders you’re importing such that you want this level of control? It’s not a situation that I’ve come across myself or that anyone has mentioned in the last two years.

I work with many different kinds of projects and files.

I’m amazed that no one has talked about this before; for me it is just applying common sense that the user, and not the application, should always be able to decide what to do with the data.

I don’t think this feature would be very difficult to implement actually, and as I said, it would make it possible to do in just one step what now takes several.

The original poster might be missing that when EagleFiler flags two files as duplicates, they’re exact duplicates—the file content is literally identical. EagleFiler can’t present understandable differences between arbitrary file formats, but it can determine “this is the same”.

Hi,

if you read my first post, you’ll see that the original poster (me) didn’t miss that point.

By the way, I have noticed that EF also makes mistakes when comparing files (at least in my case), refusing to import a file when the file is actually different from the existing one in the library.

E.g. EF does not import a .wav file, saying it is the same as a Toast image, which makes no sense.

I’ve never seen EagleFiler consider two files to be identical unless they actually were. It uses MD5 checksums, which were developed for cryptography, to determine uniqueness so it should be astronomically unlikely for two different files to be considered the same.
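As a rough illustration of checksum-based duplicate detection, here is a minimal Python sketch. It is not EagleFiler's code: the function names and the `library_index` dictionary (checksum to library path) are assumptions made for this example. The only thing taken from the post above is that uniqueness is decided by an MD5 checksum of the file's content.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_existing_copy(candidate: Path, library_index: Dict[str, Path]) -> Optional[Path]:
    """Return the library file with the same checksum, or None if the candidate is new.

    library_index is a hypothetical mapping from checksum to the path of
    the copy already in the library.
    """
    return library_index.get(md5_of_file(candidate))
```

Because only the content is hashed, two files with different names or in different folders still count as the same file, which matches the behavior discussed in this thread.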

If the files have reasonable sizes, could you e-mail them to me or post them somewhere so I could take a look?

Perhaps both files are empty?

EagleFiler is designed to streamline the common case—giving preference to the file that’s already in the library—so that it takes zero steps. The alternative would be to present a dialog box each time a duplicate is encountered, which most users wouldn’t want to see.

Would you care to describe some examples of situations where you would want to keep the existing file vs. the new one? In theory, I agree that the user should always be able to decide what to do, but in practice if you provide too many options the software becomes so complicated that most people don’t want to use it, and the code becomes more difficult to maintain and keep bug-free. So my aim is to provide only the options that a significant number of people will use.

This is a pet peeve of mine, but please don’t speculate as to how difficult you think something would be. The fact that you think (not having seen the code) that it would be easy is not going to affect whether I decide to do it. And things are often much more complicated (or, rarely, easier) than they appear. For example, in this case:

  1. It’s not just a matter of presenting the user two choices like with iSync. There could be any number of identical files that are in the library or are in the process of being imported.
  2. These files may have different metadata, so you’d probably want to be able to examine that when choosing.
  3. Each import operation is multi-threaded, and there can be multiple import operations active at a time.
  4. Import operations can be stopped by the user.
  5. The import might have been invoked by a script, in which case user interaction might not be desired.
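To make point 1 concrete, here is a small hypothetical Python sketch (again, not EagleFiler's implementation) that groups files by checksum. The paths and checksum strings are made up for illustration. A duplicate group can hold any number of members, so a resolution dialog would have to handle N-way choices rather than a simple either/or.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_duplicates(files: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (path, checksum) pairs and keep only checksums shared by 2+ paths."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, checksum in files:
        groups[checksum].append(path)
    return {checksum: paths for checksum, paths in groups.items() if len(paths) > 1}


# Hypothetical mix of library files and files still being imported.
seen = [
    ("Library/ProjectA/logo.png", "aaa"),
    ("Import/ProjectB/logo.png", "aaa"),
    ("Import/ProjectC/brand.png", "aaa"),
    ("Import/ProjectC/notes.txt", "bbb"),
]
duplicate_groups = group_duplicates(seen)
```

Here `duplicate_groups` contains a single group with three paths, one already in the library and two arriving from the import.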

It takes zero steps as long as you accept what EF decides.

The dialog box is the alternative I’m talking about, which is actually used in most applications that confront a similar situation, from the Finder itself to any file management or DAM application, and so on. It may look like what I’m saying sounds illogical, but actually EF is the only application I know that does it the other way around.

Would you care to describe some examples of situations where you would want to keep the existing file vs. the new one?

The point is not what I do or do not want to keep; the point is letting the user, and not EF, decide what to do with his/her files:

  • different metadata (I don’t know yet exactly how EF handles metadata)
  • different file names
  • have the same file in different folders with different names
  • have the same file in different folders because they are temporary projects
  • I’m a web designer, and many times files are identical in different projects.

I know it can be done with extra steps, but when you work with thousands of files, these extra steps for a long list of duplicate files in the Errors window become really… a loooong task.

In theory, I agree that the user should always be able to decide what to do, but in practice if you provide too many options the software becomes so complicated that most people don’t want to use it, and the code becomes more difficult to maintain and keep bug-free. So my aim is to provide only the options that a significant number of people will use.

Of course, you are the programmer, and based on your experience and user feedback you choose what to implement and what not, since you have a lot to do.

The user could also choose whether he/she wants the duplicate file dialog to appear or not, but I think most users would want to use it, since it would make their lives a lot easier.

This is a pet peeve of mine, but please don’t speculate as to how difficult you think something would be. The fact that you think (not having seen the code) that it would be easy is not going to affect whether I decide to do it. And things are often much more complicated (or, rarely, easier) than they appear.

Of course, you are right: from outside everything looks easy, but every little change represents a large amount of work, which most users do not really appreciate. I didn’t mean that; sorry if my message came across wrongly.

For example, in this case:

  1. It’s not just a matter of presenting the user two choices like with iSync. There could be any number of identical files that are in the library or are in the process of being imported.

In this case it would help the user to have a wider overview of what is going on with all the files, don’t you think?

  2. These files may have different metadata, so you’d probably want to be able to examine that when choosing.

Well, this is one of the main points why EF should NOT choose what to do, since it cannot decide which metadata is more relevant.

  3. Each import operation is multi-threaded, and there can be multiple import operations active at a time.

Well, the dialog could appear when the import process is finished (as in iSync), or in the Errors window itself.

  4. Import operations can be stopped by the user.

Let it do it when the import is finished (as in iSync).

  5. The import might have been invoked by a script, in which case user interaction might not be desired.

Again, let it do it when the import is finished (as in iSync), or in EF’s own Errors window.

Thanks for sending the files. As I suggested above, the three files are all empty (and, thus, identical). They each have no data in the data fork. The .wav file is completely empty. The old-style font suitcase and the Toast disk image each have small resource forks. So you would need to turn on Allow duplicate files in library in order to import these files.

The next version of EagleFiler will change the behavior slightly, so that files with empty data forks do not count as duplicates.

Thanks for sending the files. As I suggested above, the three files are all empty (and, thus, identical). They each have no data in the data fork. The .wav file is completely empty. The old-style font suitcase and the Toast disk image each have small resource forks. So you would need to turn on Allow duplicate files in library in order to import these files.

Michael, thanks for the reply.

The Toast file works perfectly (you can open it in Toast), and it has nothing to do with an empty wav file or a font file, which actually works as well. I have found a couple more cases of this behaviour but didn’t want to overwhelm you with more emails.

So for me it is faulty behaviour for EF to say these two files are duplicates.

The next version of EagleFiler will change the behavior slightly, so that files with empty data forks do not count as duplicates.

Maybe the user should be warned that the file to be imported is actually empty…

It sounds like you didn’t understand my reply. Files on Mac OS X can contain two forks of content, the data fork and the resource fork. Except for very old files (like font suitcases from Mac OS 9) the resource fork is only supposed to be used for metadata (e.g. which application is set to open a file), so it should not be considered when determining whether two files are duplicates. Your wav file really is empty. The font and Toast files are not empty, but they store their contents in the resource fork instead of the data fork, so EagleFiler sees that the data fork is empty (and identical). Probably all your examples are the same.
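The effect described above can be reproduced with a short Python sketch. It rests on assumptions: `ForkedFile` is a hypothetical in-memory stand-in for a two-fork file, the file names and fork contents are invented, and the empty-data-fork exemption in `is_duplicate` only mirrors the behavior change announced in this thread, not EagleFiler's actual code.

```python
import hashlib
from typing import NamedTuple, Set


class ForkedFile(NamedTuple):
    """Hypothetical in-memory stand-in for a file with two forks."""
    name: str
    data_fork: bytes
    resource_fork: bytes


def data_fork_checksum(item: ForkedFile) -> str:
    """Hash only the data fork; the resource fork is deliberately ignored."""
    return hashlib.md5(item.data_fork).hexdigest()


def is_duplicate(item: ForkedFile, known_checksums: Set[str]) -> bool:
    """Duplicate test that never matches on an empty data fork (assumed rule)."""
    if not item.data_fork:
        return False
    return data_fork_checksum(item) in known_checksums


# Invented examples: an empty .wav, and a disk image whose content
# lives entirely in the resource fork.
wav = ForkedFile("sound.wav", b"", b"")
toast = ForkedFile("disc.toast", b"", b"resource fork payload")

# Both data forks are empty, so a data-fork-only checksum cannot tell them apart.
same_checksum = data_fork_checksum(wav) == data_fork_checksum(toast)
```

With a plain checksum lookup the two files collide, since both hash to the MD5 of zero bytes. With the exemption, `is_duplicate(toast, {data_fork_checksum(wav)})` returns `False` and the file would be imported.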

Yes, that’s why I’m making the change that I described.

Why? In most cases, the files will work fine (as you said).

I’m not aware of any applications that present the same situation as EagleFiler, dealing with actual files that are exact duplicates. iPhoto is probably the closest, but even there it doesn’t give you the option to delete the photo that was already in your library. And most people just check the box to skip the photos that had already been imported.

Anyway, there’s no point in arguing what I or you think is logical. I’m interested in (a) how you think the dialog box(es) for resolving duplicates should work, and (b) whether other people would be interested in such a feature.

Yes, I understand why you want this. I’m interested in specific examples so that, if I were to add such a dialog box, it would be well-suited to the kind of work that you do.

Yes. I’m just saying that this is much more complicated than what you originally suggested and what iSync does.

Usually, the new file being imported will not have any metadata.

If it’s done that way, EagleFiler would have to copy potentially a large amount of data into the library, only to delete it later. Some people deal with large files or large numbers of files, and this could be prohibitively slow and/or exhaust the available disk space.

What happens if the import doesn’t finish, because it was cancelled? Should the imported duplicates be left in the library? Should it bring up a dialog box when you click cancel? (That seems weird.)

In some cases, that would be entirely the wrong thing to do.

As of EagleFiler 1.4, these files are no longer treated as duplicates.