Characters that can be the target of a search

dmag · April 2, 2011, 11:22am

Hello…

I may have missed this in the documentation, but:

(1) What characters are indexed and can be the target of a search? For example, I’ve found that I can search for a “$” but not a “/”.

(2) What characters constitute a word boundary? For example, in TextEdit “860.555.1212” is one word, whereas “(860)555-1212” is three words.

Also, a couple of suggestions:

(a) Consider adding the ability to search only in the record document and not in any of the associated metadata.

(b) Consider providing the ability to refer to a parent tag rather than each individual child tag. For example, if I had a “Status” tag with 10 children, I’d like to be able to create a smart folder for items that had no Status assigned, without explicitly listing the 10 child tags.
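To make suggestion (b) concrete, here is a minimal Python sketch of the idea: expand a parent tag into its descendants once, then select the records that carry none of them. The tag tree, the record data, and the function names are all hypothetical; this says nothing about how EagleFiler stores tags or evaluates smart folders.

```python
# Hypothetical tag tree: parent tag -> list of child tags.
children = {"Status": ["Open", "Waiting", "Done"]}

def descendants(tag, children):
    """Collect every tag below the given parent, at any depth."""
    found = set()
    stack = list(children.get(tag, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found

def without_any_child_tag(records, parent, children):
    """Names of records that carry none of the parent's descendant tags."""
    family = descendants(parent, children)
    return [name for name, tags in records.items() if not family & set(tags)]

# Hypothetical records: file name -> assigned tags.
records = {"a.pdf": ["Open", "Tax"], "b.pdf": ["Tax"], "c.pdf": []}
print(without_any_child_tag(records, "Status", children))
```

The point of the sketch is that the rule names only “Status”; adding an eleventh child tag later would not require editing the rule.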

Regards,
David

Michael_Tsai · April 2, 2011, 11:35am

This is subject to change, but the current list is alphanumeric characters, plus “#$@%_”. I’m curious why you want to search for “/” as I’ve always seen that as a separator between words.

For the purposes of text editing and selection (e.g. double-clicking), EagleFiler works like TextEdit. For the purposes of indexed searching, any non-word character is seen as a boundary.
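As a rough illustration of the indexing rule described above, the following Python sketch splits text into words, treating alphanumerics plus “#$@%_” as word characters and everything else as a boundary. The `tokenize` name and the regular expression are assumptions made for illustration, not EagleFiler’s actual indexer, and the character list is, as noted, subject to change.

```python
import re

# Assumption: a "word" is a run of alphanumerics or any of "#$@%_".
# \w already covers letters, digits, and the underscore.
WORD_RE = re.compile(r"[\w#$@%]+")

def tokenize(text):
    """Split text into index words; any other character is a boundary."""
    return WORD_RE.findall(text)

print(tokenize("860.555.1212"))
print(tokenize("(860)555-1212"))
print(tokenize("due 12/31/2005"))
```

Under this rule both phone number formats produce the same three words, and a “/” never reaches the index, which is consistent with not being able to search for it.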

(restricting search to only the document)
Could you explain a bit about how that would be useful to you?

(referring to a parent tag)
This is indeed under consideration for a future release.

dmag · April 2, 2011, 4:10pm

I was classifying imported PDF documents, and wanted to find instances where a year appeared in a date (e.g. “/2005”) rather than independently in the text.

(Searching for text in a PDF can be tricky, as the underlying text doesn’t faithfully represent white space. If you clip a table, columns in the clipping are separated by a single space, which means you can’t distinguish a column separator from a word separator. An OCR’d document can be considerably worse, because the underlying text may not reflect what you see. You see “Tomato” and it sees “To mato”.)

So, $210.53 is two words? And a search for “210.53” would also find “53 Main Street Apt 210”?

(restricting search to only the document)
Say I scan or download a document from 2007. My scanning software is set up to use the date and time as the file name (e.g. 2011.04.02_18.51.00), and a downloaded document would have 2011 in its file creation date. If I want to find documents that have “2011” in the text itself, these two documents will show up in the results because they have “2011” as part of the file name or date.
You’d search for “2011”, and a document would be returned with nothing highlighted, because the “2011” isn’t in the document, but in the metadata.
In short, I’d like this option so I could focus a search, just as I could focus a search on “Tags” or “From”.
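The difference between the two kinds of search can be sketched in a few lines of Python. The record layout (a `filename` and a `body` field) and the `search` function are hypothetical, chosen only to show how scoping a search to the document text drops the file-name matches.

```python
# Hypothetical records: the first is a scan named by date and time,
# the second actually mentions 2011 in its text.
records = [
    {"filename": "2011.04.02_18.51.00.pdf", "body": "Lease agreement signed in 2007."},
    {"filename": "report.pdf", "body": "Projected budget for 2011."},
]

def search(records, term, fields=("filename", "body")):
    """Return file names of records where term appears in any listed field."""
    return [r["filename"] for r in records if any(term in r[f] for f in fields)]

print(search(records, "2011"))                    # searches name and text
print(search(records, "2011", fields=("body",)))  # searches the text only
```

With the default fields both records match; restricted to `body`, only the report does.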

Michael_Tsai · April 3, 2011, 7:49am

OK, thanks. For this sort of thing, it seems to me that it would be better to be able to do an exact search, rather than play around with the definition of words for index searches.
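For comparison, an exact search looks at the raw characters rather than at indexed words, so punctuation such as “/” takes part in the match. A minimal Python sketch, with made-up document names and contents:

```python
def exact_search(documents, needle):
    """Return names of documents whose text contains needle verbatim."""
    return [name for name, text in documents.items() if needle in text]

# Hypothetical documents: one has the year inside a date, one on its own.
documents = {
    "invoice.txt": "Invoice dated 03/14/2005 for services.",
    "memo.txt": "In 2005 the office moved to Main Street.",
}
print(exact_search(documents, "/2005"))
```

Only the invoice matches here, because the memo contains “2005” without a preceding slash.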

Yes, “$210.53” is two words.

If you search for “210.53” including the quotation marks, it will search for the word “210” immediately followed by the word “53”. This would find “210.53” but not “53 Main Street Apt 210” or “210 Main Street Apt 53”.
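The quoted-phrase behavior can be sketched as follows: split both the query and the text into words, then look for the query’s words as one adjacent run. This is an illustration under assumptions, not EagleFiler’s implementation; the word rule reuses the character list given earlier in the thread, and matching is case-sensitive purely to keep the sketch short.

```python
import re

# Assumption: words are runs of alphanumerics or any of "#$@%_".
WORD_RE = re.compile(r"[\w#$@%]+")

def tokenize(text):
    """Split text into words; any other character is a boundary."""
    return WORD_RE.findall(text)

def phrase_match(query, text):
    """True if the query's words occur in the text as one adjacent run."""
    q = tokenize(query)
    t = tokenize(text)
    if not q:
        return False
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

print(phrase_match("210.53", "Total: 210.53"))
print(phrase_match("210.53", "53 Main Street Apt 210"))
print(phrase_match("210.53", "210 Main Street Apt 53"))
```

The first call matches because “210” is immediately followed by “53”; the other two fail on order or adjacency.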

Thanks for explaining. I will consider whether there’s a good way to do this for a future release.

dmag · April 4, 2011, 4:25pm

(searching for a string that includes a punctuation character)

I agree – but you can only perform an exact search on the metadata, not on the document itself, correct?

Michael_Tsai · April 4, 2011, 5:16pm

You can do exact searches within a particular document via the Find panel, but currently (unless you use AppleScript) exact searches across documents are limited to the metadata.