This is a rather strange request, driven by concerns about search speed. EF searches are very fast, but I am OCR’ing a number of old documents and fear that searches are starting to slow down. I’m wondering whether one solution is to exclude certain documents from the index.
About the database: I have ~5000 old aircraft manuals and related documents in PDF format, totaling 45 GB (the large size is due to very inefficient scanning of some of them). Many of the manuals are several hundred pages long. Because of poor image quality, the OCR text of many of the old documents contains a lot of nonsense words and spelling errors. I wonder whether this increases the size of the index, since presumably every pseudo-word has to be stored in it. I am also indexing by phrases, which adds further to the index size.
As a result, searches seem to be getting slower as I OCR more of these old documents. The effect isn’t bad yet, but I’m thinking ahead.
One workaround would be to exclude these crappy old OCR’d documents from indexing. Another would be to not OCR them at all, but when I work with an individual document, I want to be able to search it, copy from it, etc. A third would be to create a separate database for these old manuals, so that the rest of my database does not take a performance hit.
I don’t know of any way to tell EF not to index certain documents. Does anyone have suggestions?
I realize this is a very special-purpose request, and that creating a separate database is always a feasible fallback.