lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: IndexWriter.MaxFieldLength.UNLIMITED at what price?
Date Thu, 10 Dec 2009 10:03:28 GMT
LIMITED is basically an insurance policy, protecting you from
accidentally indexing an immense document, leading to OOME.

It also protects you in case your analyzer is accidentally letting in
bogus terms (say, if you indexed a large exe file, or there was a
large base64-encoded attachment on an email message that you didn't
decode).

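But LIMITED is very bad for the user experience: anything past the
limit is silently left out of the index, so searches miss documents
that clearly contain the query terms.  Users will eventually conclude
that your search engine is "buggy", and lose trust.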
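
To make the failure mode concrete, here is a minimal sketch in plain Java
that simulates a per-field term cap. It is not Lucene's code: the class
`TruncationDemo` and the method `indexTerms` are hypothetical names, the
whitespace split stands in for a real analyzer, and the cap of 10,000
terms is assumed to match what LIMITED uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TruncationDemo {

    /**
     * Simulates indexing one field with a term cap: keeps the first
     * maxTerms whitespace-separated tokens and silently drops the rest.
     */
    static Set<String> indexTerms(String text, int maxTerms) {
        String[] tokens = text.trim().split("\\s+");
        int kept = Math.min(tokens.length, maxTerms);
        return new HashSet<>(Arrays.asList(tokens).subList(0, kept));
    }

    public static void main(String[] args) {
        // A document whose interesting term sits just past the cap.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            doc.append("filler ");
        }
        doc.append("needle");

        Set<String> limited = indexTerms(doc.toString(), 10_000);
        Set<String> unlimited = indexTerms(doc.toString(), Integer.MAX_VALUE);

        // With the cap, the term is simply not there to be found.
        System.out.println("limited finds needle:   " + limited.contains("needle"));
        System.out.println("unlimited finds needle: " + unlimited.contains("needle"));
    }
}
```

The user sees "needle" in the document but a search for it comes back
empty, with no error anywhere to explain why.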

I'd recommend always using UNLIMITED unless you're in a domain where
there is a real risk of getting massive docs.  And even then I'd first
look for other mechanisms that keep such documents out of the index...
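
One such mechanism could be a check that runs before a document ever
reaches the IndexWriter. The following is only a sketch under stated
assumptions: `IndexGuard`, `looksIndexable`, the 5 MB ceiling, and the 10%
control-byte threshold are all hypothetical choices, not anything Lucene
provides. It would catch an oversized file or an exe, but not undecoded
base64, which is all printable characters and needs handling at the
mail-parsing stage.

```java
import java.nio.charset.StandardCharsets;

public class IndexGuard {

    // Assumed limits; tune them for your own corpus.
    static final int MAX_BYTES = 5 * 1024 * 1024;
    static final double MAX_CONTROL_RATIO = 0.10;

    /**
     * Returns true if the raw content is small enough and looks like text.
     * Control bytes other than tab, newline, and carriage return are counted
     * as a sign of binary data.
     */
    static boolean looksIndexable(byte[] content) {
        // Empty content has nothing to index; oversized content is rejected.
        if (content.length == 0 || content.length > MAX_BYTES) {
            return false;
        }
        int suspicious = 0;
        for (byte b : content) {
            int c = b & 0xFF;
            boolean control = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
            if (control || c == 0x7F) {
                suspicious++;
            }
        }
        return (double) suspicious / content.length <= MAX_CONTROL_RATIO;
    }

    public static void main(String[] args) {
        byte[] text = "An ordinary email body.\n".getBytes(StandardCharsets.UTF_8);
        byte[] binary = new byte[4096]; // all NUL bytes, like padding in an exe

        System.out.println("text indexable:   " + looksIndexable(text));
        System.out.println("binary indexable: " + looksIndexable(binary));
    }
}
```

Documents that fail the check can be logged or indexed by metadata only,
so the decision to leave content out is explicit rather than a silent
truncation.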

Mike

On Thu, Dec 10, 2009 at 3:15 AM, Rob Staveley (Tom)
<rstaveley@seseit.com> wrote:
> I was wondering where I might read about the cost of using
> IndexWriter.MaxFieldLength.UNLIMITED versus
> IndexWriter.MaxFieldLength.LIMITED.
>
>
>
> Are there any consequences over and above the obvious one that you are going
> to analyse more content in your IndexWriter when you have more than 10,000
> characters in a StringBuffer?
>
>


