lucene-general mailing list archives

From "Joe MA" <mrj...@comcast.net>
Subject RE: MaxFieldLength in Lucene 3.4
Date Thu, 01 Dec 2011 08:23:56 GMT

> "of course all other analyzers are unlimited"

Maybe I am too far behind the times.  I was updating some pretty old stuff.
I think it was written originally with Lucene 1.4.  I seem to recall that
Lucene v1.x had analyzers where the default was "limited", because I learned
pretty early that I had to set that option during indexing.  Perhaps at some
point the switch was made to an unlimited default.  Thanks, your answer
clears it up.

One question - why even have this option now? Are things more efficient with
a limited token field?  If you know your data is 'bounded', should you
always limit the token field to improve performance?

Thanks!


-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Monday, November 28, 2011 2:41 AM
To: general@lucene.apache.org
Subject: RE: MaxFieldLength in Lucene 3.4

Hi,

The move is simple - LimitTokenCountAnalyzer is just a wrapper around any
other Analyzer, so I don't really understand your question - of course all
other analyzers are unlimited. If you have myAnalyzer with
myMaxFieldLengthValue used before, you can change your code as follows:
 
Before:
new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34,
myAnalyzer).setFoo().setBar().setMaxFieldLength(myMaxFieldLengthValue));

After:
new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, new
LimitTokenCountAnalyzer(myAnalyzer,
myMaxFieldLengthValue)).setFoo().setBar());

You only have to do this on the indexing side; on the query side
(QueryParser), just use myAnalyzer without wrapping. With the new code, the
responsibility for cutting off the field after a specific number of tokens
was moved out of the indexing code in Lucene. This is now just an analysis
feature, not an indexing feature anymore.
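
To illustrate the wrapper idea described above, here is a minimal plain-Java
sketch. It deliberately does not use the Lucene API: the class
TokenLimitSketch and the helpers whitespaceAnalyzer and limitTokenCount are
hypothetical stand-ins, and an "analyzer" is modeled simply as a function
from field text to a list of tokens. It only shows the shape of the pattern,
where the limit lives in a wrapper around the analyzer rather than in the
indexing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names are Lucene classes.
class TokenLimitSketch {

    // Stand-in for "myAnalyzer": splits field text on whitespace.
    static Function<String, List<String>> whitespaceAnalyzer() {
        return text -> {
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<>();
            }
            return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
        };
    }

    // Stand-in for the wrapper: delegates to any analyzer and keeps
    // at most maxTokenCount tokens from its output.
    static Function<String, List<String>> limitTokenCount(
            Function<String, List<String>> delegate, int maxTokenCount) {
        return text -> {
            List<String> tokens = delegate.apply(text);
            int end = Math.max(0, Math.min(maxTokenCount, tokens.size()));
            return new ArrayList<>(tokens.subList(0, end));
        };
    }

    public static void main(String[] args) {
        Function<String, List<String>> myAnalyzer = whitespaceAnalyzer();

        // Indexing side: wrap the analyzer so long fields are cut off.
        Function<String, List<String>> indexAnalyzer =
                limitTokenCount(myAnalyzer, 3);

        // Query side: use the unwrapped analyzer.
        System.out.println(indexAnalyzer.apply("one two three four five"));
        System.out.println(myAnalyzer.apply("one two three four five"));
    }
}
```

Because the wrapper accepts any delegate, the same limit can be applied to
whichever analyzer was in use before, which mirrors the Before/After change
shown above.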

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Joe MA [mailto:mrjama@comcast.net]
> Sent: Monday, November 28, 2011 8:09 AM
> To: general@lucene.apache.org
> Subject: MaxFieldLength in Lucene 3.4
> 
> While upgrading to Lucene 3.4, I noticed the MaxFieldLength values on the
> indexers are deprecated.   There appears to be a LimitTokenCountAnalyzer
> that limits the tokens - so does that mean the default for all other
> analyzers is unlimited?
> 
> Thanks in advance -
> JM


