lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Should analysis.jsp honor maxFieldLength
Date Tue, 24 Aug 2010 16:18:32 GMT
On Tue, Aug 24, 2010 at 12:03 PM, Eric Pugh <epugh@opensourceconnections.com> wrote:

> Hi all,
>
> I have maxFieldLength set to 10000 in solrconfig.xml, but was playing
> around with a really large document (The King James Bible) in analysis.jsp.
> I hacked analysis.jsp to show me the number of terms at each filter, plus the
> headers, without turning everything on by checking the verbose box.
>
> My results shown at this screenshot:
> http://img.skitch.com/20100824-t36rq45i2wfimwyd53gwiqebdy.png seem to
> confirm that maxFieldLength is NOT honored by the analysis.jsp.
>
>
Separate from whether or not analysis.jsp should do this (I happen to think
the closer to "reality" it is, the better), I think the easiest
implementation would be to wrap the entire stream with
LimitTokenCountFilter:

/**
 * This TokenFilter limits the number of tokens while indexing. It is
 * a replacement for the maximum field length setting inside
 * {@link org.apache.lucene.index.IndexWriter}.
 */

If I remember correctly, it's not exactly the same as maxFieldLength, but
it's pretty close.
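
To make the "wrap the entire stream" idea concrete, here is a minimal, Lucene-free sketch of the pattern. It is an illustration only, not the actual filter: LimitingIterator and countTokens are hypothetical names, and a plain Iterator<String> stands in for a TokenStream. In Lucene itself the equivalent would be wrapping the analyzer's final stream with LimitTokenCountFilter(TokenStream, int), passing the configured maxFieldLength as the limit.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LimitTokenCountSketch {

    /**
     * Hypothetical stand-in for a token-limiting filter: passes through
     * at most maxTokenCount tokens from the wrapped input, then stops.
     */
    static final class LimitingIterator implements Iterator<String> {
        private final Iterator<String> input;
        private final int maxTokenCount;
        private int emitted = 0;

        LimitingIterator(Iterator<String> input, int maxTokenCount) {
            this.input = input;
            this.maxTokenCount = maxTokenCount;
        }

        @Override
        public boolean hasNext() {
            // Stop once the limit is reached, even if the input has more tokens.
            return emitted < maxTokenCount && input.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            emitted++;
            return input.next();
        }
    }

    /** Counts the tokens a stream produces, as the hacked analysis page does per filter. */
    static int countTokens(Iterator<String> tokens) {
        int count = 0;
        while (tokens.hasNext()) {
            tokens.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "in the beginning god created the heaven and the earth".split(" "));

        // Unwrapped stream: every token is seen.
        System.out.println(countTokens(tokens.iterator()));                           // prints 10

        // Wrapped stream: capped at the (assumed) limit of 4.
        System.out.println(countTokens(new LimitingIterator(tokens.iterator(), 4)));  // prints 4
    }
}
```

Because the limit is applied at the very end of the chain, every upstream filter still sees the full input; only the final count is capped, which is one reason the result may differ slightly from what IndexWriter's own setting would do.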

-- 
Robert Muir
rcmuir@gmail.com
