From: Robert Muir
Date: Tue, 24 Aug 2010 12:18:32 -0400
Subject: Re: Should analysis.jsp honor maxFieldLength
To: dev@lucene.apache.org

On Tue, Aug 24, 2010 at 12:03 PM, Eric Pugh <epugh@opensourceconnections.com> wrote:
> Hi all,
>
> I have maxFieldLength set to 10000 in solrconfig.xml, but was playing around with a really large document (the King James Bible) in analysis.jsp. I hacked analysis.jsp to show me the number of terms at each filter, and the headers, but without turning everything on by checking the verbose checkbox.
>
> My results, shown in this screenshot: http://img.skitch.com/20100824-t36rq45i2wfimwyd53gwiqebdy.png, seem to confirm that maxFieldLength is NOT honored by analysis.jsp.

Separate from whether or not analysis.jsp should do this (I happen to think the closer to "reality" it is, the better), I think the easiest implementation would be to wrap the entire stream with LimitTokenCountFilter:

/**
 * This TokenFilter limits the number of tokens while indexing. It is
 * a replacement for the maximum field length setting inside {@link org.apache.lucene.index.IndexWriter}.
 */

If I remember correctly, it's not exactly the same as maxFieldLength, but it's pretty close.
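
To illustrate the idea of wrapping the entire stream with a token-count limit, here is a minimal, Lucene-free sketch. It is an assumption-laden stand-in, not Lucene code: the names TokenLimitSketch, LimitingIterator, tokenize and analyze are hypothetical, plain java.util.Iterator takes the place of Lucene's TokenStream, and a whitespace split plus lowercasing stands in for an arbitrary analysis chain. The real LimitTokenCountFilter is a TokenFilter over Lucene TokenStreams and its exact behavior may differ.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;

/**
 * Lucene-free sketch of "wrap the entire stream with a token-count limit".
 * All names here are hypothetical; this is NOT Lucene's LimitTokenCountFilter API.
 */
final class TokenLimitSketch {

    /** Wraps any token iterator and yields at most maxTokenCount tokens from it. */
    static final class LimitingIterator implements Iterator<String> {
        private final Iterator<String> in;
        private final int maxTokenCount;
        private int tokenCount = 0;

        LimitingIterator(Iterator<String> in, int maxTokenCount) {
            if (maxTokenCount < 0) {
                throw new IllegalArgumentException("maxTokenCount must be >= 0");
            }
            this.in = in;
            this.maxTokenCount = maxTokenCount;
        }

        @Override
        public boolean hasNext() {
            // Report exhaustion once the limit is hit, even if the input still has tokens.
            return tokenCount < maxTokenCount && in.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            tokenCount++;
            return in.next();
        }
    }

    /** Hypothetical stand-in for a tokenizer: splits on runs of whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Runs tokenize -> lowercase, then applies the limit to the output of the whole chain. */
    static List<String> analyze(String text, int maxTokenCount) {
        Iterator<String> chain = tokenize(text).stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .iterator();
        Iterator<String> limited = new LimitingIterator(chain, maxTokenCount);
        List<String> out = new ArrayList<>();
        while (limited.hasNext()) {
            out.add(limited.next());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "In the beginning God created the heaven and the earth";
        System.out.println(analyze(text, 5)); // prints [in, the, beginning, god, created]
    }
}
```

Because the limit wraps the outermost stream in this sketch, it only counts tokens that survive every earlier stage; a filter that drops tokens (for example, stop words) placed before it would let more of the original text through before the cap is reached.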

--
Robert Muir
rcmuir@gmail.com