From: Erik Hatcher
Subject: Re: Unexpected end in indexing HTML file
Date: Mon, 19 Jan 2004 19:31:59 -0500
To: "Lucene Users List"

On Jan 19, 2004, at 7:27 PM, Syrén Per wrote:

> Hi all,
>
> Have a question concerning indexing of HTML files.
>
> One of the files I'm trying to index has a tag that also contains a
> JavaScript call with a string argument that is about 1300 characters
> long. At this point Lucene seems to stop indexing the remaining part
> of the current document, but does index the other files in the same
> directory.
>
> How do I work around this?

Seems unlikely, but IndexWriter.maxFieldLength is set to 10,000 by default. This is a maximum of 10,000 terms per field. Is it possible you are exceeding that?

What symptoms lead you to believe it is stopping indexing at that point?

	Erik
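
To make the "10,000 terms maximum per field" limit concrete, here is a minimal sketch of what a per-field term cap does to a long document. It is a plain-Java simulation, not Lucene's code: the class FieldLengthCap, the helper capTerms, and the whitespace tokenization are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLengthCap {

    // Hypothetical helper (not part of Lucene): split text on whitespace and
    // keep at most maxFieldLength terms, silently dropping everything after.
    public static List<String> capTerms(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            if (kept.size() >= maxFieldLength) {
                break; // cap reached: the rest of the field is never indexed
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a 12,000-term document: term0 term1 ... term11999
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 12000; i++) {
            doc.append("term").append(i).append(' ');
        }
        List<String> indexed = capTerms(doc.toString(), 10000);
        System.out.println("terms kept: " + indexed.size());
        System.out.println("last term kept: " + indexed.get(indexed.size() - 1));
    }
}
```

With a cap like this, a search for a word that appears only near the end of a very long document finds nothing, which can look as if indexing stopped partway through. In the Lucene releases of that period, maxFieldLength appears to have been a public field on IndexWriter, so if the cap is the cause, assigning it a larger value before adding documents should lift it.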