From: Jason Rutherglen
To: java-user@lucene.apache.org
Date: Wed, 25 Mar 2009 15:39:46 -0700
Subject: Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4
Message-ID: <85d3c3b60903251539s528a31d2yade6a9beead3affa@mail.gmail.com>
In-Reply-To: <85d3c3b60903251306y23516d05scd5b96adabe68732@mail.gmail.com>

LuceneError, when executed, should reproduce the failure. The
contrib/benchmark libraries are required. MultiThreadDocAdd is a
multithreaded indexing utility class.

On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> Each document is being created in a single thread, and the fields of the
> document are not being updated elsewhere. I haven't posted the full code
> yet as it needs to be cleaned up. Thanks Mike!
>
> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>> you sure you're not changing a Document/Field while another thread is
>> adding it to the index?
>>
>> If you can post the full code, then I can try to run it on my
>> wikipedia dump locally.
>>
>> Mike
>>
>> Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>> > Mike,
>> >
>> > It only happens when at least 1 million documents are indexed in a
>> > multithreaded fashion. Maybe I should post the code? I will try indexing
>> > without the payload field; I assume it won't fail because I indexed
>> > wikipedia before with no issues.
>> >
>> > Thanks!
>> >
>> > Jason
>> >
>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>> >
>> >> Hmmmm.
>> >>
>> >> Jason is this easily/compactly repeated? EG, try to index the N docs
>> >> before that one.
>> >>
>> >> If you remove the SinglePayloadTokenStream field, does the exception
>> >> still happen?
>> >>
>> >> Mike
>> >>
>> >> Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>> >> > This happens while indexing using
>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker. The
>> >> > assertion error is from
>> >> > TermsHashPerField.comparePostings(RawPostingList p1, RawPostingList p2).
>> >> > A Payload is added to the document representing a UID.
>> >> > Only 1-2 out of 1 million documents indexed generate this error.
>> >> >
>> >> > java.lang.AssertionError
>> >> > problem adding
>> >> > doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
>> >> > D.C.]] It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>> >> > (northwest)|Northwest]] near [[Dupont Circle]]. Previously the building had
>> >> > been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
>> >> > they left for larger quarters and sold the structure to Croatia in 1993.
>> >> > The purchase and renovation of the building was largely paid for by the
>> >> > [[Croatian-American]] community. In front of the embassy is a large
>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
>> >> > in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>> >> > indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>> >> >    at org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>> >> >    at org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>> >> >    at org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>> >> >    at org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>> >> >    at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>> >> >    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>> >> >    at org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>> >> >    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>> >> >    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>> >> >    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>> >> >    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>> >> >    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>> >> >    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org

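Mike's question in the thread (is a reused Field being changed while another thread is adding it to the index?) describes a general hazard with shared mutable objects. The sketch below is only an illustration of one way to rule that out, under stated assumptions: it does not use Lucene at all, `MutableField` is a hypothetical stand-in for a reusable field with a `setValue` method, and the "indexing" step is just a read of the field's value. It is not the LuceneError or MultiThreadDocAdd code mentioned above. Each worker thread gets its own reusable instance through a `ThreadLocal`, so a value set by one thread cannot be overwritten by another before it is consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PerThreadFieldDemo {

    /** Hypothetical stand-in for a reusable, mutable field; NOT a Lucene class. */
    static final class MutableField {
        private String value;
        void setValue(String v) { value = v; }
        String value() { return value; }
    }

    // One reusable field per worker thread: reuse stays cheap, but no two
    // threads ever touch the same instance.
    private static final ThreadLocal<MutableField> FIELD =
            ThreadLocal.withInitial(MutableField::new);

    /** "Indexes" ids 0..docCount-1 on a fixed pool and returns what was recorded. */
    static List<String> indexAll(int docCount, int threads) {
        ConcurrentLinkedQueue<String> recorded = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docCount; i++) {
            final String expected = "doc-" + i;
            pool.execute(() -> {
                MutableField f = FIELD.get();  // thread-confined instance
                f.setValue(expected);          // safe: only this thread mutates f
                // Stand-in for handing the document to an indexer:
                // read the value back after setting it.
                recorded.add(expected + "=" + f.value());
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new ArrayList<>(recorded);
    }

    public static void main(String[] args) {
        List<String> out = indexAll(10_000, 8);
        // A mismatch would mean some thread saw a value it did not set.
        long mismatches = out.stream().filter(s -> {
            int k = s.indexOf('=');
            return !s.substring(0, k).equals(s.substring(k + 1));
        }).count();
        System.out.println(out.size() + " docs recorded, " + mismatches + " mismatches");
    }
}
```

If a single `MutableField` were shared by all workers instead, the value read back could differ from the one just set. Whether anything like that is involved in the assertion failure reported in this thread is not established here.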