lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4
Date Thu, 26 Mar 2009 18:08:42 GMT
I used the NoMergePolicy to build the index as I noticed the indexing is
faster, meaning the system simply creates large multi-megabyte segments in
the ram buffer, flushes them out and doesn't worry about merging which
causes massive disk trashing.  I am pondering some benchmarks to find the
optimal merge policy for realtime search, I'm not sure it's always necessary
to merge according to the Log system.

For example, a merge policy that caps the size of each segment at 250
megabytes, and does no merging could be interesting for realtime where many
deletes are coming in and the segments with enough deletes need to merged
away in 1-2 hours.  Meaning optimizing may not be best as it requires later
large merges.  Also an interleaving system that does not perform merges if a
flush is occurring could useful for minimizing disk trash.

On Wed, Mar 25, 2009 at 3:39 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> LuceneError when executed should reproduce the failure.  The
> contrib/benchmark libraries are required.  MultiThreadDocAdd is a
> multithreaded indexing utility class.
>
> On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
> jason.rutherglen@gmail.com> wrote:
>
>> Each document is being created in a single thread, and the fields of the
>> document are not being updated elsewhere.  I haven't posted the full code
>> yet as it needs to cleaned up.  Thanks Mike!
>>
>>
>> On Tue, Mar 24, 2009 at 2:43 PM, Michael McCandless <
>> lucene@mikemccandless.com> wrote:
>>
>>> It looks like you are reusing a Field (the f.setValue(...) calls); are
>>> you sure you're not changing a Document/Field while another thread is
>>> adding it to the index?
>>>
>>> If you can post the full code, then I can try to run it on my
>>> wikipedia dump locally.
>>>
>>> Mike
>>>
>>> Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>> > Mike,
>>> >
>>> > It only happens when at least 1 million documents are indexed in a
>>> > multithreaded fashion.  Maybe I should post the code?  I will try
>>> indexing
>>> > without the payload field, I assume it won't fail because I indexed
>>> > wikipedia before with no issues.
>>> >
>>> > Thanks!
>>> >
>>> > Jason
>>> >
>>> > On Tue, Mar 24, 2009 at 12:25 PM, Michael McCandless <
>>> > lucene@mikemccandless.com> wrote:
>>> >
>>> >> Hmmmm.
>>> >>
>>> >> Jason is this easily/compactly repeated?  EG, try to index the N docs
>>> >> before that one.
>>> >>
>>> >> If you remove the SinglePayloadTokenStream field, does the exception
>>> >> still happen?
>>> >>
>>> >> Mike
>>> >>
>>> >> Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>>> >> > While indexing using
>>> >> > contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.
>>>  The
>>> >> > asserion error is from
>>> TermsHashPerField.comparePostings(RawPostingList
>>> >> p1,
>>> >> > RawPostingList p2).  A Payload is added to the document representing
>>> a
>>> >> UID.
>>> >> > Only 1-2 out of 1 million documents indexed generates this error.
>>> >> >
>>> >> > java.lang.AssertionError
>>> >> > problem adding
>>> >> >
>>> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
>>> >> > Washington.JPG|right|250px|thumb|The Croatian embassy]] The
>>> '''Croatian
>>> >> > Embassy in Washington''' is the [[embassy]] of [[Croatia]] in
>>> >> [[Washington,
>>> >> > D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts
>>> Avenue
>>> >> > (Washington, DC)|Massachusetts Avenue]], [[Washington DC
>>> >> > (northwest)|Northwest]] near [[Dupont Circle]].  Previously the
>>> building
>>> >> had
>>> >> > been home to the [[Austrian Embassy in Washington|Austrian
>>> embassy]], but
>>> >> > they left for larger quarters and sold the structure to Croatia
in
>>> 1993.
>>> >> > The purchase and renovation of the building was largely paid for
by
>>> the
>>> >> > [[Croatian-American]] community.  In front of the embassy is a
large
>>> >> > sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Me?trovi?]].
>>> >> > ==External link== *[http://www.croatiaemb.org/ Official site]
>>> >> > [[Category:Embassies in Washington|Croatia]] [[Category:Foreign
>>> relations
>>> >> of
>>> >> > Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy
of
>>> >> Croatia
>>> >> > in Washington>
>>> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
>>> >> > 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
>>> >> >
>>> >>
>>> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf
>>> >> >
>>> >> > indexed<id:667162>> ex: java.lang.AssertionError
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
>>> >> >    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
>>> >> >    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
>>> >> >    at
>>> >> >
>>> >>
>>> org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
>>> >> >    at
>>> >> >
>>> org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
>>> >> >    at
>>> org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
>>> >> >    at
>>> org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
>>> >> >    at
>>> >> >
>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
>>> >> >    at
>>> >> >
>>> org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
>>> >> >
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >>
>>> >>
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message