lucene-java-user mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Assertion Error in TermsHashPerField.comparePostings - Lucene 2.4
Date Tue, 24 Mar 2009 21:09:06 GMT
Using StandardAnalyzer.  It's probably the payload field?

This is the code that creates the payload field:

private static class SinglePayloadTokenStream extends TokenStream {
    private Token token = new Token(UID_TERM.text(), 0, 0);
    private byte[] buffer = new byte[4];
    private boolean returnToken = false;

    // Encode the UID as 4 bytes, least significant byte first,
    // and attach it to the single token as its payload.
    void setUID(int uid) {
        buffer[0] = (byte) (uid);
        buffer[1] = (byte) (uid >> 8);
        buffer[2] = (byte) (uid >> 16);
        buffer[3] = (byte) (uid >> 24);
        token.setPayload(new Payload(buffer));
        returnToken = true;
    }

    // Return the token once after each setUID() call, then end the stream.
    public Token next() throws IOException {
        if (returnToken) {
            returnToken = false;
            return token;
        } else {
            return null;
        }
    }
}

public static void fillDocumentID(Document doc, int id) {
    SinglePayloadTokenStream singlePayloadTokenStream = new SinglePayloadTokenStream();
    singlePayloadTokenStream.setUID(id);

    // Payload-carrying UID field: create it, or reuse the existing one.
    Field f = doc.getField(UID_TERM.field());
    if (f == null) {
        f = new Field(UID_TERM.field(), singlePayloadTokenStream);
        doc.add(f);
    } else {
        f.setValue(singlePayloadTokenStream);
    }

    // Plain, un-analyzed document ID field.
    f = doc.getField(Indexable.DOCUMENT_ID_FIELD);
    if (f == null) {
        f = new Field(Indexable.DOCUMENT_ID_FIELD, String.valueOf(id), Store.NO, Index.NOT_ANALYZED);
        doc.add(f);
    } else {
        f.setValue(String.valueOf(id));
    }
}
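
For anyone checking the payload bytes themselves, here is a minimal standalone sketch of the same byte layout that setUID() writes (least significant byte first), plus the matching decode step. The class UidPayloadCodec and its encode/decode methods are hypothetical names used only for illustration; they are not part of Lucene or of the code above, and the sketch says nothing about the cause of the assertion.

```java
public class UidPayloadCodec {

    // Hypothetical helper: same layout as setUID() above,
    // least significant byte first.
    public static byte[] encode(int uid) {
        byte[] buffer = new byte[4];
        buffer[0] = (byte) (uid);
        buffer[1] = (byte) (uid >> 8);
        buffer[2] = (byte) (uid >> 16);
        buffer[3] = (byte) (uid >> 24);
        return buffer;
    }

    // Hypothetical inverse: mask each byte with 0xFF so that negative
    // byte values are not sign-extended when widened to int.
    public static int decode(byte[] buffer) {
        return (buffer[0] & 0xFF)
                | ((buffer[1] & 0xFF) << 8)
                | ((buffer[2] & 0xFF) << 16)
                | ((buffer[3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 667162, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int uid : samples) {
            int roundTripped = decode(encode(uid));
            if (roundTripped != uid) {
                throw new AssertionError("round trip failed for " + uid);
            }
        }
        System.out.println("round trip ok");
    }
}
```

A reader of the payload would need to apply the same byte order when decoding; the 0xFF masks matter for any UID whose bytes have the high bit set.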

On Tue, Mar 24, 2009 at 12:36 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> I was just able to index all of wikipedia, using StandardAnalyzer,
> with assertions enabled, without hitting that exception.  Which
> analyzer are you using (besides your payload field)?
>
> Mike
>
> Michael McCandless <lucene@mikemccandless.com> wrote:
> > Hmmmm.
> >
> > Jason is this easily/compactly repeated?  EG, try to index the N docs
> > before that one.
> >
> > If you remove the SinglePayloadTokenStream field, does the exception
> > still happen?
> >
> > Mike
> >
> > Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
> >> This happens while indexing using
> >> contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker.  The
> >> assertion error is from
> >> TermsHashPerField.comparePostings(RawPostingList p1, RawPostingList p2).
> >> A Payload representing a UID is added to each document.
> >> Only 1-2 out of 1 million documents indexed generate this error.
> >>
> >> java.lang.AssertionError
> >> problem adding
> >> doc:Document<stored/uncompressed,indexed,tokenized<body:[[Image:Croatia,
> >> Washington.JPG|right|250px|thumb|The Croatian embassy]] The '''Croatian
> >> Embassy in Washington''' is the [[embassy]] of [[Croatia]] in [[Washington,
> >> D.C.]]  It is located on [[Embassy Row]] at 2343 [[Massachusetts Avenue
> >> (Washington, DC)|Massachusetts Avenue]], [[Washington DC
> >> (northwest)|Northwest]] near [[Dupont Circle]].  Previously the building had
> >> been home to the [[Austrian Embassy in Washington|Austrian embassy]], but
> >> they left for larger quarters and sold the structure to Croatia in 1993.
> >> The purchase and renovation of the building was largely paid for by the
> >> [[Croatian-American]] community.  In front of the embassy is a large
> >> sculpture of [[St. Jerome]] by Croatian sculptor [[Ivan Meštrović]].
> >> ==External link== *[http://www.croatiaemb.org/ Official site]
> >> [[Category:Embassies in Washington|Croatia]] [[Category:Foreign relations of
> >> Croatia]]> stored/uncompressed,indexed,tokenized<doctitle:Embassy of Croatia
> >> in Washington> stored/uncompressed,indexed,tokenized<docdate:29-JUN-2006
> >> 07:27:44.000> stored/uncompressed,indexed,omitNorms<docid:1703107>
> >> indexed,tokenized<_ID:proj.zoie.api.ZoieIndexReader$SinglePayloadTokenStream@e7b3cf>
> >> indexed<id:667162>> ex: java.lang.AssertionError
> >>    at org.apache.lucene.index.TermsHashPerField.comparePostings(TermsHashPerField.java:228)
> >>    at org.apache.lucene.index.TermsHashPerField.quickSort(TermsHashPerField.java:144)
> >>    at org.apache.lucene.index.TermsHashPerField.sortPostings(TermsHashPerField.java:136)
> >>    at org.apache.lucene.index.FreqProxFieldMergeState.<init>(FreqProxFieldMergeState.java:51)
> >>    at org.apache.lucene.index.FreqProxTermsWriter.appendPostings(FreqProxTermsWriter.java:202)
> >>    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:132)
> >>    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:145)
> >>    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:74)
> >>    at org.apache.lucene.index.DocFieldConsumers.flush(DocFieldConsumers.java:75)
> >>    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
> >>    at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:574)
> >>    at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3533)
> >>    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3442)
> >>    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1922)
> >>    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1880)
> >>