lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Lucene 3.4 : shift bug in possibly invalid use of NumericTokenStream
Date Mon, 19 Dec 2011 08:58:00 GMT
Hi,
 
> I was hitting a similar exception (for me it was of type 'long'). But I
> thought it was because I had a programming mistake. termAtt is reused.
> Couldn't it be that problems occur when two threads access the
> incrementToken method at the same time?

If the user code invoking the IndexWriter is correct, this cannot happen:
IndexWriter processes each document in a single thread and therefore calls
incrementToken from only one thread. But if, e.g., the user's indexing code
reuses Documents and Fields (as suggested for performance reasons), the
*same* NumericField instance (or other document/field type) may be added to
IndexWriter from different threads. In that case, shift can easily get out
of bounds. But if this is the case, your index is also crap, as all numeric
values (and other fields) would be wrong.
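
To make the interleaving concrete, here is a minimal sketch. It is not Lucene code: SharedStreamModel is a hypothetical stand-in that only mimics the shift bookkeeping of NumericTokenStream.incrementToken(), and the hook parameter is an invented device that replays deterministically what a second thread could do between the bounds check and the use of shift. A precisionStep of 4 is assumed for illustration.

```java
/** Hypothetical stand-in for the shift bookkeeping in NumericTokenStream (not Lucene code). */
final class SharedStreamModel {
    private final int valSize;        // 32 for int values, 64 for long values
    private final int precisionStep;  // assumed to be 4 in this sketch
    private int shift = 0;            // plain mutable state, not synchronized

    SharedStreamModel(int valSize, int precisionStep) {
        this.valSize = valSize;
        this.precisionStep = precisionStep;
    }

    /** Mimics the range check that produced the exception in the stack trace. */
    private static void encode(int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("Illegal shift value, must be 0..31");
        }
    }

    /**
     * One token step. The hook runs between the bounds check and the use of
     * shift, standing in for another thread being scheduled at that point.
     */
    boolean incrementToken(Runnable betweenCheckAndUse) {
        if (shift >= valSize) {
            return false;
        }
        betweenCheckAndUse.run();
        encode(shift);
        shift += precisionStep;
        return true;
    }

    boolean incrementToken() {
        return incrementToken(() -> {});
    }
}

final class SharedFieldRaceDemo {
    /** An instance used by one caller only: drains cleanly, returns the token count. */
    static int drainPrivateInstance() {
        SharedStreamModel stream = new SharedStreamModel(32, 4);
        int tokens = 0;
        while (stream.incrementToken()) {
            tokens++;
        }
        return tokens;
    }

    /** Replays one unlucky interleaving on a shared instance; returns the failure message, or null. */
    static String replaySharedInstance() {
        SharedStreamModel shared = new SharedStreamModel(32, 4);
        for (int i = 0; i < 7; i++) {
            shared.incrementToken();          // afterwards shift == 28
        }
        try {
            // "Thread A" passes the check with shift == 28; "thread B" then
            // emits the last token and moves shift to 32 before A uses it.
            shared.incrementToken(() -> shared.incrementToken());
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("private instance tokens: " + drainPrivateInstance());
        System.out.println("shared instance failure: " + replaySharedInstance());
    }
}
```

In real indexing code the usual remedy would be to give each indexing thread its own Document and NumericField instances (for example via a ThreadLocal) rather than sharing reused ones.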

> This exception disappeared when I fixed some threading issues in my app
> ... (it was even reproducible so I can post something if someone is
> interested)

I assume it was a bug like the one described above?

> > This is difficult to repro. I'm not using any JVM flags. It does seem
> > that the following code could never call NumericUtils.intToPrefixCoded
> > with a shift > 31 (or shift < 0) so I tend to agree this must be a JVM
> > bug. Looking through all logs I have for December, I only found one
> > instance of this issue. It seems it has nothing to do with
> > concurrency, so it must have to do with the value set in the
> > NumericField, and the bug must be triggered by a particular timestamp.

The timestamp cannot trigger it: the check involves only the
precisionStep/shift/valSize fields, so the actual value plays no role. If it
is not a concurrency bug caused by reusing documents/fields from different
threads, the only remaining explanation is a sign flip in the JVM.
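
As a rough illustration of why the value is irrelevant, the sketch below enumerates the shift values that the guard `shift >= valSize` lets through. ShiftSequence and shiftsFor are hypothetical names, and the precisionStep of 4 is an assumption (believed to be the Lucene 3.x default, but not stated in this thread). Note that the indexed value never appears in the computation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the shifts a correctly used stream can reach. */
final class ShiftSequence {
    /** Shift values for which the guard `shift >= valSize` still emits a token. */
    static List<Integer> shiftsFor(int valSize, int precisionStep) {
        List<Integer> shifts = new ArrayList<>();
        for (int shift = 0; shift < valSize; shift += precisionStep) {
            shifts.add(shift);
        }
        return shifts;
    }

    public static void main(String[] args) {
        System.out.println("int  (valSize 32): " + shiftsFor(32, 4));
        System.out.println("long (valSize 64): " + shiftsFor(64, 4));
    }
}
```

With single-threaded use, every shift handed to the int encoder stays within 0..31, so the range check can only fail if shift or valSize is corrupted from outside this loop.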

Uwe

...

> > from:
> > http://javasourcecode.org/html/open-source/lucene/lucene-3.3.0/org/apache/lucene/analysis/NumericTokenStream.java.html
> >
> >
> >   public boolean incrementToken() {
> >     if (valSize == 0)
> >       throw new IllegalStateException("call set???Value() before usage");
> >     if (shift >= valSize)
> >       return false;
> >
> >     clearAttributes();
> >     final char[] buffer;
> >     switch (valSize) {
> >       case 64:
> >         buffer = termAtt.resizeBuffer(NumericUtils.BUF_SIZE_LONG);
> >         termAtt.setLength(NumericUtils.longToPrefixCoded(value, shift, buffer));
> >         break;
> >
> >       case 32:
> >         buffer = termAtt.resizeBuffer(NumericUtils.BUF_SIZE_INT);
> >         termAtt.setLength(NumericUtils.intToPrefixCoded((int) value, shift, buffer));
> >         break;
> >
> >       default:
> >         // should not happen
> >         throw new IllegalArgumentException("valSize must be 32 or 64");
> >     }
> >
> >     typeAtt.setType((shift == 0) ? TOKEN_TYPE_FULL_PREC : TOKEN_TYPE_LOWER_PREC);
> >     posIncrAtt.setPositionIncrement((shift == 0) ? 1 : 0);
> >     shift += precisionStep;
> >     return true;
> >   }
> >
> >
> > On Sun, Dec 18, 2011 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> >> Hi,
> >>
> >> Can you try 1.6.0_29 or disable hotspot by using the "-Xint" JVM startup
> >> flag (just to test, I know, it's slow then)? Are you *not* using
> >> "-XX:+AggressiveOpts" as JVM parameter?
> >>
> >> The JVM bug which may lead to this is a sign-flip bug:
> >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921 (see also
> >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-March/004942.html)
> >>
> >> Otherwise, is everything fine if you remove the numeric field? The code
> >> you are using can never cause such behavior; this is extensively tested.
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >> From: Thushara Wijeratna [mailto:thushw@gmail.com]
> >> Sent: Sunday, December 18, 2011 11:17 PM
> >> To: java-user@lucene.apache.org; uwe@thetaphi.de
> >> Subject: Re: Lucene 3.4 : shift bug in possibly invalid use of
> >> NumericTokenStream
> >>
> >> Yes, I use this field to set a timestamp (an int). And I'm not using
> >> the special constructor, so I must be using the default precision
> >> step.
> >>
> >> Java version : 1.6.0_24
> >>
> >> mpire@seafcmr16:~$ java -version
> >> java version "1.6.0_24"
> >> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> >> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> >>
> >> Also : I have only seen this when multiple threads within the app are
> >> writing to a single Lucene index. But it is rare.
> >>
> >> I'm attaching the indexing code.
> >>
> >> Could you also point me to the JVM bug you suspect to be the cause?
> >>
> >> thx,
> >> thushara
> >>
> >> On Fri, Dec 16, 2011 at 4:07 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >>
> >> Hi,
> >>
> >> Thanks, this *may* cause the exception, but it is impossible that the
> >> exception stack trace you are posting occurs in Lucene's code with a
> >> default precision step on a numeric field, as you use here. I assume
> >> it's a 32bit integer (NumericField.setIntValue or setFloatValue)?
> >>
> >> Please provide us your full Java version (java -version) and ideally
> >> the full source code you use during indexing. The only way you can
> >> get this exception is through a JVM bug.
> >>
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >>
> >>> -----Original Message-----
> >>> From: Thushara Wijeratna [mailto:thushw@gmail.com]
> >>> Sent: Saturday, December 17, 2011 1:01 AM
> >>> To: java-user@lucene.apache.org; uwe@thetaphi.de
> >>> Subject: Re: Lucene 3.4 : shift bug in possibly invalid use of
> >>> NumericTokenStream
> >>>
> >>> Yes, there is one.
> >>>
> >>> This is how the field is being created:
> >>>
> >>> new NumericField("timestamp", Field.Store.NO, true);
> >>>
> >>> Thus, the field is not stored, but indexed.
> >>>
> >>> thx,
> >>> thushara
> >>>
> >>>
> >>> On Fri, Dec 16, 2011 at 3:28 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >>>
> >>>> Do you have NumericFields? If yes, how are they configured?
> >>>>
> >>>> -----
> >>>> Uwe Schindler
> >>>> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> >>>> eMail: uwe@thetaphi.de
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Thushara Wijeratna [mailto:thushw@gmail.com]
> >>>>> Sent: Saturday, December 17, 2011 12:25 AM
> >>>>> To: java-user@lucene.apache.org
> >>>>> Subject: Lucene 3.4 : shift bug in possibly invalid use of
> >>>>> NumericTokenStream
> >>>>>
> >>>>> I got this exception while indexing with Lucene 3.4:
> >>>>>
> >>>>> Exception in thread "Thread-0" java.lang.IllegalArgumentException: Illegal shift value, must be 0..31
> >>>>>     at org.apache.lucene.util.NumericUtils.intToPrefixCoded(NumericUtils.java:157)
> >>>>>     at org.apache.lucene.analysis.NumericTokenStream.incrementToken(NumericTokenStream.java:217)
> >>>>>     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:185)
> >>>>>     at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
> >>>>>     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
> >>>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2067)
> >>>>>     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2041)
> >>>>>     at com.adxpose.affinity.IndexerHelper.index(IndexerHelper.java:797)
> >>>>>     at com.adxpose.affinity.IndexerHelper$Clerk.run(IndexerHelper.java:433)
> >>>>>     at java.lang.Thread.run(Thread.java:662)
> >>>>>
> >>>>>
> >>>>> It is not clear to me why the NumericTokenStream is being called here,
> >>>>> as my analyzer does not use it. Any clues much appreciated.
> >>>>>
> >>>>>
> >>>>> thx,
> >>>>>
> >>>>> thushara
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>
> 
> 
> --
> http://jetsli.de news reader for geeks
> 
> 



