lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1
Date Sat, 10 May 2008 09:12:37 GMT

Just to bring closure on this issue... after numerous iterations with
Marcelo, adding diagnostics to Lucene to isolate the cause of this
exception, it turns out that it's a bug in Oracle 11g's JRE.

Marcelo worked it down to a single document, which when added to a new
index, would hit the exception.  Except, the first time he created
this single doc index it would run fine.  Only the 2nd time he created
it would it hit the exception.

A standalone test (static void main(..)) indexing that one doc runs
fine as well, on multiple OS's / JRE versions.

The bug seems to be related to the JIT compiler; specifically it seems
to cause quickSort to run only partially, so that the array of terms
about to be flushed is not properly sorted and is left with duplicate
entries.  Very weird.
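
For illustration only, here is a minimal sketch of the kind of check that
exposes this symptom: it scans an array that is supposed to be sorted and
reports the first entry that is out of order or duplicated. The class and
method names are hypothetical; this is not the diagnostic code that was
added to Lucene, just an assumption about what such a check could look like.

```java
public class SortedTermsCheck {

    /**
     * Returns the index of the first entry that is not strictly greater
     * than its predecessor (out of order or duplicate), or -1 if the
     * array is strictly sorted.
     */
    static int firstOutOfOrder(String[] terms) {
        for (int i = 1; i < terms.length; i++) {
            if (terms[i - 1].compareTo(terms[i]) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] sorted = {"alpha", "beta", "gamma"};
        // Looks like the output of a sort that stopped part way through:
        // one entry is out of order and one appears twice.
        String[] partiallySorted = {"alpha", "gamma", "beta", "gamma"};

        System.out.println(firstOutOfOrder(sorted));
        System.out.println(firstOutOfOrder(partiallySorted));
    }
}
```

Running a check like this right after the sort and before the flush would
pinpoint the offending term rather than failing later with a less direct
exception.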

Marcelo is trying to find a workaround, eg maybe tweaking the JIT
settings, to prevent this from happening.  He's also trying to modify
the standalone test to get it to fail inside Oracle so he can submit
an issue to Oracle.
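
Since the failure only showed up on the second run, one way to shape such
a standalone test is to repeat the same sort-and-verify step many times in
one JVM and report the first run that produces an unsorted result. The
sketch below is an assumption about how that harness might look, not
Marcelo's actual test: it uses java.util.Arrays.sort as a stand-in for
Lucene's internal quickSort, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

public class RepeatUntilFailure {

    /**
     * Sorts a fresh random array on each run and verifies the result.
     * Returns the 1-based number of the first run whose output is not
     * sorted, or -1 if every run was correct.
     */
    static int firstFailingRun(int runs, int size, long seed) {
        Random random = new Random(seed);
        for (int run = 1; run <= runs; run++) {
            int[] data = new int[size];
            for (int i = 0; i < size; i++) {
                data[i] = random.nextInt();
            }
            Arrays.sort(data); // stand-in for the sort under suspicion
            for (int i = 1; i < size; i++) {
                if (data[i - 1] > data[i]) {
                    return run;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int failed = firstFailingRun(1000, 10000, 42L);
        System.out.println(failed == -1
            ? "all runs sorted correctly"
            : "first failure at run " + failed);
    }
}
```

On a correct JVM this should never report a failure; the point of the
repetition is only to give the JIT a chance to compile the hot sort code
before the later runs execute.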

This isn't the first JRE bug that's affecting Lucene.  Here's another:

   https://issues.apache.org/jira/browse/LUCENE-1282

I don't like JRE bugs!

Mike

Marcelo Ochoa wrote:
> Hi Michael:
>   First thanks a lot for your time.
>   See comments below.
>>  Is there any way to capture & serialize the actual documents being
>>  added (this way I can "replay" those docs to reproduce it)?
>   Documents are a VARCHAR2 column from Oracle's all_source system
> view; in fact it is a table created as:
> create table test_source_big as (select * from all_source);
>>
>>  Are you using threads?  Is autoCommit true or false?
>   The Oracle JVM uses a single-thread model by default, except that
> Lucene is starting a parallel thread. The infoStream output shows only
> one thread.
>   AutoCommit is false.
>   I am creating the Lucene writer with this code:
>         IndexWriter writer = null;
>         Parameters parameters = dir.getParameters();
>         int mergeFactor =
>             Integer.parseInt(parameters.getParameter("MergeFactor",
>                 "" + LogMergePolicy.DEFAULT_MERGE_FACTOR));
>         int maxBufferedDocs =
>             Integer.parseInt(parameters.getParameter("MaxBufferedDocs",
>                 "" + IndexWriter.DEFAULT_MAX_BUFFERED_DOCS));
>         int maxMergeDocs =
>             Integer.parseInt(parameters.getParameter("MaxMergeDocs",
>                 "" + LogDocMergePolicy.DEFAULT_MAX_MERGE_DOCS));
>         int maxBufferedDeleteTerms =
>             Integer.parseInt(parameters.getParameter("MaxBufferedDeleteTerms",
>                 "" + IndexWriter.DEFAULT_MAX_BUFFERED_DELETE_TERMS));
>         Analyzer analyzer = getAnalyzer(parameters);
>         boolean useCompountFileName =
>             "true".equalsIgnoreCase(parameters.getParameter("UseCompoundFile",
>                 "false"));
>         boolean autoTuneMemory =
>             "true".equalsIgnoreCase(parameters.getParameter("AutoTuneMemory",
>                 "true"));
>         writer =
>             new IndexWriter(dir, autoCommitEnable, analyzer, createEnable);
>         if (autoTuneMemory) {
>             long memLimit =
>                 ((OracleRuntime.getJavaPoolSize()/100)*50)/(1024*1024);
>             logger.info(".getIndexWriterForDir - Memory limit for indexing (Mb): "
>                 + memLimit);
>             writer.setRAMBufferSizeMB(memLimit);
>         } else
>             writer.setMaxBufferedDocs(maxBufferedDocs);
>         writer.setMaxMergeDocs(maxMergeDocs);
>         writer.setMaxBufferedDeleteTerms(maxBufferedDeleteTerms);
>         writer.setMergeFactor(mergeFactor);
>         writer.setUseCompoundFile(useCompountFileName);
>         if (logger.isLoggable(Level.FINE))
>             writer.setInfoStream(System.out);
>    The example passes these relevant parameters:
>
> AutoTuneMemory:true;LogLevel:FINE;Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500
>    So, because AutoTuneMemory is true, instead of setting
> MaxBufferedDocs I am setting RAMBufferSizeMB(53) which is calculated
> using Oracle SGA free memory.
>>
>>  Are you using payloads?
>   No.
>>
>>  Were there any previous exceptions in this IndexWriter before
>>  flushing this segment?  Could you post the full infoStream output?
>   There was no previous exception. Attached is a .trc file generated
> by Oracle 11g; it has the infoStream output plus logging information
> from the Oracle-Lucene data cartridge.
>>
> <snip>
>>  Could you apply the patch below & re-run?  It will likely produce
>>  insane amounts of output, but we only need the last section to see
>>  which term is hitting the bug.  If that term consistently hits the
>>  bug then we can focus on how/when it gets indexed...
>   I'll patch my lucene-2.3.1 source and send the .trc file again.
>   Also, I am comparing the FSDirectory implementation (2.3.1) with my
> OJVMDirectory implementation to see what changed in how the API of
> BufferedIndex[Input|Output].java is used; maybe the problem is there.
>   For example, the latest implementation expects an IOException when
> opening an IndexInput for a file that doesn't exist. My code threw a
> RuntimeException, which works with Lucene 2.2.x but not with 2.3.1;
> this was the first change needed to get the Lucene-Oracle integration
> working.
>   Best regards. Marcelo.
> -- 
> Marcelo F. Ochoa
> http://marceloochoa.blogspot.com/
> http://marcelo.ochoa.googlepages.com/home
> ______________
> Do you Know DBPrism? Look @ DB Prism's Web Site
> http://www.dbprism.com.ar/index.html
> More info?
> Chapter 17 of the book "Programming the Oracle Database using Java &
> Web Services"
> http://www.amazon.com/gp/product/1555583296/
> Chapter 21 of the book "Professional XML Databases" - Wrox Press
> http://www.amazon.com/gp/product/1861003587/
> Chapter 8 of the book "Oracle & Open Source" - O'Reilly
> http://www.oreilly.com/catalog/oracleopen/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

