lucene-java-user mailing list archives

From "Marcelo Ochoa" <marcelo.oc...@gmail.com>
Subject Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1
Date Wed, 07 May 2008 12:07:01 GMT
Hi Michael:
  First, thanks a lot for your time.
  See comments below.
>  Is there any way to capture & serialize the actual documents being
>  added (this way I can "replay" those docs to reproduce it)?
  The documents are a VARCHAR2 column from Oracle's ALL_SOURCE system
view; in fact, the test table is created as:
create table test_source_big as (select * from all_source);
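If the table alone turns out not to reproduce the problem, a minimal way to capture the exact sequence of texts handed to the IndexWriter, and replay it later, could look like the sketch below. It assumes each document can be reduced to one plain string; DocReplayLog is a hypothetical helper, not part of Lucene or the Oracle-Lucene cartridge. Records are length-prefixed UTF-8 rather than written with DataOutputStream.writeUTF, which uses modified UTF-8 and caps a string at 65535 encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical capture/replay log: one length-prefixed UTF-8 record per document. */
class DocReplayLog {

    /** Appends one record: a 4-byte length followed by the UTF-8 bytes. */
    static void capture(DataOutputStream out, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads the records back in the order they were written, until end of stream. */
    static List<String> replay(DataInputStream in) throws IOException {
        List<String> docs = new ArrayList<>();
        while (true) {
            int len;
            try {
                len = in.readInt();
            } catch (EOFException eof) {
                break; // clean end of the log
            }
            byte[] bytes = new byte[len];
            in.readFully(bytes);
            docs.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            capture(out, "PROCEDURE foo IS");
            capture(out, "BEGIN NULL; END;");
        }
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.println(replay(in)); // prints [PROCEDURE foo IS, BEGIN NULL; END;]
        }
    }
}
```

In the cartridge, capture would be called just before each addDocument, writing to a file instead of a byte buffer.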
>
>  Are you using threads?  Is autoCommit true or false?
  The Oracle JVM uses a single-thread model by default, except that
Lucene starts a parallel thread. The infoStream output shows only one
thread.
  AutoCommit is false.
  I am creating the IndexWriter with this code:
        IndexWriter writer = null;
        Parameters parameters = dir.getParameters();
        // Each tuning value comes from the index parameters and falls back
        // to Lucene's own default constant when the parameter is absent.
        int mergeFactor =
            Integer.parseInt(parameters.getParameter("MergeFactor",
                "" + LogMergePolicy.DEFAULT_MERGE_FACTOR));
        int maxBufferedDocs =
            Integer.parseInt(parameters.getParameter("MaxBufferedDocs",
                "" + IndexWriter.DEFAULT_MAX_BUFFERED_DOCS));
        int maxMergeDocs =
            Integer.parseInt(parameters.getParameter("MaxMergeDocs",
                "" + LogDocMergePolicy.DEFAULT_MAX_MERGE_DOCS));
        int maxBufferedDeleteTerms =
            Integer.parseInt(parameters.getParameter("MaxBufferedDeleteTerms",
                "" + IndexWriter.DEFAULT_MAX_BUFFERED_DELETE_TERMS));
        Analyzer analyzer = getAnalyzer(parameters);
        boolean useCompoundFile =
            "true".equalsIgnoreCase(parameters.getParameter("UseCompoundFile",
                                                            "false"));
        boolean autoTuneMemory =
            "true".equalsIgnoreCase(parameters.getParameter("AutoTuneMemory",
                                                            "true"));
        writer = new IndexWriter(dir, autoCommitEnable, analyzer, createEnable);
        if (autoTuneMemory) {
            // Flush by RAM usage: half of the Oracle Java pool, in MB.
            long memLimit =
                ((OracleRuntime.getJavaPoolSize() / 100) * 50) / (1024 * 1024);
            logger.info(".getIndexWriterForDir - Memory limit for indexing (Mb): "
                        + memLimit);
            writer.setRAMBufferSizeMB(memLimit);
        } else {
            // Otherwise flush by buffered document count.
            writer.setMaxBufferedDocs(maxBufferedDocs);
        }
        writer.setMaxMergeDocs(maxMergeDocs);
        writer.setMaxBufferedDeleteTerms(maxBufferedDeleteTerms);
        writer.setMergeFactor(mergeFactor);
        writer.setUseCompoundFile(useCompoundFile);
        // Send infoStream output to stdout (the Oracle trace file) at FINE level.
        if (logger.isLoggable(Level.FINE))
            writer.setInfoStream(System.out);
   The example passes these relevant parameters:
   AutoTuneMemory:true;LogLevel:FINE;Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500
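For anyone trying to reproduce this outside Oracle, the parameter string above can be turned into the getParameter(name, default) lookups used in the writer setup with something like the sketch below. SimpleParameters is a hypothetical stand-in for the cartridge's Parameters class; it assumes the "Name:value;Name:value" format shown above, case-sensitive names, and splits each pair on its first colon so that values containing dots (such as the analyzer class name) stay intact.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the cartridge's Parameters class. */
class SimpleParameters {
    private final Map<String, String> values = new HashMap<>();

    SimpleParameters(String spec) {
        for (String pair : spec.split(";")) {
            int colon = pair.indexOf(':');
            if (colon <= 0) {
                continue; // skip empty or malformed entries
            }
            values.put(pair.substring(0, colon).trim(),
                       pair.substring(colon + 1).trim());
        }
    }

    /** Returns the configured value, or defaultValue when the name is absent. */
    String getParameter(String name, String defaultValue) {
        return values.getOrDefault(name, defaultValue);
    }

    public static void main(String[] args) {
        SimpleParameters p = new SimpleParameters(
            "AutoTuneMemory:true;LogLevel:FINE;"
            + "Analyzer:org.apache.lucene.analysis.StopAnalyzer;MergeFactor:500");
        // "10" is only a placeholder default for illustration.
        int mergeFactor = Integer.parseInt(p.getParameter("MergeFactor", "10"));
        boolean autoTune =
            "true".equalsIgnoreCase(p.getParameter("AutoTuneMemory", "true"));
        String maxBuffered = p.getParameter("MaxBufferedDocs", "10");
        System.out.println(mergeFactor + " " + autoTune + " " + maxBuffered); // prints 500 true 10
    }
}
```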
   Because AutoTuneMemory is true, I call setRAMBufferSizeMB(53) instead
of setMaxBufferedDocs; the 53 MB limit is calculated as half of the
Oracle Java pool size (part of the SGA).
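The AutoTuneMemory arithmetic can be checked in isolation with the sketch below. It takes the pool size as an argument instead of calling OracleRuntime.getJavaPoolSize(), and assumes that call returns bytes, as the division by 1024*1024 in the snippet suggests. Because the original expression uses integer division, it truncates: a hypothetical 128 MB pool gives 63 MB rather than 64. That is harmless for the bug at hand, but since setRAMBufferSizeMB takes a double, the fraction could simply be kept.

```java
/** Reproduces the AutoTuneMemory computation outside the Oracle JVM. */
class RamBufferSizing {

    /** Same integer expression as the snippet: ((pool / 100) * 50) / (1024 * 1024). */
    static long memLimitMb(long javaPoolSizeBytes) {
        return ((javaPoolSizeBytes / 100) * 50) / (1024 * 1024);
    }

    /** Variant that keeps the fractional megabytes instead of truncating. */
    static double memLimitMbExact(long javaPoolSizeBytes) {
        return javaPoolSizeBytes * 0.5 / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long pool = 128L * 1024 * 1024; // hypothetical 128 MB Java pool
        System.out.println(memLimitMb(pool));      // prints 63
        System.out.println(memLimitMbExact(pool)); // prints 64.0
    }
}
```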
>
>  Are you using payloads?
  No.
>
>  Were there any previous exceptions in this IndexWriter before flushing
>  this segment?  Could you post the full infoStream output?
  There are no previous exceptions. Attached is a .trc file generated by
Oracle 11g; it contains the infoStream output plus logging information
from the Oracle-Lucene data cartridge.
>
<snip>
>  Could you apply the patch below & re-run?  It will likely produce
>  insane amounts of output, but we only need the last section to see
>  which term is hitting the bug.  If that term consistently hits the bug
>  then we can focus on how/when it gets indexed...
  I'll patch my Lucene 2.3.1 source and send the .trc file again.
  Also, I am comparing the FSDirectory implementation (2.3.1) with my
OJVMDirectory implementation to see what changed in how the
BufferedIndex[Input|Output].java API is used; maybe the problem is there.
  For example, the latest implementation expects an IOException when
opening an IndexInput for a file that doesn't exist. My code threw a
RuntimeException, which worked with Lucene 2.2.x but not with 2.3.1;
changing that was the first fix needed to get the Lucene-Oracle
integration working.
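Why the exception type matters can be shown with a small stdlib-only model. The FileStore interface, both store classes and the exists() caller are hypothetical and are not Lucene's Directory API. The model only illustrates the general point: a caller that treats IOException (here FileNotFoundException, an IOException subclass) as "file not present" handles the checked variant, while a RuntimeException escapes that catch block and aborts the caller.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the missing-file contract for openInput. */
class MissingFileContract {

    interface FileStore {
        byte[] openInput(String name) throws IOException;
    }

    /** Reports a missing file with FileNotFoundException (an IOException). */
    static class CheckedStore implements FileStore {
        private final Map<String, byte[]> files = new HashMap<>();

        void put(String name, byte[] data) { files.put(name, data); }

        public byte[] openInput(String name) throws IOException {
            byte[] data = files.get(name);
            if (data == null) {
                throw new FileNotFoundException(name);
            }
            return data;
        }
    }

    /** Mimics the old behaviour: an unchecked exception for a missing file. */
    static class UncheckedStore implements FileStore {
        public byte[] openInput(String name) {
            throw new RuntimeException("file not found: " + name);
        }
    }

    /** Hypothetical caller that treats IOException as "not present". */
    static boolean exists(FileStore store, String name) {
        try {
            store.openInput(name);
            return true;
        } catch (IOException e) {
            return false; // RuntimeException is NOT caught here
        }
    }

    public static void main(String[] args) {
        CheckedStore ok = new CheckedStore();
        ok.put("a.bin", new byte[] {1});
        System.out.println(exists(ok, "a.bin"));       // prints true
        System.out.println(exists(ok, "missing.bin")); // prints false
        try {
            exists(new UncheckedStore(), "missing.bin");
        } catch (RuntimeException e) {
            System.out.println("escaped: " + e.getMessage()); // prints escaped: file not found: missing.bin
        }
    }
}
```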
  Best regards. Marcelo.
-- 
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

