lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject IndexWriter.optimize and memory usage
Date Fri, 03 Dec 2004 01:07:19 GMT

I've been running into an interesting situation that I wanted to ask
about.

I've been doing some testing by building up indexes with code that looks
like this...

     IndexWriter writer = null;
     try {
         // create=true: start a fresh index in the "index" directory
         writer = new IndexWriter("index", new StandardAnalyzer(), true);
         writer.mergeFactor = MERGE_FACTOR;
         PooledExecutor queue = new PooledExecutor(NUM_UPDATE_THREADS);
         // block in execute() when every worker thread is busy
         queue.waitWhenBlocked();

         // hand out the range low..high in chunks of BATCH_SIZE
         for (int min=low; min < high; min += BATCH_SIZE) {
             int max = min + BATCH_SIZE;
             if (high < max) {
                 max = high;
             }
             queue.execute(new BatchIndexer(writer, min, max));
         }
         end = new Date();
         System.out.println("Build Time: " + (end.getTime() - start.getTime()) + "ms");
         // restart the clock so the second number covers optimize() and close()
         start = end;
         writer.optimize();
     } finally {
         if (null != writer) {
             try { writer.close(); } catch (Exception ignore) { /*NOOP*/ }
         }
     }
     end = new Date();
     System.out.println("Optimize Time: " + (end.getTime() - start.getTime()) + "ms");


(where BatchIndexer is a class i have that gets a DB connection, and
slurps all records from my DB between min and max and builds some simple
Documents out of them and calls writer.addDocument(doc) on each)
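
One detail worth keeping in mind with a loop like this: an executor's execute() returns once a worker has accepted the task, not once the task has finished, so the last few batches can still be running when the build time is printed and optimize() is called. Below is a minimal sketch of the batching plus an explicit wait. It assumes java.util.concurrent's ExecutorService in place of PooledExecutor, a hypothetical processBatch callback in place of BatchIndexer, and half-open [min, max) batches; none of these come from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

public class BatchRunner {
    /** Splits [low, high) into half-open batches of at most batchSize (assumed convention). */
    public static List<int[]> batches(int low, int high, int batchSize) {
        List<int[]> out = new ArrayList<>();
        for (int min = low; min < high; min += batchSize) {
            int max = Math.min(min + batchSize, high);
            out.add(new int[] { min, max });
        }
        return out;
    }

    /** Runs every batch on a fixed pool and returns only after all of them have finished. */
    public static void runAll(int low, int high, int batchSize, int threads,
                              BiConsumer<Integer, Integer> processBatch)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int[] b : batches(low, high, batchSize)) {
            // processBatch is a hypothetical stand-in for BatchIndexer's work
            pool.execute(() -> processBatch.accept(b[0], b[1]));
        }
        // Stop accepting new work, then wait for queued and running batches.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("batches did not finish in time");
        }
    }
}
```

With this shape, the build-time measurement and the optimize() call would go after runAll() returns. Note that a fixed thread pool queues pending batches without bound, whereas waitWhenBlocked() makes the submitting thread wait, so the two differ in how they apply back-pressure.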

This was working fine with small ranges, but then i tried building up a
nice big index for doing some performance testing.  i left it running
overnight and when i came back in the morning i discovered that after
successfully building up the whole index (~112K docs, ~1.5GB disk) it
crashed with an OutOfMemoryError while trying to optimize.

I then realized i was only running my JVM with a 256m upper limit on RAM,
and i figured that PooledExecutor was still in scope, and maybe it was
maintaining some state that was using up a lot of space, so i whipped up a
quick little app to solve my problem...

    public static void main(String[] args) throws Exception {
        IndexWriter writer = null;
        try {
            // create=false: open the existing index rather than overwrite it
            writer = new IndexWriter("index", new StandardAnalyzer(), false);
            writer.optimize();
        } finally {
            if (null != writer) {
                try { writer.close(); } catch (Exception ignore) { /*NOOP*/; }
            }
        }
    }

...but I was disappointed to discover that even this couldn't run with
only 256m of RAM.  I bumped it up to 512m and then it managed to complete
successfully (the final index was only 1.1GB on disk).
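
For reference, the heap ceiling in question is the one set with the JVM's -Xmx flag (for example -Xmx512m), and it can be confirmed from inside the process before kicking off a long optimize. A small sketch follows; the class name HeapCheck is made up for illustration.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes. */
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most the JVM will try to use for the heap
        System.out.println("max heap:  " + toMegabytes(rt.maxMemory()) + " MB");
        // heap currently in use = allocated so far minus free space within it
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("used heap: " + toMegabytes(used) + " MB");
    }
}
```

Run as `java -Xmx512m HeapCheck`; the reported maximum is typically close to, though not always exactly, the -Xmx value.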


This raises a few questions in my mind:

1) Is there a rule of thumb for knowing how much memory it takes to
   optimize an index?

2) Is there a "Best Practice" to follow when building up a large index
   from scratch in order to reduce the amount of memory needed to optimize
   once the whole index is built?  (ie: would spinning up a thread that
   called writer.optimize() every N minutes be a good idea?)

3) Given an unoptimized index that's already been built (ie: in the case
   where my builder crashed and i wanted to try and optimize it without
   having to rebuild from scratch) is there any way to get IndexWriter to
   use less RAM and more disk (trading speed for a smaller form factor --
   and apparently: greater stability so that the app doesn't crash)?


I imagine that the answers to #1 and #2 are largely dependent on the
nature of the data in the index (ie: the frequency of terms) but i'm
wondering if there is a high level formula that could be used to say
"based on the nature of your data, you want to take this approach to
optimizing when you build."
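
To make the idea in question 2 concrete, here is a rough sketch of a timer that calls optimize() every N minutes while the build runs. It assumes java.util.concurrent is available and uses a hypothetical Optimizable interface as a stand-in for the one IndexWriter method involved; whether periodic optimizing actually lowers the peak memory needed at the end is exactly the open question, so this only illustrates the mechanics.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicOptimizer {
    /** Hypothetical stand-in for the single IndexWriter call needed here. */
    public interface Optimizable {
        void optimize() throws Exception;
    }

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    /** Calls target.optimize() repeatedly, waiting `period` between runs. */
    public void start(Optimizable target, long period, TimeUnit unit) {
        timer.scheduleWithFixedDelay(() -> {
            try {
                target.optimize();
            } catch (Exception e) {
                // Log and continue so one failed run does not cancel later runs.
                System.err.println("optimize failed: " + e);
            }
        }, period, period, unit);
    }

    /** Stops scheduling new runs and waits for an in-progress one to finish. */
    public void stop() throws InterruptedException {
        timer.shutdown();
        timer.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

A builder would call start(writer::optimize, N, TimeUnit.MINUTES) before queuing batches and stop() before the final optimize. A fixed delay (rather than a fixed rate) is used so that a slow optimize never causes runs to pile up back to back.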



-Hoss

