lucene-java-user mailing list archives

From "Jayakumar.V" <jayakuma...@uaeexchange.com>
Subject RE: lucene indexing performance
Date Mon, 16 May 2005 13:17:42 GMT
Hi Chuck,
Sorry for the delayed response, and thanks for the input. As suggested, I've
batched my additions and deletions, and there is a reasonable performance
gain.

Jayakumar.V

-----Original Message-----
From: Chuck Williams [mailto:chuck@allthingslocal.com] 
Sent: Sunday, April 24, 2005 1:58 AM
To: java-user@lucene.apache.org
Subject: Re: lucene indexing performance

One immediate optimization would be to only close the writer and open 
the reader if the document is present.  You can have a reader open and 
do searches while indexing (and optimization) are underway.  It's just 
the delete operation that requires you to close the writer (so you don't 
have two different objects trying to update the same index).

However, that is rendered moot by the much bigger optimization.  Open 
the reader once, do all your deletes, close the reader, and then do all 
your adds.  I.e., batch your updates as much as possible.
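
A minimal sketch of the batching idea, not the poster's actual code. It uses an in-memory map as a hypothetical stand-in for the index and only counts how often a reader or writer would be opened; `BatchUpdateSketch`, `updateOneAtATime`, and `updateBatched` are illustrative names, not Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index; counts reader/writer open cycles. */
class BatchUpdateSketch {
    final Map<String, String> docsByKey = new LinkedHashMap<>();
    int readerOpens = 0;
    int writerOpens = 0;

    /** Per-record pattern: one reader cycle and one writer cycle for every record. */
    void updateOneAtATime(Map<String, String> incoming) {
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            readerOpens++;                           // open reader, delete by key, close reader
            docsByKey.remove(e.getKey());
            writerOpens++;                           // reopen writer, add document
            docsByKey.put(e.getKey(), e.getValue());
        }
    }

    /** Batched pattern: one reader for all deletes, then one writer for all adds. */
    void updateBatched(Map<String, String> incoming) {
        readerOpens++;                               // single reader for the whole batch
        for (String key : incoming.keySet()) {
            docsByKey.remove(key);                   // delete any older copy
        }
        writerOpens++;                               // single writer for the whole batch
        for (Map.Entry<String, String> e : incoming.entrySet()) {
            docsByKey.put(e.getKey(), e.getValue()); // add the new version
        }
    }
}
```

In this model a batch of N records costs N reader and N writer cycles one at a time, but one of each when batched. The sketch assumes each key appears at most once per batch; because all deletes run before any add, a key that occurs twice in one batch would need de-duplicating first.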

If you have to do them one at a time, then the first optimization should 
help some.
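
For the one-at-a-time case, the first optimization could look roughly like the sketch below. It is an assumption-laden model rather than Lucene code: a map stands in for the index, `containsKey` stands in for a read-only lookup through an open reader, and `ConditionalDeleteSketch`, `addOrReplace`, and `writerReopens` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an index; the writer is cycled only when a delete is needed. */
class ConditionalDeleteSketch {
    final Map<String, String> docsByKey = new HashMap<>();
    boolean writerOpen = true;
    int writerReopens = 0;

    void addOrReplace(String key, String doc) {
        // Read-only check: the writer can stay open while this lookup runs.
        if (docsByKey.containsKey(key)) {
            writerOpen = false;      // close the writer before deleting
            docsByKey.remove(key);   // delete the older copy
            writerOpen = true;       // reopen the writer for the add
            writerReopens++;
        }
        docsByKey.put(key, doc);     // add through the (open) writer
    }
}
```

Records that are new to the index never trigger a writer close and reopen here, so the saving depends on how many incoming records already exist. One caveat in a real index: a reader shows the index as it was when the reader was opened, so the lookup may miss documents added afterwards.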

Chuck

Jayakumar.V wrote:

>Hi,
>
>Maybe this query has been answered before. My first email to this user group
>did not generate any response. I had forwarded it to the following email ids:
>
>java-user-info@lucene.apache.org
>
>java-user@lucene.apache.org
>
> 
>
>This is my second email to this mail id. Hope I've reached the right place.
>
> 
>
>We are indexing documents on a scheduled basis. A document which was indexed
>at time T1 will be available again for indexing at time T2 with certain
>additional fields. Now, I need to ensure that only the document received at
>time T2 is present in the index, for which I need to first identify if the
>record is present in the index & then delete it before indexing the same.
>I've taken the cue from a code snippet available in the TSS case study in
>the book Lucene In Action. 
>
> 
>
>The steps I've followed are as follows:
>
>- prepare the Document for indexing
>- close the existing IndexWriter instance
>- get an IndexReader instance for the index
>- check whether the record to be indexed is already present in the index
>- if YES, delete it & close the IndexReader instance
>- open the IndexWriter instance again
>- add the Document to the index
>
> 
>
>Now, this is an iterative process for each record being indexed. Is it the
>right way to go about doing this? It took nearly 3 hours to index 250,000
>records.
>
> 
>
>I'm attaching the code snippet used in my app. for deleting & adding the
>record :
>
> 
>
>    private void addIndex(Document doc, Map dataMap) {
>        IndexReader indexReader = null;
>
>        // check if the doc. is already indexed.
>        // if YES, first remove it before adding the document
>        try {
>            // first, close the underlying IndexWriter instance;
>            // we can't have 2 index-modifying instances open at the same time
>            closeWriterIndex();
>
>            // get an IndexReader instance
>            indexReader = IndexReader.open(fsDir);
>            // get a Term obj. for deletion
>            Term term = new Term("xpin", (String) dataMap.get("xpin"));
>            // now, remove the already added doc.
>            indexReader.delete(term);
>        } catch (IOException e) {
>            e.printStackTrace();
>        } finally {
>            try {
>                // close the reader instance after deleting the doc.
>                // (it is still null if IndexReader.open failed)
>                if (indexReader != null) {
>                    indexReader.close();
>                }
>            } catch (IOException e) {
>                e.printStackTrace();
>            }
>        }
>
>        try {
>            // now, reopen the index writer object
>            openWriterIndex();
>
>            // index the document
>            fsWriter.addDocument(doc);
>        } catch (IOException e) {
>            e.printStackTrace();
>        }
>    }
>
>    private void closeWriterIndex() {
>        try {
>            fsWriter.close();
>        } catch (IOException e) {
>            e.printStackTrace();
>        }
>    }
>
>    private void openWriterIndex() {
>        try {
>            fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
>            fsWriter.mergeFactor = 100;
>        } catch (IOException e) {
>            e.printStackTrace();
>        }
>    }
>
> 
>
>I'm at the final stages of deploying this module. Any suggestions / ideas
>will be helpful in completing it fast. 
>
>
>TIA
>
>Jayakumar.V
>


-- 
*Chuck Williams*
All Things Local
Founder and CEO
V: (415)464-1889
C: (415)846-9018
chuck@AllThingsLocal.com <mailto:chuck@AllThingsLocal.com>
AIM: hawimanawiz

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

