lucene-java-user mailing list archives

From "Jayakumar.V" <jayakuma...@uaeexchange.com>
Subject lucene indexing performance
Date Sat, 23 Apr 2005 21:44:03 GMT
Hi,

Maybe this question has been answered before. My first email to this user group
did not get any response. I had sent it to the following addresses:

java-user-info@lucene.apache.org

java-user@lucene.apache.org

This is my second email to this address. I hope I've reached the right place.

We index documents on a scheduled basis. A document indexed at time T1 may
arrive again for indexing at time T2 with some additional fields. I need to
ensure that only the version received at T2 is present in the index, so I
first have to find out whether the record is already in the index and, if it
is, delete it before indexing the new version. I took the cue from a code
snippet in the TSS case study in the book Lucene in Action.
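The intended end state of this delete-then-add update can be modeled in plain Java, without Lucene, to make the semantics explicit. This is a minimal sketch under stated assumptions: `KeyedIndex`, `Doc`, `deleteByKey` and `update` are hypothetical names standing in for an index keyed on the unique `xpin` field; they are not Lucene classes. Like Lucene's `IndexReader.delete(Term)`, `deleteByKey` returns how many documents it removed, so a record that is not yet indexed simply yields 0 and no separate existence check is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for an index of records keyed by "xpin".
class KeyedIndex {
    static final class Doc {
        final String xpin;     // unique key of the record
        final String version;  // e.g. "T1" or "T2"
        Doc(String xpin, String version) { this.xpin = xpin; this.version = version; }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Removes every document with this key and returns how many were removed
    // (0 when the key has not been indexed yet).
    int deleteByKey(String xpin) {
        int before = docs.size();
        docs.removeIf(d -> d.xpin.equals(xpin));
        return before - docs.size();
    }

    void add(Doc doc) { docs.add(doc); }

    // An update is delete-then-add on the unique key, so at most one
    // version of each record remains afterwards.
    void update(Doc doc) {
        deleteByKey(doc.xpin);
        add(doc);
    }

    // Lists the versions currently stored for a key, in insertion order.
    List<String> versionsOf(String xpin) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            if (d.xpin.equals(xpin)) out.add(d.version);
        }
        return out;
    }
}
```

Updating the same key at T1 and then at T2 leaves only the T2 version in the model.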

These are the steps I follow:

- prepare the Document for indexing
- close the existing IndexWriter instance
- get an IndexReader instance for the index
- check whether the record about to be indexed is already in the index
- if it is, delete it and close the IndexReader instance
- open the IndexWriter instance again
- add the Document to the index
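To see how much handle churn these steps cause, the loop can be replayed with counters instead of a real index. A minimal sketch, assuming the writer is already open before the first record (as `addIndex` below expects, since it starts by closing the writer); `PerRecordCycle` and `simulate` are hypothetical names, and nothing here touches Lucene.

```java
// Hypothetical counters standing in for real IndexReader/IndexWriter handles;
// no index is opened here, only the number of opens is tallied.
class PerRecordCycle {
    int readerOpens;
    int writerOpens;

    // Replays the per-record steps for `records` records.
    static PerRecordCycle simulate(int records) {
        PerRecordCycle c = new PerRecordCycle();
        c.writerOpens++;          // assumption: the writer is open before the first record
        for (int i = 0; i < records; i++) {
            // close the existing IndexWriter (not counted)
            c.readerOpens++;      // open an IndexReader to delete any earlier copy
            // ... delete by the record's key, then close the reader ...
            c.writerOpens++;      // reopen the IndexWriter
            // ... add the Document ...
        }
        return c;
    }
}
```

With `simulate(250000)` the counters come out at 250,000 reader opens and 250,001 writer opens, i.e. one full reader and writer open/close cycle per record.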

I repeat this process for every record being indexed. Is this the right way
to go about it? Indexing 250,000 records took nearly 3 hours.
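For scale, 3 hours is 10,800 s, which over 250,000 records works out to roughly 43 ms per record. One alternative, sketched here as an assumption rather than a tested fix, is to batch the work: open a single reader to delete every stale key in the batch, close it, then open a single writer to add all the new documents. `BatchedUpdate`, `run` and `msPerRecord` are hypothetical names; the log strings stand in for real `IndexReader.delete(Term)` and `IndexWriter.addDocument(Document)` calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BatchedUpdate {
    int readerOpens;
    int writerOpens;
    final List<String> log = new ArrayList<>();

    // Average wall-clock time per record implied by a run of `hours` over `records`.
    static double msPerRecord(double hours, int records) {
        return hours * 3600.0 * 1000.0 / records;
    }

    // Hypothetical batched update. The batch maps each xpin to its newest payload;
    // building it with put() keeps only the last value for a repeated key,
    // so a record that appears twice in one batch is added only once.
    void run(Map<String, String> batch) {
        // Phase 1: a single reader session deletes every stale copy.
        readerOpens++;
        for (String xpin : batch.keySet()) {
            log.add("delete " + xpin);  // stands in for reader.delete(new Term("xpin", xpin))
        }
        // The reader must be closed here, before the writer is opened.
        // Phase 2: a single writer session adds every new document.
        writerOpens++;
        for (Map.Entry<String, String> e : batch.entrySet()) {
            log.add("add " + e.getKey() + "=" + e.getValue());  // stands in for writer.addDocument(doc)
        }
    }
}
```

Keying the batch on `xpin` also settles duplicates within one batch: the latest version wins, which matches the "only the T2 version" requirement above.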

Here is the code from my application that deletes and re-adds a record:

    private void addIndex(Document doc, Map dataMap) {
        IndexReader indexReader = null;

        // Remove any previously indexed copy of this record before adding it.
        // IndexReader.delete(Term) deletes nothing if no document matches.
        try {
            // First close the underlying IndexWriter instance: we can't have
            // two index-modifying instances open at the same time.
            closeWriterIndex();

            // Get an IndexReader instance.
            indexReader = IndexReader.open(fsDir);

            // Build the Term that identifies the record by its unique key.
            Term term = new Term("xpin", (String) dataMap.get("xpin"));

            // Remove the previously added document(s), if any.
            indexReader.delete(term);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the reader after deleting; it is still null if open() failed.
            if (indexReader != null) {
                try {
                    indexReader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        try {
            // Now reopen the IndexWriter.
            openWriterIndex();

            // Index the document.
            fsWriter.addDocument(doc);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void closeWriterIndex() {
        try {
            fsWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void openWriterIndex() {
        try {
            // create = false: append to the existing index instead of overwriting it
            fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
            fsWriter.mergeFactor = 100;
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

I'm in the final stages of deploying this module, so any suggestions or ideas
would help me finish it quickly.

TIA 

Jayakumar.V
