lucene-dev mailing list archives

From houyang <hui.ouy...@oracle.com>
Subject RE: deleting documents from index
Date Thu, 01 Sep 2005 16:41:50 GMT
Thank you, Xiaozheng.
Actually, the application could have more than two threads, and each thread can add, modify, or delete
documents at any time (a document being deleted may have been added earlier by another thread). So each
thread cannot work on its own index: any indexed document can be modified at any time, which means
deleting the previous version and adding the new one. That is why I moved the actual deletion of
documents, based on their internal doc IDs, to the end, after all the threads have finished.
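A minimal, JDK-only sketch of the deferred-deletion scheme described above (it does not use Lucene). The DeferredDeleteIndex and DeferredDeleteDemo classes, the "key:version" document strings, and the map standing in for the index are all hypothetical. Worker threads record the internal IDs of old versions instead of deleting them, and the batch is applied only after every thread has been joined. The sketch assumes internal IDs are never renumbered in the meantime and that each key is updated by only one thread at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for an index: internal id -> "key:version".
// Ids are never renumbered here, which is what deferring deletes by id relies on.
class DeferredDeleteIndex {
    private final Map<Integer, String> docs = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();
    private final ConcurrentLinkedQueue<Integer> pendingDeletes = new ConcurrentLinkedQueue<>();

    int add(String key, int version) {
        int id = nextId.getAndIncrement();
        docs.put(id, key + ":" + version);
        return id;
    }

    // "Modify" = queue the internal ids of the existing versions for later
    // deletion, then add the new version immediately.
    void update(String key, int newVersion) {
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            if (e.getValue().startsWith(key + ":")) {
                pendingDeletes.add(e.getKey());
            }
        }
        add(key, newVersion);
    }

    // Call once, after all worker threads have finished.
    int applyDeletes() {
        int removed = 0;
        Integer id;
        while ((id = pendingDeletes.poll()) != null) {
            if (docs.remove(id) != null) {
                removed++;
            }
        }
        return removed;
    }

    List<String> contents() {
        List<String> out = new ArrayList<>(docs.values());
        Collections.sort(out);
        return out;
    }
}

class DeferredDeleteDemo {
    static List<String> run() {
        DeferredDeleteIndex index = new DeferredDeleteIndex();
        for (int k = 0; k < 3; k++) {
            index.add("k" + k, 1); // versions indexed earlier
        }
        List<Thread> workers = new ArrayList<>();
        for (int k = 0; k < 3; k++) {
            final String key = "k" + k;
            Thread t = new Thread(() -> index.update(key, 2)); // one key per thread
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait until every thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        index.applyDeletes(); // apply the batched deletions at the end
        return index.contents();
    }
}
```

Calling DeferredDeleteDemo.run() leaves only the version-2 documents (k0:2, k1:2, k2:2): the old versions are removed by ID, so the new versions added under the same keys are not touched.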

-----Original Message-----
From: Xiaozheng Ma [mailto:Xiaozheng.Ma@redwood.com] 
Sent: Thursday, September 01, 2005 7:18 AM
To: java-dev@lucene.apache.org
Cc: HUI.OUYANG@ORACLE.COM
Subject: RE: deleting documents from index

Indexing into a single index in a multithreaded environment needs to be
serialized: you need to synchronize the calls to
IndexWriter.addDocument(). Otherwise Lucene will throw exceptions. After
all, Lucene uses file-based locking to ensure that only one writer can
modify the same index at a time.
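A minimal sketch of serializing index modifications behind one shared lock, using only the JDK. SerializedIndex is hypothetical, and a plain ArrayList stands in for the index; with Lucene, the guarded calls would be the ones that modify the index, such as IndexWriter.addDocument().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: every mutation of the shared (non-thread-safe) index
// goes through one monitor, so only one thread modifies it at a time.
class SerializedIndex {
    private final List<String> docs = new ArrayList<>(); // not thread-safe on its own
    private final Object writeLock = new Object();

    void addDocument(String doc) {
        synchronized (writeLock) {
            docs.add(doc);
        }
    }

    boolean deleteDocument(String doc) {
        synchronized (writeLock) {
            return docs.remove(doc);
        }
    }

    int size() {
        synchronized (writeLock) {
            return docs.size();
        }
    }

    // Several threads add documents to one shared instance.
    static int indexConcurrently(int threads, int docsPerThread) {
        SerializedIndex index = new SerializedIndex();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    index.addDocument("t" + id + "-doc" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return index.size();
    }
}
```

Without the synchronized blocks, concurrent add() calls on the ArrayList could lose documents or throw; with them, indexConcurrently(threads, docsPerThread) always returns threads × docsPerThread.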

In your situation, I believe that if you have multiple threads indexing
new docs into the same index, you still have the same problem. But I
guess you probably have only one thread doing the indexing and another
one deleting documents from the index by querying IDs.

One solution for multithreaded indexing into the same index is to
split the indexing process into independent pieces (each, of course,
using a different index directory): each thread indexes different docs,
and at some point the segments are merged into one index, if you like.
In the meantime, deletions can be applied to the previously merged
index whenever the merge described above is not running (i.e., the index is not locked).
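A JDK-only sketch of the split-then-merge idea: each worker builds a private partial index with no locking, and a single thread merges the parts once all workers have finished. PartitionedIndexing is hypothetical; lists of strings stand in for the per-thread index directories, and list concatenation stands in for the merge step that IndexWriter.addIndexes() performs in the merge code below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of "one index per thread, merge at the end".
class PartitionedIndexing {
    static List<String> indexAndMerge(List<List<String>> batches) {
        List<List<String>> partials = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (List<String> batch : batches) {
            List<String> partial = new ArrayList<>(); // owned by exactly one worker
            partials.add(partial);
            Thread t = new Thread(() -> {
                for (String doc : batch) {
                    // stand-in for analyzing and indexing one document;
                    // no lock needed because no other thread touches this list
                    partial.add(doc.toLowerCase(Locale.ROOT));
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join() makes each worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Single-threaded merge step, run only after all workers are done.
        List<String> merged = new ArrayList<>();
        for (List<String> partial : partials) {
            merged.addAll(partial);
        }
        return merged;
    }
}
```

The design trade-off is the same as in the mail: no contention while indexing, at the cost of a separate merge pass during which the merged index is locked.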

The merger code is like this:

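        import java.io.File;
        import java.io.IOException;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        // Merges the complete, independent indexes stored in
        // indexPath/fileList[i] into a new index at indexPath/merge.
        static void mergeIndexes(String indexPath, String[] fileList)
                throws IOException {
            Directory[] inds = new Directory[fileList.length];
            for (int i = 0; i < fileList.length; i++) {
                String path = indexPath + "/" + fileList[i];
                // false = open the existing index, do not create it
                inds[i] = FSDirectory.getDirectory(path, false);
            }
            String mergePath = indexPath + "/merge"; // merge into $(indexPath)/merge
            File mergeDir = new File(mergePath);
            if (!mergeDir.exists() && !mergeDir.mkdirs()) {
                throw new IOException("cannot make dir: " + mergePath);
            }

            // true = create a new, empty index in the merge directory
            IndexWriter writer = new IndexWriter(mergePath,
                    new StandardAnalyzer(), true);
            writer.addIndexes(inds); // merge all source indexes

            for (int i = 0; i < fileList.length; i++) {
                inds[i].close();
            }
            writer.optimize();
            writer.close();
        }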

Hope this helps!

Xiaozheng


-----Original Message-----
From: HUI.OUYANG@ORACLE.COM [mailto:HUI.OUYANG@ORACLE.COM] 
Sent: Thursday, September 01, 2005 1:28 AM
To: java-dev@lucene.apache.org
Subject: deleting documents from index

Hi,

In order to delete documents from the index more efficiently during
the incremental indexing process, I implemented batch deletion at
the application level. First I get the internal document IDs based
on the query, then delete only those documents, by internal ID,
when the IndexWriter is closed or the index is optimized, since the
internal document IDs change only when the index is optimized. Could this
be an issue?
The reason for doing this is that deleting documents from the index in
one thread sometimes fails while another thread is adding new documents
to the same index.
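One thing worth checking in this scheme is whether internal IDs stay valid between the time they are collected and the time they are used. In Lucene, an internal document number is essentially a position in the index; when segments are merged, deleted documents are dropped and later documents are renumbered, and such merges can also be triggered while documents are being added, not only by optimize(). The JDK-only model below is a simplification: the PositionalIndex class is hypothetical, and compact() stands in for a segment merge. It shows how an ID collected before a merge can end up pointing at a different document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: an internal doc id is just a position in the list,
// and compact() (standing in for a segment merge) drops deleted docs and
// shifts everything after them down.
class PositionalIndex {
    private final List<String> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    int add(String doc) {
        docs.add(doc);
        deleted.add(false);
        return docs.size() - 1; // internal id = current position
    }

    void markDeleted(int id) {
        deleted.set(id, true);
    }

    String get(int id) {
        return docs.get(id);
    }

    void compact() {
        // walk backwards so removals do not disturb unvisited positions
        for (int i = docs.size() - 1; i >= 0; i--) {
            if (deleted.get(i)) {
                docs.remove(i);
                deleted.remove(i);
            }
        }
    }

    // Collect an id, let a compaction happen, then look the id up again.
    static String staleLookup() {
        PositionalIndex index = new PositionalIndex();
        index.add("a");
        index.add("b");
        int idOfC = index.add("c"); // id 2, saved for a later batch delete
        index.markDeleted(0);       // some other deletion...
        index.compact();            // ...then a merge renumbers the docs
        index.add("d");             // the new doc now occupies position 2
        return index.get(idOfC);    // returns "d", not "c"
    }
}
```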

Regards,
hui


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


