lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Updating tag-indexes
Date Tue, 19 Aug 2008 12:23:21 GMT

Yes, docIDs are currently sequentially assigned, starting with 0.

BUT: if addDocument hits an exception (say, in your analyzer), the
failed document will usually still use up a docID (which is then
immediately marked as deleted).

Also, this behavior isn't "promised" in the API, i.e. it could in theory
(though I think it unlikely) change in a future release of Lucene.

And remember that when a merge (or optimize) completes, any deleted docs
are removed and all docIDs after them "collapse down" to fill the gaps.
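
To make the three points above concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class DocIdModel and its methods are hypothetical names invented for illustration, and it only assumes the behavior described above (sequential IDs from 0, a failed add still consuming an ID, and a merge removing deleted slots).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of docID assignment as described above; NOT Lucene code. */
public class DocIdModel {
    // One slot per assigned docID; null marks a slot that was marked deleted.
    private final List<String> slots = new ArrayList<>();

    /**
     * Models adding a document. IDs are handed out sequentially from 0.
     * If analysisFails is true, the ID is still consumed but the slot
     * is left as deleted.
     */
    public int add(String doc, boolean analysisFails) {
        int docId = slots.size();
        slots.add(analysisFails ? null : doc);
        return docId;
    }

    /** Models a completed merge/optimize: deleted slots vanish, later IDs shift down. */
    public void merge() {
        slots.removeIf(s -> s == null);
    }

    /** Returns the current docID of the given document, or -1 if it is not present. */
    public int docIdOf(String doc) {
        return slots.indexOf(doc);
    }

    public static void main(String[] args) {
        DocIdModel index = new DocIdModel();
        index.add("a", false);  // docID 0
        index.add("b", true);   // fails, but still uses up docID 1
        index.add("c", false);  // docID 2
        System.out.println(index.docIdOf("c")); // prints 2
        index.merge();
        System.out.println(index.docIdOf("c")); // prints 1
    }
}
```

In this model, "c" moves from docID 2 to docID 1 once the merge drops the deleted slot, which is why two parallel indexes stay aligned only if neither of them accumulates deletions that the other lacks.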

Mike

Ivan Vasilev wrote:

> Hi Lucene Guys,
>
> I have a question that is simple but important for me. I did not
> find the answer in the javadoc so I am asking here.
> When adding Documents with the method IndexWriter.addDocument(doc),
> do the documents obtain Lucene IDs in the order in which they are
> added to the IndexWriter? I mean, will the first added doc have Lucene
> ID 0, the second Lucene ID 1, etc.?
>
> Below I describe why I am asking this.
> We plan to split our index into two separate indexes that will be read
> by the ParallelReader class. We do this because one of them will
> contain field(s) that are indexed and stored and that will be
> frequently changed. So, for the ParallelReader to always return correct
> data when documents in the small index change, the Lucene
> IDs of these docs have to remain the same.
> To do this Karl Wettin suggests a solution described in LUCENE-879
> <https://issues.apache.org/jira/browse/LUCENE-879>. I do not like this
> solution because it requires changing Lucene source code, and after each
> refactoring I will potentially have problems. That solution also relies
> on optimizing the index, so it will not be meaningfully faster than the
> one that I prefer, which is:
> 1. Read the whole index and reconstruct the documents including  
> index data by using TermDocs and TermEnum classes;
> 2. Change the needed documents;
> 3. Index documents in new index that will replace the initial one.
> I can even simplify this algorithm (and speed it up) if all the fields
> are always stored - I can read just the stored data, reconstruct
> the content of the docs from it, and re-index them in the new index.
>
> But anyway, everything in my approaches depends on this: are
> Lucene IDs in the index ordered in the same way as docs are added to
> the IndexWriter?
>
> Thanks in Advance,
> Ivan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

