lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Updating tag-indexes
Date Tue, 19 Aug 2008 13:28:03 GMT
I'd add to Michael's mail the *strong* recommendation that you provide
your own unique doc IDs and use *those* instead. It'll save you a world
of grief. Whenever you need to add a new doc to an existing index, you
can get the maximum of *your* unique IDs and increment it yourself.
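
To make the "maximum of your own IDs, plus one" idea concrete, here is a minimal sketch. It is plain Java with no Lucene calls: each doc is modeled as a Map of field name to value, and the field name "uid", the class name, and both helper methods are hypothetical names chosen for illustration. In a real index the ID would be an ordinary indexed and stored field on each Document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UniqueIdSketch {
    // Hypothetical name for the application-managed ID field.
    static final String ID_FIELD = "uid";

    // Scan the existing docs for the largest application ID; -1 if there is none.
    static long maxId(List<Map<String, String>> docs) {
        long max = -1;
        for (Map<String, String> doc : docs) {
            String value = doc.get(ID_FIELD);
            if (value != null) {
                max = Math.max(max, Long.parseLong(value));
            }
        }
        return max;
    }

    // Stamp the new doc with max + 1 and append it to the (in-memory) index.
    static long addWithNextId(List<Map<String, String>> docs, Map<String, String> doc) {
        long next = maxId(docs) + 1;
        doc.put(ID_FIELD, Long.toString(next));
        docs.add(doc);
        return next;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        System.out.println(addWithNextId(docs, new HashMap<String, String>()));
        System.out.println(addWithNextId(docs, new HashMap<String, String>()));
    }
}
```

Because the ID travels with the doc as a field, it stays the same even if the doc ends up at a different position in the index, which is the whole point of not relying on Lucene's internal numbering.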

One thing to remember is that not all Lucene docs need to have the same
fields. So it's even possible to have a *very special* document that
contains meta-data about your index, say the last of your generated IDs
that was used, and to keep that meta-data doc up to date. If you put fields
in that doc that are NOT in any other doc, you don't have to worry about
accidentally getting this meta-data doc in your searches....
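
A rough sketch of the meta-data doc idea, again in plain Java with the index modeled as a list of Maps rather than real Lucene calls. The field names "meta_last_uid", "uid" and "body", the class name, and the toy substring search are all assumptions made for illustration. In a real index, keeping the meta-data doc up to date would mean replacing it (deleting and re-adding it, for example via IndexWriter.updateDocument) rather than mutating it in place as done here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MetaDocSketch {
    // Hypothetical field names; META_FIELD appears ONLY on the meta-data doc.
    static final String META_FIELD = "meta_last_uid";
    static final String ID_FIELD = "uid";
    static final String BODY_FIELD = "body";

    final List<Map<String, String>> docs = new ArrayList<>();
    final Map<String, String> meta = new HashMap<>();

    MetaDocSketch() {
        // The meta-data doc lives in the same index as the regular docs.
        meta.put(META_FIELD, "-1");
        docs.add(meta);
    }

    // Read the last used ID from the meta-data doc, increment it, store it back.
    long add(String body) {
        long next = Long.parseLong(meta.get(META_FIELD)) + 1;
        Map<String, String> doc = new HashMap<>();
        doc.put(ID_FIELD, Long.toString(next));
        doc.put(BODY_FIELD, body);
        docs.add(doc);
        meta.put(META_FIELD, Long.toString(next));
        return next;
    }

    // Toy search on the body field only. The meta-data doc has no body field,
    // so it can never be returned as a hit.
    List<Map<String, String>> search(String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            String body = doc.get(BODY_FIELD);
            if (body != null && body.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Compared with scanning for the maximum ID every time, this trades a full pass over the index for one small doc that has to be rewritten on each add.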

Best
Erick

On Tue, Aug 19, 2008 at 8:01 AM, Ivan Vasilev <ivasilev@sirma.bg> wrote:

> Hi Lucene Guys,
>
> I have a question that is simple but important for me. I did not find
> the answer in the javadoc so I am asking here.
> When adding Documents with the method IndexWriter.addDocument(doc), do the
> documents obtain Lucene IDs in the order that they are added to the
> IndexWriter? I mean, will the first added doc have Lucene ID 0, the second
> Lucene ID 1, etc.?
>
> Below I describe why I am asking this.
> We plan to split our index into two separate indexes that will be read by
> the ParallelReader class. This is because one of them will contain
> field(s) that are indexed and stored and that will change frequently.
> So for the ParallelReader to always return correct data when documents in
> the small index are changed, the Lucene IDs of these docs have to
> remain the same.
> To do this Karl Wettin suggests a solution described in LUCENE-879
> <https://issues.apache.org/jira/browse/LUCENE-879>. I do not like this
> solution because it requires changing Lucene source code, and after
> each refactoring I will potentially have problems. The solution also
> involves optimizing the index, so it will not be significantly faster than
> the one that I prefer, which is:
> 1. Read the whole index and reconstruct the documents including index data
> by using TermDocs and TermEnum classes;
> 2. Change the needed documents;
> 3. Index documents in new index that will replace the initial one.
> I can even simplify this algorithm (and speed it up) if all the fields are
> always stored: I can read just the stored data, reconstruct the content of
> the docs from it, and re-index them in the new index.
>
> But anyway, everything in my approach depends on this: are Lucene IDs in
> the index ordered in the same way as docs are added to the
> IndexWriter?
>
> Thanks in Advance,
> Ivan
