lucene-java-user mailing list archives

From "Carlos Pita" <carlosjosep...@gmail.com>
Subject Re: maxDoc and arrays
Date Thu, 24 May 2007 20:27:49 GMT
Yes Erick, that's fine. But I'm not sure whether the next document added
will get an id equal to maxDoc. If that is guaranteed, then I will fill the
maxDoc slot of my extra data structure when a document is added and clear
the hits.id(0) slot when a document is deleted. Then, when the index is
optimized, I will rebuild the entire structure from scratch. Do you think I
can rely on this?

Cheers,
Carlos

On 5/24/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> From the Javadoc for IndexReader.maxDoc():
>
> Returns one greater than the largest possible document number. This may be
> used to, e.g., determine how big to allocate an array which will have an
> element for every document number in an index.
>
> Isn't that what you're wondering about?
>
> Erick
>
> On 5/24/07, Carlos Pita <carlosjosepita@gmail.com> wrote:
> >
> > That's no problem, I can regenerate my entire extra data structure upon
> > periodic index optimization. That way the array size will be about the
> > size of the index. What I find more difficult is to know the id of the
> > last added/removed document. I need it to update the in-mem structure
> > upon more fine-grained index changes. Any ideas?
> >
> > TIA.
> > Cheers,
> > Carlos
> >
> > On 5/24/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > Document IDs will be reused after, say, optimization.
> > > One consequence of this is that optimization will change the IDs
> > > of *existing* documents.
> > >
> > > You're right that numDocs may well be smaller than maxDoc.
> > > That's what I get for reading quickly...
> > >
> > > Best
> > > Erick
> > >
> > > On 5/24/07, Carlos Pita <carlosjosepita@gmail.com> wrote:
> > > >
> > > > > No. It will always be at least as large as the total documents.
> > > > > But that will also count deleted documents.
> > > >
> > > > Do you mean that deleted document ids won't be reutilized, so the
> > > > index maxDoc will grow more and more with time? Isn't there any way
> > > > to compress the range? It seems strange to me, considering that an
> > > > example in the book suggests to use the document id as an array
> > > > index for an array of maxDoc elements.
> > > >
> > > > Cheers,
> > > > Carlos
> > > >
> > > > > Why wouldn't numDocs serve?
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > > > The motivation of this question is that I want to associate some
> > > > > > info to each document in the index, and in order to access this
> > > > > > additional data in O(1) I would like to do this through array
> > > > > > indexing. But the array size shouldn't be a lot greater than the
> > > > > > total number of documents. I see that something similar is done
> > > > > > in the example of section 6.1 of Lucene in Action, but for
> > > > > > sorting purposes, which is not my case.
> > > > > >
> > > > > > Related to this: how can I update my array of extra data when
> > > > > > documents are added/removed to/from the index? Is there any
> > > > > > feedback mechanism by means of callbacks or event handlers?
> > > > > >
> > > > > > Thank you in advance.
> > > > > > Regards,
> > > > > > Carlos
> > > > > >
> > > > >
> > > >
> > >
> >
>
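
For reference, the bookkeeping discussed in this thread (an array with one slot per document number, updated on add and delete, rebuilt after optimization) could be sketched roughly as below. This is only an illustration: the class DocIdSidecar and its methods are hypothetical names, no Lucene classes are used, and the caller is assumed to pass in values obtained from IndexReader.maxDoc() and hits.id(0). Whether a newly added document really receives an id equal to the previous maxDoc is the open question of the thread, so the sketch grows the array on demand rather than relying on it. Since segment merges can also renumber documents, a full rebuild is the safer fallback whenever the ids may have changed.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

/**
 * Hypothetical side array holding one value per Lucene document number.
 * The caller supplies document numbers and maxDoc; nothing here talks to Lucene.
 */
class DocIdSidecar {
    private String[] slots;

    /** Allocates one slot per possible document number (0 .. maxDoc - 1). */
    DocIdSidecar(int maxDoc) {
        slots = new String[maxDoc];
    }

    /** Stores a value for docId, growing the array if docId is past the end. */
    void put(int docId, String value) {
        if (docId < 0) {
            throw new IllegalArgumentException("negative docId: " + docId);
        }
        if (docId >= slots.length) {
            // Grow to at least docId + 1; doubling keeps repeated adds cheap.
            slots = Arrays.copyOf(slots, Math.max(docId + 1, slots.length * 2));
        }
        slots[docId] = value;
    }

    /** Returns the value for docId, or null if the slot is empty or out of range. */
    String get(int docId) {
        return (docId >= 0 && docId < slots.length) ? slots[docId] : null;
    }

    /** Empties the slot of a deleted document; the slot itself is not reclaimed. */
    void clear(int docId) {
        if (docId >= 0 && docId < slots.length) {
            slots[docId] = null;
        }
    }

    /**
     * Rebuilds the whole array after document numbers may have changed
     * (for example after optimization). The loader is an assumption of this
     * sketch: it maps a document number to its extra data, or null if none.
     */
    void rebuild(int maxDoc, IntFunction<String> loader) {
        String[] fresh = new String[maxDoc];
        for (int docId = 0; docId < maxDoc; docId++) {
            fresh[docId] = loader.apply(docId);
        }
        slots = fresh;
    }

    /** Current number of slots; may exceed maxDoc after growth by doubling. */
    int capacity() {
        return slots.length;
    }
}
```

In use, one would call put(docId, data) after an add, clear(docId) after a delete, and rebuild(reader.maxDoc(), loader) once the index has been optimized, with the loader skipping documents for which IndexReader.isDeleted(docId) is true.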
