lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: getTermFreqVector atomicity
Date Mon, 21 May 2007 21:41:30 GMT
An IndexReader doesn't see changes in the index unless you close
and reopen it, but if the reader is reopened between the time you
fetch your doc ID and the time you read its vector, that could be a problem.

You can always use TermEnum/TermDocs to find the doc ID
associated with a particular field you have, but I suspect that
suffers from the same problem. In fact, *anything* you do
between fetching the doc ID and getting its termvector
has this problem, and there's no way I know of to get
termvectors by your own ID.

What might work is a "sanity check" sort of algorithm. That is,
fetch the doc ID, then fetch its term vector, then look at your
custom field for that doc ID and see if it matches the
original. If not, do it all over again.
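
The retry loop described above could be sketched roughly as follows. This
is a stdlib-only model, not Lucene code: the Reader interface, its three
methods, fakeReader and the maxAttempts limit are all hypothetical
stand-ins (in real code they would map onto calls such as
IndexReader.termDocs, IndexReader.getTermFreqVector and
IndexReader.document). It assumes the custom ID is stored on each doc.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class SanityCheckLookup {
    /** Hypothetical stand-in for the IndexReader calls discussed above. */
    public interface Reader {
        int findDocId(String key);       // doc ID for the custom ID, or -1 if absent
        String[] termVector(int docId);  // term vector for that doc ID, or null
        String storedKey(int docId);     // custom ID stored on that doc, or null
    }

    /**
     * Fetch the doc ID, fetch its term vector, then re-read the custom ID
     * for that doc ID and retry if it no longer matches.
     * 'current' models "whatever reader is open right now"; it may return a
     * different reader on each call if the index was reloaded in between.
     */
    public static String[] lookup(Supplier<Reader> current, String key, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int docId = current.get().findDocId(key);
            if (docId < 0) {
                return null; // no doc carries this custom ID
            }
            String[] vector = current.get().termVector(docId);
            // Sanity check: does this doc ID still belong to our custom ID?
            if (key.equals(current.get().storedKey(docId))) {
                return vector;
            }
            // Mismatch: doc IDs were remapped in between, so start over.
        }
        throw new IllegalStateException("index kept changing; gave up on " + key);
    }

    /** In-memory test double: the doc with ID i carries custom ID keysByDocId[i]. */
    public static Reader fakeReader(String... keysByDocId) {
        return new Reader() {
            public int findDocId(String key) {
                return Arrays.asList(keysByDocId).indexOf(key);
            }
            public String[] termVector(int docId) {
                return inRange(docId) ? new String[] {"vector-of-" + keysByDocId[docId]} : null;
            }
            public String storedKey(int docId) {
                return inRange(docId) ? keysByDocId[docId] : null;
            }
            private boolean inRange(int docId) {
                return docId >= 0 && docId < keysByDocId.length;
            }
        };
    }
}
```

Note that the check only narrows the window: a reload between fetching the
vector and re-reading the custom field can still slip through in principle.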

But that all seems too complicated to me. Why not just ensure
that you use the *same* IndexReader both when you get the
original doc ID and when you get its term vector? Even a temporary
reference should hold things open long enough to ensure that
atomicity.
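
A minimal sketch of that idea, again as a stdlib-only model rather than
real Lucene code: Reader, fakeReader, reload and the class name are
hypothetical. The point it illustrates is reading the shared reader
reference exactly once into a local variable and using that local for
both steps. It assumes the reload code does not close the old reader
while a lookup is still using it.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

public class PinnedReaderLookup {
    /** Hypothetical stand-in for the two IndexReader calls involved. */
    public interface Reader {
        int findDocId(String key);       // doc ID for the custom ID, or -1 if absent
        String[] termVector(int docId);  // term vector for that doc ID
    }

    // Shared reference that the periodic reload swaps out.
    private final AtomicReference<Reader> current;

    public PinnedReaderLookup(Reader initial) {
        current = new AtomicReference<>(initial);
    }

    /** Called by whatever reloads the index from time to time. */
    public void reload(Reader fresh) {
        current.set(fresh);
    }

    public String[] lookup(String key) {
        // Pin one reader for the whole operation. A concurrent reload()
        // changes 'current' but not this local reference.
        Reader pinned = current.get();
        int docId = pinned.findDocId(key);
        if (docId < 0) {
            return null; // no doc carries this custom ID
        }
        // Same reader as above, so the doc ID still means the same doc.
        return pinned.termVector(docId);
    }

    /** In-memory test double: the doc with ID i carries custom ID keysByDocId[i]. */
    public static Reader fakeReader(String... keysByDocId) {
        return new Reader() {
            public int findDocId(String key) {
                return Arrays.asList(keysByDocId).indexOf(key);
            }
            public String[] termVector(int docId) {
                return new String[] {"vector-of-" + keysByDocId[docId]};
            }
        };
    }
}
```

The result may come from a slightly stale view of the index, but the doc ID
and the term vector always refer to the same document.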

Best
Erick

On 5/21/07, Walter Ferrara <walter.ferrara@ecomware.it> wrote:
>
> I'm interested in getting the term vector of a Lucene doc. The point is,
> it seems I have to give IndexReader.getTermFreqVector a doc ID,
> while I would like to know if there is a way to get the term vector by a
> doc identifier (not the Lucene doc ID, but my own field). I know how to get
> the Lucene doc ID for the doc I'm interested in, but my concern is about the
> non-atomicity of getting an ID and passing it to another function.
> This is because I reload the index from time to time, and I'm worried about
> a loss of consistency if the new IndexReader remaps doc IDs (after a
> deletion, for example), and I end up accessing a different doc, just because
> between "get the id" and "get the termvector for that id", the reader could
> have been reloaded (and doc IDs changed).
>
> Best,
> Walter
