lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: MultiReader docid reliability
Date Fri, 30 May 2014 19:50:05 GMT
If you do an optimize, btw, the internal doc IDs may change.....

But _why_ do you want to keep them? You may have very good reasons, but
it's not clear that this is necessary/desirable from what you've said so
far...

Best,
Erick
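
To illustrate the point about optimize: the following is a minimal sketch in plain Java, not the Lucene API. It models a segment as a list of documents whose doc ID is simply their position, and a merge that drops deleted documents and compacts the rest. The class name, the `merge` helper and the string keys are hypothetical, chosen only for illustration. It shows why a cached doc ID can point at the wrong document after a merge, while a lookup by an application-level key still works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of doc ID renumbering on merge; NOT Lucene code. */
final class MergeRenumberSketch {

    /**
     * Returns the surviving keys in their original order.
     * The position in the returned list plays the role of the new doc ID.
     */
    static List<String> merge(List<String> keys, boolean[] deleted) {
        List<String> survivors = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (!deleted[i]) {
                survivors.add(keys.get(i));
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        // Hypothetical application-level keys; position = doc ID.
        List<String> before = Arrays.asList("A", "B", "C", "D");
        boolean[] deleted = {false, true, false, false}; // "B" was deleted

        int cachedDocId = before.indexOf("C"); // 2 before the merge
        List<String> after = merge(before, deleted);

        System.out.println("cached doc ID " + cachedDocId
                + " now points at " + after.get(cachedDocId)); // D, not C
        System.out.println("lookup by key finds C at " + after.indexOf("C")); // 1
    }
}
```

If results must survive a reopen or a merge, keeping such a key in a stored field and looking documents up by it is one common alternative to holding on to doc IDs; whether it fits here depends on the use case Erick is asking about.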


On Fri, May 30, 2014 at 7:49 AM, Nicola Buso <nbuso@ebi.ac.uk> wrote:

> Hi,
>
> thanks Michael and Alan. It is enough to know that when re-opening the
> index there is no guarantee that the docids are maintained, even if the
> index does not change.
>
> I will also ask the question on the Solr mailing list.
>
>
> nicola.
>
>
> On Fri, 2014-05-30 at 10:41 -0400, Michael Sokolov wrote:
> > There is a Solr document cache that holds field values too, see:
> > http://wiki.apache.org/solr/SolrCaching
> >
> > Maybe take this question over to the solr mailing list?
> >
> > -Mike
> >
> > On 5/30/2014 10:32 AM, Alan Woodward wrote:
> > > > Solr caches hold lucene docids, which are invalidated every time a
> > > > new searcher is opened.  The various fields for a response aren't
> > > > cached as far as I know, they're reloaded on each request.  But
> > > > loading the fields for 10 documents is typically very fast, compared
> > > > to searching over a very large collection.
> > >
> > > Alan Woodward
> > > www.flax.co.uk
> > >
> > >
> > > On 30 May 2014, at 11:20, Nicola Buso wrote:
> > >
> > >> Hi Alan,
> > >>
> > > >> just to make it more typical (yes, there are no IndexWriters open on
> > > >> those indexes): how is Solr caching results? The first thing I would
> > > >> like to do is to store the doc ids and return to the reader for the
> > > >> real content. Is Solr storing the whole results with all values?
> > >>
> > >>
> > >> nicola.
> > >>
> > >>
> > >> On Fri, 2014-05-30 at 11:05 +0100, Alan Woodward wrote:
> > >>> If the index is truly unchanging (ie there's no IndexWriter open on
> > >>> it) then I guess the document numbers will be stable across reopens.
> > >>> But this is a pretty specialized situation, and the docs are really
> > >>> there to warn you off trying to rely on this for more typical uses.
> > >>>
> > >>> Alan Woodward
> > >>> www.flax.co.uk
> > >>>
> > >>>
> > >>>
> > >>> On 30 May 2014, at 10:39, Nicola Buso wrote:
> > >>>
> > >>>> Hi Alan,
> > >>>>
> > >>>> thanks a lot for the reply.
> > >>>>
> > > >>>> From what I understood of your reply, if the index is not changing
> > > >>>> (no adds, deletes or updates), the doc ids seen by the MultiReader
> > > >>>> will not change if you open that unchanged index several times,
> > > >>>> even in different environments.
> > > >>>>
> > > >>>> If this is true (my understanding), the word "ephemeral" in the API
> > > >>>> could be elaborated a bit more.
> > >>>>
> > >>>>
> > >>>> nicola
> > >>>>
> > >>>> On Fri, 2014-05-30 at 09:26 +0100, Alan Woodward wrote:
> > >>>>> Hi Nicola,
> > >>>>>
> > >>>>>
> > > >>>>> 1) A session here means as long as you have that MultiReader open.
> > > >>>>> IndexReaders see a snapshot of the index and so document ids
> > > >>>>> shouldn't change over the lifetime of an IndexReader, even if the
> > > >>>>> index is being updated.
> > >>>>>
> > >>>>>
> > > >>>>> 2) MultiReader just takes an array of subindexes, so as long as
> > > >>>>> the subindexes are passed to the MultiReader constructor in the
> > > >>>>> same order on both machines, the docBase assigned to each reader
> > > >>>>> context should be the same.
> > >>>>>
> > >>>>> Alan Woodward
> > >>>>> www.flax.co.uk
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On 29 May 2014, at 14:29, Nicola Buso wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> from the javadocs:
> > >>>>>>
> > >>>>>> ----
> > > >>>>>> For efficiency, in this API documents are often referred to via
> > > >>>>>> document numbers, non-negative integers which each name a unique
> > > >>>>>> document in the index. These document numbers are ephemeral --
> > > >>>>>> they may change as documents are added to and deleted from an
> > > >>>>>> index. Clients should thus not rely on a given document having
> > > >>>>>> the same number between sessions.
> > >>>>>> ----
> > >>>>>>
> > > >>>>>> What does "sessions" mean in this context? Are they search
> > > >>>>>> sessions?
> > > >>>>>>
> > > >>>>>> 1) If I have an index that does not change (no deletes or
> > > >>>>>> updates) and I'm keeping the MultiReader open, can the docid
> > > >>>>>> change when executing the same search several times on that
> > > >>>>>> reader?
> > > >>>>>>
> > > >>>>>> 2) Will opening the same set of indexes in a MultiReader on
> > > >>>>>> different machines assign different docids to the same document
> > > >>>>>> at runtime, or can the algorithm that calculates the docids
> > > >>>>>> guarantee that static indexes will have the same docids on
> > > >>>>>> different machines (and thus separate JVMs)?
> > >>>>>>
> > >>>>>>
> > >>>>>> nicola.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Nicola Buso <nbuso@ebi.ac.uk>
> > >>>>>> EMBL-EBI
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > > >>>>>> ---------------------------------------------------------------------
> > > >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>> --
> > >>>> Nicola Buso <nbuso@ebi.ac.uk>
> > >>>> EMBL-EBI
> > >>>>
> > >>>>
> > >>>
> > >> --
> > >> Nicola Buso
> > >> Software Engineer - Web Production Team
> > >>
> > >> European Bioinformatics Institute (EMBL-EBI)
> > >> European Molecular Biology Laboratory
> > >>
> > >> Wellcome Trust Genome Campus
> > >> Hinxton
> > >> Cambridge CB10 1SD
> > >> United Kingdom
> > >>
> > >> URL: http://www.ebi.ac.uk
> > >>
> > >
> >
>
>
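
Alan's second point in the thread (the docBase of each sub-reader depends on the order in which the subindexes are passed to the MultiReader) can be modeled with a few lines of arithmetic. The sketch below is plain Java and does not call Lucene; the class and method names are hypothetical. It assumes only what the thread describes plus the usual composite-reader numbering, where each sub-reader's docBase is the sum of the `maxDoc` values of the sub-readers before it, and a global doc ID is docBase plus the local doc ID.

```java
import java.util.Arrays;

/** Toy model of composite-reader doc numbering; NOT the Lucene API. */
final class DocBaseSketch {

    /** docBase[i] is the sum of maxDoc over all sub-readers that come before i. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;
            next += maxDocs[i];
        }
        return bases;
    }

    /** Global doc ID = docBase of the sub-reader + doc ID local to that sub-reader. */
    static int globalDocId(int[] bases, int subReader, int localDocId) {
        return bases[subReader] + localDocId;
    }

    public static void main(String[] args) {
        // Hypothetical maxDoc values for three static subindexes.
        int[] orderA = {3, 5, 2}; // subindexes passed as X, Y, Z
        int[] orderB = {5, 3, 2}; // same subindexes passed as Y, X, Z

        int[] basesA = docBases(orderA);
        int[] basesB = docBases(orderB);
        System.out.println(Arrays.toString(basesA)); // [0, 3, 8]
        System.out.println(Arrays.toString(basesB)); // [0, 5, 8]

        // Local doc 1 of subindex Y is sub-reader 1 in order A, sub-reader 0 in order B.
        System.out.println(globalDocId(basesA, 1, 1)); // 4
        System.out.println(globalDocId(basesB, 0, 1)); // 1
    }
}
```

With the same order on both machines the two arrays of bases are identical, so the same document gets the same global doc ID; swapping two subindexes is enough to shift it.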
