lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: MultiReader docid reliability
Date Fri, 30 May 2014 14:49:18 GMT
Hi,

Thanks, Michael and Alan. It is enough for me to know that there is no
guarantee the docids are preserved when the index is reopened, even if
the index does not change.

I will also ask this question on the Solr mailing list.


nicola.


On Fri, 2014-05-30 at 10:41 -0400, Michael Sokolov wrote:
> There is a Solr document cache that holds field values too, see: 
> http://wiki.apache.org/solr/SolrCaching
> 
> Maybe take this question over to the solr mailing list?
> 
> -Mike
> 
> On 5/30/2014 10:32 AM, Alan Woodward wrote:
> > Solr caches hold Lucene docids, which are invalidated every time a new
> > searcher is opened.  The various fields for a response aren't cached as
> > far as I know; they're reloaded on each request.  But loading the fields
> > for 10 documents is typically very fast compared to searching over a
> > very large collection.
> >
> > Alan Woodward
> > www.flax.co.uk
> >
> >
> > On 30 May 2014, at 11:20, Nicola Buso wrote:
> >
> >> Hi Alan,
> >>
> >> just to make it more typical (yes, there are no IndexWriters open on
> >> those indexes): how does Solr cache results? The first thing I would
> >> like to do is store the doc ids and go back to the reader for the real
> >> content. Does Solr store the whole results with all values?
> >>
> >>
> >> nicola.
> >>
> >>
> >> On Fri, 2014-05-30 at 11:05 +0100, Alan Woodward wrote:
> >>> If the index is truly unchanging (i.e. there's no IndexWriter open on
> >>> it) then I guess the document numbers will be stable across reopens.
> >>> But this is a pretty specialized situation, and the docs are really
> >>> there to warn you off trying to rely on this for more typical uses.
> >>>
> >>> Alan Woodward
> >>> www.flax.co.uk
> >>>
> >>>
> >>>
> >>> On 30 May 2014, at 10:39, Nicola Buso wrote:
> >>>
> >>>> Hi Alan,
> >>>>
> >>>> thanks a lot for the reply.
> >>>>
> >>>> From what I understood of your reply, if the index is not changing
> >>>> (no adds, deletes or updates), the doc ids seen by the MultiReader
> >>>> will not change if you open that unchanged index several times, even
> >>>> in different environments.
> >>>>
> >>>> If this is true (my understanding), the word "ephemeral" in the API
> >>>> could be elaborated a bit more.
> >>>>
> >>>>
> >>>> nicola
> >>>>
> >>>> On Fri, 2014-05-30 at 09:26 +0100, Alan Woodward wrote:
> >>>>> Hi Nicola,
> >>>>>
> >>>>>
> >>>>> 1) A session here means as long as you have that MultiReader open.
> >>>>> IndexReaders see a snapshot of the index and so document ids
> >>>>> shouldn't change over the lifetime of an IndexReader, even if the
> >>>>> index is being updated.
> >>>>>
> >>>>>
> >>>>> 2) MultiReader just takes an array of subindexes, so as long as the
> >>>>> subindexes are passed to the MultiReader constructor in the same
> >>>>> order on both machines, the docBase assigned to each reader context
> >>>>> should be the same.
> >>>>>
> >>>>> Alan Woodward
> >>>>> www.flax.co.uk
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 29 May 2014, at 14:29, Nicola Buso wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> from the javadocs:
> >>>>>>
> >>>>>> ----
> >>>>>> For efficiency, in this API documents are often referred to via
> >>>>>> document numbers, non-negative integers which each name a unique
> >>>>>> document in the index. These document numbers are ephemeral -- they
> >>>>>> may change as documents are added to and deleted from an index.
> >>>>>> Clients should thus not rely on a given document having the same
> >>>>>> number between sessions.
> >>>>>> ----
> >>>>>>
> >>>>>> What does "sessions" mean in this context? Are they search
> >>>>>> sessions?
> >>>>>>
> >>>>>> 1) If I have an index that does not change (no deletes or updates)
> >>>>>> and I'm keeping the MultiReader open, can the docids change when I
> >>>>>> execute the same search several times on that reader?
> >>>>>>
> >>>>>> 2) Will opening the same set of indexes in a MultiReader on
> >>>>>> different machines assign different docids to the same document at
> >>>>>> runtime, or does the algorithm that calculates the docids guarantee
> >>>>>> that static indexes get the same docids on different machines (and
> >>>>>> thus in separate JVMs)?
> >>>>>>
> >>>>>>
> >>>>>> nicola.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -- 
> >>>>>> Nicola Buso <nbuso@ebi.ac.uk>
> >>>>>> EMBL-EBI
> >>>>>>
> >>>>>>
> >>>>>
> >>>
> >
> 
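
Alan's point 2 in the quoted thread (the docBase depends on the order in which the subindexes are passed to the MultiReader constructor) can be illustrated with a small model. This is only a sketch in plain Java, not Lucene code: the class `DocBaseSketch`, its methods and the maxDoc values are hypothetical, and it assumes each docBase is the running sum of the maxDoc of the subindexes that come before it.

```java
import java.util.Arrays;

/** Plain-Java model of docBase assignment; hypothetical names, not Lucene code. */
final class DocBaseSketch {

    /** docBase of each subindex = sum of maxDoc of the subindexes before it. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int next = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = next;
            next += maxDocs[i];
        }
        return bases;
    }

    /** Top-level doc number = docBase of the subindex + the doc's local number. */
    static int globalDocId(int[] maxDocs, int subIndex, int localDocId) {
        if (localDocId < 0 || localDocId >= maxDocs[subIndex]) {
            throw new IllegalArgumentException("local doc id out of range");
        }
        return docBases(maxDocs)[subIndex] + localDocId;
    }

    public static void main(String[] args) {
        int[] orderOnMachineA = {3, 5, 2}; // hypothetical maxDoc values
        int[] orderOnMachineB = {5, 3, 2}; // same subindexes, first two swapped

        System.out.println(Arrays.toString(docBases(orderOnMachineA))); // [0, 3, 8]
        System.out.println(Arrays.toString(docBases(orderOnMachineB))); // [0, 5, 8]

        // Local doc 1 of the 5-document subindex gets a different top-level number.
        System.out.println(globalDocId(orderOnMachineA, 1, 1)); // 4
        System.out.println(globalDocId(orderOnMachineB, 0, 1)); // 1
    }
}
```

With the same order on both machines the two arrays of bases are identical, which is the condition Alan describes. It says nothing about whether the local numbers inside each subindex survive a reopen.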

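The caching behaviour Alan describes in the quoted thread (cached docids are thrown away whenever a new searcher is opened) could be modelled roughly as follows. This is a sketch under stated assumptions, not Solr's implementation: `SearcherScopedCache` and all of its methods are hypothetical names, and the query string stands in for whatever key a real cache would use.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache of docids that is only valid for one open searcher. */
final class SearcherScopedCache {
    private long generation = 0; // incremented on every reopen
    private final Map<String, int[]> docIdsByQuery = new HashMap<>();

    long currentGeneration() {
        return generation;
    }

    /** Remember the docids a query produced against the current searcher. */
    void put(String query, int[] docIds) {
        docIdsByQuery.put(query, docIds.clone());
    }

    /** Returns the cached docids, or null if nothing is cached for the query. */
    int[] get(String query) {
        int[] hit = docIdsByQuery.get(query);
        return hit == null ? null : hit.clone();
    }

    /** Call when a new searcher is opened: the old docids can no longer be trusted. */
    void onNewSearcher() {
        generation++;
        docIdsByQuery.clear();
    }
}
```

The point of the design is that docids never outlive the reader they came from; the stored field values are fetched from the current reader on each request instead of being kept alongside the ids.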
-- 
Nicola Buso
Software Engineer - Web Production Team

European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory

Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

