incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: sessions
Date Fri, 08 Feb 2013 13:31:30 GMT
Yeah, the only thing I'm concerned about is the rapid increase in the number
of files per shard when there are a lot of updates on the index.  I think
that we can overcome that though.

Aaron


On Fri, Feb 8, 2013 at 7:19 AM, Tim Williams <williamstw@gmail.com> wrote:

> On Mon, Feb 4, 2013 at 9:48 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> > I had an idea today on how to remove sessions and still get the same
> > behavior.
> >
> > Currently the issue is that document ids change as the index is
> > updated/compacted, and for efficiency we only want to fetch the documents
> > that should be returned, e.g.:
> >
> > A search arrives and only the first 10 hits are requested.  The query gets
> > sprayed to all the other servers, and the top 10 document locations are
> > returned to the issuing server.  The hits are then sorted and merged, and
> > the top 10 of those are returned.  The client will then issue a request
> > for the document data to be retrieved.  However, if in between those calls
> > the document location that contains the document id changes because of
> > index changes, the wrong data could be returned or the document not found.
> > The sessions were introduced to snapshot the indexes during these 2
> > actions.
> >
> > Proposal:
> >
> > Change the document location, which currently contains the shard id and
> > the Lucene document id (referenced from the composition of all the
> > segments in the index), to instead contain the shard id + segment name +
> > document id within the given segment.  This will ensure that the correct
> > document is located between search and fetch.  The only other thing we
> > have to do is keep the old segments around for a reasonable amount of
> > time (configurable per table, maybe with a default of 30 seconds?).  At
> > the very least, if the segment referenced is not available, the error can
> > be detected and properly handled in the form of a nice exception.
> >
> > We should also create a simple single method that searches and fetches
> > documents in a single call.
> >
> > Thoughts?
>
> I like it much better than the overhead and complexity we're taking on
> with sessions.  I haven't looked at whether the segment names are so
> readily available to the search code though... but if you say it'll
> work, I like it:)
>
> --tim
>
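
For illustration, the segment-relative location and the grace period could be sketched roughly as below. This is only a toy model in plain Python, not Blur or Lucene code: the names `DocumentLocation`, `Shard`, `SegmentNotAvailableError`, and `retention_seconds` are all made up for the example, segments are plain lists, and a merge is modeled as simple concatenation.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentLocation:
    """Hypothetical location: shard id + segment name + doc id within that segment."""
    shard_id: int
    segment_name: str
    segment_doc_id: int


class SegmentNotAvailableError(Exception):
    """Raised when the referenced segment has already been dropped."""


class Shard:
    """Toy stand-in for one shard's index (assumption: segments are plain lists)."""

    def __init__(self, shard_id, retention_seconds=30.0, clock=time.monotonic):
        self.shard_id = shard_id
        self.retention_seconds = retention_seconds
        self._clock = clock
        self._live = {}     # segment name -> list of documents
        self._retired = {}  # segment name -> (documents, time of retirement)

    def add_segment(self, name, docs):
        self._live[name] = list(docs)

    def merge(self, names, merged_name):
        """Compact segments into one, keeping the old ones for the grace period."""
        now = self._clock()
        merged = []
        for name in names:
            docs = self._live.pop(name)
            merged.extend(docs)
            self._retired[name] = (docs, now)
        self._live[merged_name] = merged

    def purge_expired(self):
        """Drop retired segments whose grace period has elapsed."""
        now = self._clock()
        expired = [name for name, (_, retired_at) in self._retired.items()
                   if now - retired_at >= self.retention_seconds]
        for name in expired:
            del self._retired[name]

    def fetch(self, location):
        """Resolve a location against live segments first, then retired ones."""
        self.purge_expired()
        if location.segment_name in self._live:
            docs = self._live[location.segment_name]
        elif location.segment_name in self._retired:
            docs = self._retired[location.segment_name][0]
        else:
            raise SegmentNotAvailableError(
                f"segment {location.segment_name!r} is no longer available "
                f"on shard {self.shard_id}")
        return docs[location.segment_doc_id]
```

A location handed out by a search keeps resolving to the same document after a merge, as long as the fetch arrives within the retention window; after that the fetch fails with an explicit exception rather than returning the wrong document. The retired segments held during the window are also where the extra files per shard would come from.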
