lucene-solr-user mailing list archives

From Roman Chyla <roman.ch...@gmail.com>
Subject Re: paging vs streaming. spawn from (Processing a lot of results in Solr)
Date Sat, 27 Jul 2013 20:30:47 GMT
Hi Mikhail,

I can see it is lazy-loading, but I can't judge how complex it becomes
(presumably the filter dispatching mechanism also does other things;
it is not there only for streaming).

Let me explain better what I found when I dug inside Solr: documents
(the results of the query) are loaded before they are passed to a writer,
so the writers expect to receive already-loaded Solr documents. One of the
components loads them before rendering, so it is kind of 'hard-coded'. But
if Solr did NOT load these docs before passing them to a writer, the writer
could load them instead (hence lazy loading; the difference is in the
numbers: it could deal with hundreds of thousands of docs, instead of a
few thousand now).
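
To make the difference concrete, here is a minimal sketch in Python. It is
not Solr code: load_doc, render, write_eager and write_lazy are all
hypothetical names made up for illustration, and the "index" is faked. It
only shows the idea of handing the writer doc ids plus a loader instead of
already-loaded documents.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, str]


def load_doc(doc_id: int) -> Doc:
    """Hypothetical stand-in for reading one document's stored fields."""
    return {"id": str(doc_id), "title": f"doc {doc_id}"}


def render(doc: Doc) -> str:
    """Hypothetical writer output: one tab-separated line per document."""
    return f"{doc['id']}\t{doc['title']}\n"


def write_eager(doc_ids: Iterable[int], loader: Callable[[int], Doc]) -> List[str]:
    # Behaviour as described above: every doc is loaded first, so the
    # whole result set sits in memory before rendering starts.
    docs = [loader(doc_id) for doc_id in doc_ids]
    return [render(doc) for doc in docs]


def write_lazy(doc_ids: Iterable[int], loader: Callable[[int], Doc]) -> Iterator[str]:
    # Lazy variant: the writer only gets ids and loads each doc right
    # before rendering it, so only one doc is held at a time.
    for doc_id in doc_ids:
        yield render(loader(doc_id))
```

A caller would consume write_lazy(...) line by line and write each line
straight to the output, so memory use should not grow with the number of
hits in this toy model.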

I see one crucial point: this could work without any new handler/servlet.
Solr would just gain a new parameter, something like 'lazy=true' ;) and
people could keep using whatever 'wt' they used before.
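
From the client side, that would look roughly like the sketch below
(Python). Note that 'lazy' is NOT an existing Solr parameter, it is only
the name proposed here, and build_select_url and the base URL are
hypothetical too. The point is just that the request stays an ordinary
select request with the usual 'wt'.

```python
from urllib.parse import urlencode


def build_select_url(base: str, query: str, wt: str = "json", lazy: bool = False) -> str:
    """Build an ordinary select URL; 'lazy' is the proposed, hypothetical flag."""
    params = {"q": query, "wt": wt}
    if lazy:
        # Hypothetical parameter from this mail, not part of Solr.
        params["lazy"] = "true"
    return base + "/select?" + urlencode(params)


# Assumed example base URL; adjust to the actual core.
url = build_select_url("http://localhost:8983/solr/collection1", "*:*", wt="csv", lazy=True)
```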

disclaimer: I don't know whether that would break other stuff. I only know
that I am using the same idea to dump what I need without breaking things
(so far... ;-)), but obviously I didn't want to patch Solr core.

roman


On Sat, Jul 27, 2013 at 3:52 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Roman,
>
> Let me briefly explain the design
>
> special RequestParser stores servlet output stream into the context
> https://github.com/m-khl/solr-patches/compare/streaming#L7R22
>
> then special component injects special PostFilter/DelegatingCollector which
> writes right into output
> https://github.com/m-khl/solr-patches/compare/streaming#L2R146
>
> here is how it streams the doc, you see it's lazy enough
> https://github.com/m-khl/solr-patches/compare/streaming#L2R181
>
> Note that it disables later collectors
> https://github.com/m-khl/solr-patches/compare/streaming#L2R57
> hence no facets with streaming yet, but also no extra memory consumption.
>
> This test shows how it works
> https://github.com/m-khl/solr-patches/compare/streaming#L15R115
>
> all the other code is for distributed search.
>
>
>
> On Sat, Jul 27, 2013 at 4:44 PM, Roman Chyla <roman.chyla@gmail.com>
> wrote:
>
> > Mikhail,
> > If your solution gives lazy loading of solr docs /and thus streaming of
> > huge result lists/ it should be big YES!
> > Roman
> > On 27 Jul 2013 07:55, "Mikhail Khludnev" <mkhludnev@griddynamics.com>
> > wrote:
> >
> > > Otis,
> > > You gave links to 'deep paging' when I asked about response streaming.
> > > Let me understand. From my POV, deep paging is a special case for regular
> > > search scenarios. We definitely need it in Solr. However, if we are talking
> > > about data analytic like problems, when we need to select an "endless"
> > > stream of responses (or store them in file as Roman did), 'deep paging' is
> > > a suboptimal hack.
> > > What's your vision on this?
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
>
