lucene-solr-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: Huge Facets and Streaming
Date Mon, 21 Aug 2017 19:44:06 GMT
Thanks for sharing this idea, Yonik!
I've raised https://issues.apache.org/jira/browse/SOLR-11271.

On Mon, Aug 21, 2017 at 4:00 PM, Yonik Seeley <yseeley@gmail.com> wrote:

> On Mon, Aug 21, 2017 at 6:01 AM, Mikhail Khludnev <mkhl@apache.org> wrote:
> > Hello!
> >
> > I need to count a really wide facet on a 30-shard index with roughly 100M
> > docs; the facet response is about 100M values and takes 0.5G as a text file.
> >
> > So far I've experimented with the old facets. They calculate per-shard facets
> > fine, but then the node that attempts to merge these 30 responses fails
> > with an OOM. That's understandable.
> >
> > I suppose I'll get pretty much the same with json.facet, or does it scale
> > better?
> >
> > I want to experiment with Streaming Expressions, which I've never tried
> yet.
> > I've found the facet() expression and select() with partitionKeys, but they'll
> > try to merge facet values in FacetComponent/Module anyway.
> > Is there a way to merge per-shard facet responses with Streaming?
>
> Yeah, I think I've mentioned before that this is the way it should be
> implemented (per-shard distrib=false facet request merged by streaming
> expression).
> The JSON Facet "stream" method does stream (i.e. does not build up the
> response all in memory first), but only at the shard level and not at
> the distrib/merge level.  This could then be fed into streaming to get
> exact facets (and streaming facets).  But I don't think this has been
> done yet.
>
> -Yonik
>
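(For readers landing on this thread: the per-shard merge Yonik describes can be approximated today with a rollup over a sorted /export stream, which counts values without buffering the whole facet in memory. A minimal sketch, assuming a hypothetical collection myCollection and a docValues string field cat_s:

```
rollup(
    search(myCollection,
           q="*:*",
           qt="/export",
           fl="cat_s",
           sort="cat_s asc"),
    over="cat_s",
    count(*))
```

Each shard exports its sorted cat_s values, the streams are merged in sorted order, and rollup emits one tuple per distinct value with an exact count. The trade-off is a full export of the field rather than a facet request, which is exactly the gap SOLR-11271 aims to close.)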



-- 
Sincerely yours
Mikhail Khludnev
