drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aman Sinha <amansi...@apache.org>
Subject Re: Hash Aggregate Memory usage
Date Fri, 27 May 2016 18:21:00 GMT
Rahul,  can you send me the query profile separately ?  Also, can you try
group-by on fixed-width columns instead of Varchar ?
With single group, the hash table itself should be consuming relatively
small amount of memory.

On Fri, May 27, 2016 at 11:14 AM, Zelaine Fong <zfong@maprtech.com> wrote:

> My guess would be that for hashing, a hash table is pre-allocated based on
> the number of keys in the hash.  That would explain why with more keys, the
> memory usage grows.  But that's just my guess.  Someone who really
> understands how this works should chime in :).
>
> -- Zelaine
>
> On Fri, May 27, 2016 at 10:36 AM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Any inputs on this one?
> >
> > On Wed, May 25, 2016 at 7:51 PM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Its using hash aggregation.
> > > On May 25, 2016 7:48 PM, "Zelaine Fong" <zfong@maprtech.com> wrote:
> > >
> > >> What does the explain plan show?  I.e., is the group by being done
> via a
> > >> hash agg or a streaming agg?  If it's a streaming agg, then you still
> > have
> > >> to sort the entire data set before you reduce it down to a single
> group.
> > >> That would explain the increase in memory as you add group by keys.
> > >>
> > >> -- Zelaine
> > >>
> > >> On Wed, May 25, 2016 at 5:50 PM, rahul challapalli <
> > >> challapallirahul@gmail.com> wrote:
> > >>
> > >> > I am trying to understand the memory usage patterns for hash
> > aggregate.
> > >> The
> > >> > below query completes in 9.163 seconds and uses 24 MB of memory for
> > >> > hash-aggregate (according to profile)
> > >> >
> > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1,
> > 'kfjhl'
> > >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5  from mem_heavy1) d group by
> > d.c1,
> > >> > d.c2, d.c3, d.c4, d.c5;
> > >> >
> > >> > Adding one more constant column to the group by, the below query
> takes
> > >> > 11.638 seconds and uses 29 MB of ram
> > >> >
> > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1,
> > 'kfjhl'
> > >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6  from mem_heavy1)
d
> > >> group
> > >> > by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6;
> > >> >
> > >> > The below query with one more constant column added to group by
> 14.622
> > >> > seconds and uses 33 MB memory
> > >> >
> > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1,
> > 'kfjhl'
> > >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6, 'ciuh' c7  from
> > >> > mem_heavy1) d group by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6, d.c7;
> > >> >
> > >> >
> > >> > As you can see, there is only one disctinct group in all the above
> > >> cases.
> > >> > It looks like the memory usage is proportional to no of elements in
> > the
> > >> > group by clause. Is this expected?
> > >> >
> > >> > Is the increase in time expected between the above queries? (As we
> did
> > >> not
> > >> > introduce any new groups)
> > >> >
> > >> > - Rahul
> > >> >
> > >>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message