impala-user mailing list archives

From Bin Wang <wbi...@gmail.com>
Subject Re: Memory limit exceed even with very simple count query
Date Wed, 05 Apr 2017 17:19:37 GMT
And when I try to dump some partitions of this table into another table, it
says I need too much memory per node. I don't understand why so much memory
is needed just to read and write the data.

[szq7.appadhoc.com:21000] > insert overwrite adhoc_data_fast.tmplog partition (day)
                         >       select `timestamp`, appid, clientid, statkey, expid, modid, value, summary, custom, uploadtime, day
                         >       from adhoc_data_fast.log where day >= "2017-04-03" and day <= "2017-04-06"
                         >     ;
Query: insert overwrite adhoc_data_fast.tmplog partition (day)
     select `timestamp`, appid, clientid, statkey, expid, modid, value, summary, custom, uploadtime, day
     from adhoc_data_fast.log where day >= "2017-04-03" and day <= "2017-04-06"
Query submitted at: 2017-04-06 01:18:14 (Coordinator: http://szq7.appadhoc.com:25000)
ERROR: Rejected query from pool default-pool : request memory needed 188.85 GB per node is greater than process mem limit 128.00 GB.

Use the MEM_LIMIT query option to indicate how much memory is required per node.
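
A minimal sketch of following that hint from impala-shell before re-running
the insert above; the 16gb value is only an illustrative placeholder, and the
query can of course still fail at run time if it genuinely needs more than
that:

[szq7.appadhoc.com:21000] > set mem_limit=16gb;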


Bin Wang <wbin00@gmail.com> wrote on Thursday, April 6, 2017 at 1:14 AM:

> Will Impala load the whole file into memory? That sounds horrible. And
> according to "show partitions adhoc_data_fast.log", the compressed files are
> no bigger than 4GB:
>
> | 2017-04-04 | -1 | 46 | 2.69GB | NOT CACHED | NOT CACHED | AVRO | false | hdfs://hfds-service/user/hive/warehouse/adhoc_data_fast.db/log/2017-04-04 |
> | 2017-04-05 | -1 | 25 | 3.42GB | NOT CACHED | NOT CACHED | AVRO | false | hdfs://hfds-service/user/hive/warehouse/adhoc_data_fast.db/log/2017-04-05 |
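
SHOW PARTITIONS only reports the total size per partition, not the size of
any single file. A sketch of listing the individual files for one of the
partitions above with SHOW FILES (available since Impala 2.2), to see whether
one gzip file is much larger than the rest:

[szq7.appadhoc.com:21000] > show files in adhoc_data_fast.log partition (day='2017-04-04');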
>
>
> Marcel Kornacker <marcel@cloudera.com> wrote on Thursday, April 6, 2017 at 12:58 AM:
>
> Apparently you have a gzipped file that is >=50GB. You either need to
> break up those files, or run on larger machines.
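
A hedged sketch of the first option (breaking the files up): rewrite the
affected partitions into a copy of the table stored as Parquet, whose block
structure avoids decompressing one huge gzip stream per file. The log_parquet
table name is hypothetical, and this assumes the rewrite itself (or whatever
job originally produced the files) can run within the available memory:

[szq7.appadhoc.com:21000] > create table adhoc_data_fast.log_parquet like adhoc_data_fast.log stored as parquet;
[szq7.appadhoc.com:21000] > insert overwrite adhoc_data_fast.log_parquet partition (day)
                          >   select `timestamp`, appid, clientid, statkey, expid, modid, value, summary, custom, uploadtime, day
                          >   from adhoc_data_fast.log where day >= "2017-04-03" and day <= "2017-04-06";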
>
> On Wed, Apr 5, 2017 at 9:52 AM, Bin Wang <wbin00@gmail.com> wrote:
> > Hi,
> >
> > I've been using Impala in production for a while. But since yesterday, some
> > queries have been reporting memory limit exceeded. Then I tried a very simple
> > count query, and it still exceeded the memory limit.
> >
> > The query is:
> >
> > select count(0) from adhoc_data_fast.log where day>='2017-04-04' and day<='2017-04-06';
> >
> > And the response in the Impala shell is:
> >
> > Query submitted at: 2017-04-06 00:41:00 (Coordinator: http://szq7.appadhoc.com:25000)
> > Query progress can be monitored at:
> > http://szq7.appadhoc.com:25000/query_plan?query_id=4947a3fecd146df4:734bcc1d00000000
> > WARNINGS:
> > Memory limit exceeded
> > GzipDecompressor failed to allocate 54525952000 bytes.
> >
> > I have many nodes and each of them has lots of memory available (~60 GB).
> > The query fails very quickly after I execute it, and the nodes show almost no
> > memory usage.
> >
> > The table "adhoc_data_fast.log" is an Avro table, encoded with gzip and
> > partitioned by the field "day". Each partition has no more than one billion
> > rows.
> >
> > My Impala version is:
> >
> > hdfs@szq7:/home/ubuntu$ impalad --version
> > impalad version 2.7.0-cdh5.9.1 RELEASE (build
> > 24ad6df788d66e4af9496edb26ac4d1f1d2a1f2c)
> > Built on Wed Jan 11 13:39:25 PST 2017
> >
> > Can anyone help with this? Thanks very much!
> >
>
>
