impala-dev mailing list archives

From 孙清孟 <sqm2...@gmail.com>
Subject Re: impala is not parallelized
Date Thu, 03 Aug 2017 07:02:26 GMT
Hi Jeszy:
  Thanks for your reply.

On another cluster with two instances, I ran the same SQL against a smaller
source file:

F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
WRITE TO HDFS [default.cdr_partition_par_false, OVERWRITE=true]
|  partitions=1
|  mem-estimate=1.00GB mem-reservation=0B
|
00:SCAN HDFS [default.cdr_partition, RANDOM]
   partitions=1/1 files=1 size=762.93MB

And there the single file is split:
 Averaged Fragment F00
<http://192.168.33.22:7180/cmf/impala/queryDetails?queryId=cb433d9e02457f39%3A247dc1f100000000&serviceName=impala#>

   - split sizes: *min: 378.93 MB, max: 384.00 MB, avg: 381.46 MB, stddev:
   2.54 MB*
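
A quick check of these numbers, as a minimal sketch: it assumes a 128 MB HDFS block size and an even three-blocks-per-instance assignment, neither of which is stated in this thread. Under those assumptions the 762.93 MB file has five full blocks plus a shorter tail, and handing three blocks to each instance gives split sizes that agree with the profile to within rounding. The helper names `block_sizes` and `split_stats` are hypothetical, written only for this illustration.

```python
import math

BLOCK_MB = 128.0  # assumed HDFS block size; not stated in the thread


def block_sizes(file_mb, block_mb=BLOCK_MB):
    """Split a file size into full blocks plus one remainder block."""
    full = int(file_mb // block_mb)
    sizes = [block_mb] * full
    rest = round(file_mb - full * block_mb, 2)
    if rest > 0:
        sizes.append(rest)
    return sizes


def split_stats(per_instance_mb):
    """Return min, max, mean and population stddev of per-instance sizes."""
    n = len(per_instance_mb)
    avg = sum(per_instance_mb) / n
    std = math.sqrt(sum((x - avg) ** 2 for x in per_instance_mb) / n)
    return min(per_instance_mb), max(per_instance_mb), avg, std


blocks = block_sizes(762.93)  # five full blocks plus a shorter tail block
# Hypothetical assignment: three blocks to each of the two instances.
per_instance = [sum(blocks[:3]), sum(blocks[3:])]
lo, hi, avg, std = split_stats(per_instance)
print(f"min={lo:.2f} MB max={hi:.2f} MB avg={avg:.3f} MB stddev={std:.3f} MB")
```

If this reading is right, the unit of parallelism here is the block-sized scan range rather than the file, so a single 762.93 MB file can still be shared by two instances.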


Is something configured incorrectly in my cluster?

2017-08-03 13:20 GMT+08:00 Jeszy <jeszyb@gmail.com>:

> Putting some more files in the source table will allow you to use more
> hosts.
>
> On 3 August 2017 at 05:08, Taras Bobrovytsky <tarasbob@apache.org> wrote:
> > Yes, it looks like all the work is being done on a single node because
> > hosts=1.
> >
> > On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟 <sqm2050@gmail.com> wrote:
> >
> >> This is my impala cluster:
> >>
> >>
> >> Role Type              State                                Host             Commission State  Role Group
> >> Impala Catalog Server  Started with Outdated Configuration  cdha0.embed.com  Commissioned      Impala Catalog Server Default Group
> >> Impala Daemon          Started                              cdha2.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala Daemon          Started                              cdha1.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala Daemon          Started with Outdated Configuration  cdha3.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala Daemon          Started                              cdha4.embed.com  Commissioned      Impala Daemon Default Group
> >> Impala StateStore      Started                              cdha0.embed.com  Commissioned      Impala StateStore Default Group
> >>
> >>
> >> When I run this SQL statement:
> >>
> >> insert into table cdr_partition_true partition(ym = '2014-11') select
> >>         call_1,
> >>         call_2,
> >>         type_1,
> >>         own_1,
> >>         own_2,
> >>         hdfs_id,
> >>         a_imsi,
> >>         p_imsi,
> >>         a_imei,
> >>         p_imei,
> >>         CAST(unix_timestamp(start_time) AS INT),
> >>         CAST(unix_timestamp(end_time) AS INT),
> >>         time,
> >>         a_LAC,
> >>         a_CI,
> >>         p_LAC,
> >>         p_CI
> >> from cdr_partition_cwang
> >>
> >>
> >>
> >> The EXPLAIN output shows only one host:
> >>
> >> ----------------
> >> Estimated Per-Host Requirements: Memory=2.80GB VCores=1
> >> WARNING: The following tables are missing relevant table and/or column
> >> statistics.
> >> default.cdr_partition_cwang
> >>
> >> WRITE TO HDFS [default.cdr_partition_true, OVERWRITE=false,
> >> PARTITION-KEYS=('2014-11')]
> >> |  partitions=1
> >> |  hosts=1 per-host-mem=1.00GB
> >> |
> >> 00:SCAN HDFS [default.cdr_partition_cwang, RANDOM]
> >>    partitions=1/1 files=1 size=2.00GB
> >>    table stats: unavailable
> >>    column stats: unavailable
> >>    hosts=1 per-host-mem=1.80GB
> >>    tuple-ids=0 row-size=128B cardinality=unavailable
> >> ----------------
> >>
> >> And the number of instances is 1 -> Average Fragment F00.num instances: 1
> >>
> >> Does this mean my query was executed on only one Impala node?
> >>
>
