impala-dev mailing list archives

From Jeszy <jes...@gmail.com>
Subject Re: impala is not parallelized
Date Thu, 03 Aug 2017 12:38:15 GMT
Also check block size.
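
To make the block-size point concrete, here is a minimal sketch of the arithmetic, assuming the scan of a splittable file is divided along HDFS block boundaries. The function name `split_sizes_mb` and the block sizes passed to it are illustrative assumptions, not values read from either cluster.

```python
def split_sizes_mb(file_size_mb: float, block_size_mb: float) -> list[float]:
    """Return the sizes (in MB) of the block-aligned pieces of one file.

    Assumption: each HDFS block becomes one unit of scan work, so a file
    that fits in a single block cannot be spread across several hosts.
    """
    if file_size_mb <= 0 or block_size_mb <= 0:
        raise ValueError("sizes must be positive")
    full_blocks = int(file_size_mb // block_size_mb)
    # Round to two decimals to match the precision shown in the query profile.
    remainder = round(file_size_mb - full_blocks * block_size_mb, 2)
    sizes = [block_size_mb] * full_blocks
    if remainder > 0:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # 762.93 MB file from the quoted plan, with a hypothetical 384 MB block size.
    print(split_sizes_mb(762.93, 384.0))
    # 2.00 GB file: many pieces with 128 MB blocks, one piece with a 2 GB block.
    print(len(split_sizes_mb(2048.0, 128.0)))
    print(len(split_sizes_mb(2048.0, 2048.0)))
```

With a 384 MB block size, the 762.93 MB file comes out as pieces of 384.00 MB and 378.93 MB, which matches the min and max split sizes in the quoted profile. If the 2.00 GB file sits in a single block, only one piece exists, and that would be consistent with hosts=1.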

On 3 August 2017 at 14:36, 孙清孟 <sqm2050@gmail.com> wrote:
> I found the difference between the two clusters: HDFS replication is 3 on
> the normal cluster and 1 on the other one,
> and short-circuit reads are enabled!
>
> Thanks.
>
> 2017-08-03 15:02 GMT+08:00 孙清孟 <sqm2050@gmail.com>:
>
>> Hi Jeszy:
>>   Thanks for your reply.
>>
>>  On another cluster with two instances, I ran the same SQL, and the file
>> is smaller:
>>
>> F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
>> WRITE TO HDFS [default.cdr_partition_par_false, OVERWRITE=true]
>> |  partitions=1
>> |  mem-estimate=1.00GB mem-reservation=0B
>> |
>> 00:SCAN HDFS [default.cdr_partition, RANDOM]
>>    partitions=1/1 files=1 size=762.93MB
>>
>> And the single file is split:
>>  Averaged Fragment F00
>> <http://192.168.33.22:7180/cmf/impala/queryDetails?queryId=cb433d9e02457f39%3A247dc1f100000000&serviceName=impala#>
>>
>>    - split sizes: *min: 378.93 MB, max: 384.00 MB, avg: 381.46 MB,
>>    stddev: 2.54 MB*
>>
>>
>> Is something misconfigured in my cluster?
>>
>> 2017-08-03 13:20 GMT+08:00 Jeszy <jeszyb@gmail.com>:
>>
>>> Putting some more files in the source table will allow you to use more
>>> hosts.
>>>
>>> On 3 August 2017 at 05:08, Taras Bobrovytsky <tarasbob@apache.org> wrote:
>>> > Yes, it looks like all the work is being done on a single node because
>>> > hosts=1.
>>> >
>>> > On Wed, Aug 2, 2017 at 7:55 PM, 孙清孟 <sqm2050@gmail.com> wrote:
>>> >
>>> >> This is my impala cluster:
>>> >>
>>> >>
>>> >> Role Type              State                                Host             Commission State  Role Group
>>> >> Impala Catalog Server  Started with Outdated Configuration  cdha0.embed.com  Commissioned      Impala Catalog Server Default Group
>>> >> Impala Daemon          Started                              cdha2.embed.com  Commissioned      Impala Daemon Default Group
>>> >> Impala Daemon          Started                              cdha1.embed.com  Commissioned      Impala Daemon Default Group
>>> >> Impala Daemon          Started with Outdated Configuration  cdha3.embed.com  Commissioned      Impala Daemon Default Group
>>> >> Impala Daemon          Started                              cdha4.embed.com  Commissioned      Impala Daemon Default Group
>>> >> Impala StateStore      Started                              cdha0.embed.com  Commissioned      Impala StateStore Default Group
>>> >>
>>> >>
>>> >> When I run a SQL:
>>> >>
>>> >> insert into table cdr_partition_true partition(ym = '2014-11') select
>>> >>         call_1,
>>> >>         call_2,
>>> >>         type_1,
>>> >>         own_1,
>>> >>         own_2,
>>> >>         hdfs_id,
>>> >>         a_imsi,
>>> >>         p_imsi,
>>> >>         a_imei,
>>> >>         p_imei,
>>> >>         CAST(unix_timestamp(start_time) AS INT),
>>> >>         CAST(unix_timestamp(end_time) AS INT),
>>> >>         time,
>>> >>         a_LAC,
>>> >>         a_CI,
>>> >>         p_LAC,
>>> >>         p_CI
>>> >> from cdr_partition_cwang
>>> >>
>>> >>
>>> >>
>>> >> The EXPLAIN output shows only one host:
>>> >>
>>> >> ----------------
>>> >> Estimated Per-Host Requirements: Memory=2.80GB VCores=1
>>> >> WARNING: The following tables are missing relevant table and/or column
>>> >> statistics.
>>> >> default.cdr_partition_cwang
>>> >>
>>> >> WRITE TO HDFS [default.cdr_partition_true, OVERWRITE=false,
>>> >> PARTITION-KEYS=('2014-11')]
>>> >> |  partitions=1
>>> >> |  hosts=1 per-host-mem=1.00GB
>>> >> |
>>> >> 00:SCAN HDFS [default.cdr_partition_cwang, RANDOM]
>>> >>    partitions=1/1 files=1 size=2.00GB
>>> >>    table stats: unavailable
>>> >>    column stats: unavailable
>>> >>    hosts=1 per-host-mem=1.80GB
>>> >>    tuple-ids=0 row-size=128B cardinality=unavailable
>>> >> ----------------
>>> >>
>>> >> And the instance count is 1  -> Average Fragment F00.num instances: 1
>>> >>
>>> >> Does this mean my query was executed on only one Impala node?
>>> >>
>>>
>>
>>
