hawq-dev mailing list archives

From Kyle Dunn <kd...@pivotal.io>
Subject Re: hawq-site
Date Tue, 27 Sep 2016 19:38:40 GMT
You might also try tuning this parameter when using HAWQ with YARN:

hawq_rm_min_resource_perseg - Sets the minimum number of containers YARN
will give out for a query (per segment host).

For example, with an 8-node configuration, setting this to 4 (up from the
default of 2) results in 32 containers (vs. 16 by default).
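
As a sketch of how you might change it (the hawq config syntax below is from
memory, and whether the GUC can be set per session may depend on your
release, so please verify first):

    hawq config -c hawq_rm_min_resource_perseg -v 4
    hawq restart cluster -a

or, to experiment in a single psql session first:

    SET hawq_rm_min_resource_perseg = 4;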

YARN queues and their associated parameters will also be relevant here.
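
If you want to double-check the YARN-mode settings, these are the
hawq-site.xml properties that usually matter (the values below are just
placeholders for illustration, not recommendations):

    <property>
        <name>hawq_global_rm_type</name>
        <value>yarn</value>
    </property>
    <property>
        <name>hawq_rm_yarn_queue_name</name>
        <value>default</value>
    </property>
    <property>
        <name>hawq_rm_yarn_address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>hawq_rm_yarn_scheduler_address</name>
        <value>master:8030</value>
    </property>

Also make sure the queue named there has enough capacity (and max-capacity)
in the YARN scheduler configuration, otherwise HAWQ cannot be granted the
containers it asks for.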

-Kyle


On Tue, Sep 27, 2016 at 1:16 PM Kavinder Dhaliwal <kdhaliwal@pivotal.io>
wrote:

> Hi,
>
> Could you please describe in a bit more detail the nature of your tables in
> Hive? What file format are the underlying files stored in? What is the total
> size of the files?
>
> Additionally, what is the Java heap size you have specified for PXF? And
> can you share the DDL you used to create the table?
>
> For more debugging information, I would suggest tailing both
> /var/log/pxf/catalina.out and /var/log/pxf/pxf-service.log and then
> running select count(1) from hcatalog.default.t1; it will be obvious from
> catalina.out if there is insufficient memory allocated to PXF.
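>
> Something like this, assuming the default log locations above (run the tail
> in one terminal while you re-run the query from psql in another):
>
>     tail -f /var/log/pxf/catalina.out /var/log/pxf/pxf-service.log
>
> An OutOfMemoryError or long GC pauses appearing in catalina.out while the
> query runs would point at the heap issue described below.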
>
> This will give more insight into where your problem is. Currently there is
> a known issue with PXF, large Hive files (mainly compressed formats), and
> an insufficient Java heap size. After trying the above, perhaps you can try
> increasing your heap size via the "-Xmx<SIZE>" JVM property and running
> again.
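>
> Where exactly -Xmx is set depends on how PXF was installed. On a typical
> HAWQ install the PXF service is a Tomcat instance, so the heap setting
> usually lives in that instance's setenv.sh; the path and restart command
> below are assumptions to adapt to your own layout:
>
>     # e.g. in /var/pxf/pxf-service/bin/setenv.sh (path may differ)
>     JVM_OPTS="-Xmx2048m -Xms1024m"
>
>     # then restart PXF on each node, for example
>     service pxf-service restart
>
> and re-run the count(1) while tailing catalina.out as above.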
>
> Thanks,
> Kavinder
>
> On Mon, Sep 26, 2016 at 10:32 PM, 来熊 <yin.zhb@163.com> wrote:
>
> > Hi all,
> > I am using PXF and HCatalog to query Hive. Tables t1 and t2 are in Hive,
> > t1 is a large table, and HAWQ is running in YARN mode.
> >
> > [hive@master ~]$ hive
> > hive> select count(1) from t1;
> > OK
> > 680852926
> > Time taken: 0.721 seconds, Fetched: 1 row(s)
> > hive> exit;
> >
> > When I query t1 in HAWQ, it is very slow:
> >
> > [gpadmin@master ~]$
> > [gpadmin@master ~]$ psql -U gpadmin -d gpadmin
> > gpadmin=# set pxf_service_address to 'master:51200';
> > SET
> > Time: 0.410 ms
> > gpadmin=# select count(*) from hcatalog.default.t2;
> >  count
> > -------
> >   1000
> > (1 row)
> >
> > Time: 910.853 ms
> >
> > gpadmin=# explain select count(*) from hcatalog.default.t1;
> >                                              QUERY PLAN
> >
> > ----------------------------------------------------------------------------------------
> >  Aggregate  (cost=0.00..431.00 rows=1 width=8)
> >    ->  Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=8)
> >          ->  Aggregate  (cost=0.00..431.00 rows=1 width=8)
> >                ->  External Scan on t1  (cost=0.00..431.00 rows=1 width=1)
> >  Optimizer status: PQO version 1.627
> > (5 rows)
> >
> > Time: 1388.073 ms
> > gpadmin=#
> > gpadmin=# select count(1) from hcatalog.default.t1;
> >
> > I wait a long time and cannot get a result.
> > Log messages:
> >
> > 2016-09-27 09:46:25.816366 CST,"gpadmin","gpadmin",p764498,th-1935386496,"10.0.230.20","16234",2016-09-27 09:31:13 CST,90355,con51,cmd20,seg-1,,,x90355,sx1,"LOG","00000","ConnID 5. Registered in HAWQ resource manager (By OID)",,,,,,"select count(*) from hcatalog.default.t1;",0,,"rmcomm_QD2RM.c",609,
> > 2016-09-27 09:46:25.816508 CST,,,p760393,th-1935386496,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 5. Expect query resource (256 MB, 0.022727 CORE) x 1 ( MIN 1 ) resource after adjusting based on queue NVSEG limits.",,,,,,,0,,"resqueuemanager.c",1913,
> > 2016-09-27 09:46:25.816603 CST,,,p760393,th-1935386496,,,,0,con4,,seg-10000,,,,,"LOG","00000","Latency of getting resource allocated is 138us",,,,,,,0,,"resqueuemanager.c",4375,
> > 2016-09-27 09:46:25.816743 CST,"gpadmin","gpadmin",p764498,th-1935386496,"10.0.230.20","16234",2016-09-27 09:31:13 CST,90355,con51,cmd20,seg-1,,,x90355,sx1,"LOG","00000","ConnID 5. Acquired resource from resource manager, (256 MB, 0.022727 CORE) x 1.",,,,,,"select count(*) from hcatalog.default.t1;",0,,"rmcomm_QD2RM.c",868,
> > 2016-09-27 09:46:25.816868 CST,"gpadmin","gpadmin",p764498,th-1935386496,"10.0.230.20","16234",2016-09-27 09:31:13 CST,90355,con51,cmd20,seg-1,,,x90355,sx1,"LOG","00000","data locality ratio: 0.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (0.000 B/0 B/0 B); segment size with penalty(avg/min/max): (0.000 B/0 B/0 B); continuity(avg/min/max): (0.000/0.000/0.000).",,,,,,"select count(*) from hcatalog.default.t1;",0,,"cdbdatalocality.c",3396,
> >
> >
> >
> > I don't know why HAWQ gets so few resources for this query.
> > Are there any parameters I can set to make querying Hive through PXF and
> > HCatalog as fast as querying Hive directly?
> >
> >
> >
>
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io
