hawq-user mailing list archives

From Jimmy Da <jd...@cornell.edu>
Subject Re: question about using PXF
Date Fri, 30 Oct 2015 04:01:37 GMT
Great job on linking the right classpaths!
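
For anyone else hitting the same startup failure, here is a minimal sketch (in Python) of a check that every entry in pxf-private.classpath resolves to something on disk. It assumes the file lists one path per line, possibly with shell-style wildcards, and that lines starting with # are comments; the function name is hypothetical and not part of PXF.

```python
import glob
from pathlib import Path


def missing_classpath_entries(classpath_file):
    """Return the entries of a classpath file that match nothing on disk.

    Assumed file format (not confirmed against every PXF release):
    one path per line, wildcards allowed, '#' starts a comment line.
    """
    missing = []
    for raw_line in Path(classpath_file).read_text().splitlines():
        entry = raw_line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if not glob.glob(entry):
            # No file or directory matches this entry.
            missing.append(entry)
    return missing
```

Calling missing_classpath_entries() with the path to your pxf-private.classpath returns a list of the entries that do not resolve. An empty list means every entry matched at least one path, though it does not prove the jars themselves are the right versions.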

In terms of resource consumption, the PXF daemon shouldn't use more than
1GB plus some off-heap memory (thread stacks and the like). See
https://github.com/apache/incubator-hawq/blob/master/pxf/gradlew#L10

In terms of performance, some slowdown compared with the hbase shell is
unavoidable, as the two go through different paths to retrieve the data.

In the hbase shell, the client talks to the HBaseMaster and RegionServer and
gets the data in an optimal way, where the data could even be warm in the
HFile cache (in-memory store).

With PXF, the Java daemon reads from the HDFS location in your CREATE
EXTERNAL TABLE definition, talks to the NameNode to find the block locations
containing the HFile (on disk), then uses the HBase Java file reader to
read the data with some serde, and finally sends the results to the local
HAWQ segments, where query processing happens.

PXF is built so that it generalizes data access to different systems
(the previous paragraph could also apply to reading HDFS files, Hive files,
name-your-own-system). The additional overhead mostly comes from retrieving
the initial metadata. I suppose it would be an interesting experiment to
run with a larger data set and see whether the performance difference
is additive or multiplicative.
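
A rough sketch of how that experiment could be scored (Python; the helper names are hypothetical): time the same scan natively and through PXF at two data sizes, then see whether the absolute gap or the ratio stays more stable. A stable gap points to additive overhead, a stable ratio to multiplicative overhead. The only real measurements available here are the two timings quoted below (434.412 ms through PXF, 0.0540 s in the hbase shell); a second pair at a larger size would have to be measured.

```python
def overhead_summary(native_s, pxf_s):
    """Return (gap, ratio) of one PXF timing against a native timing.

    Both arguments are in seconds; native_s must be greater than zero.
    """
    return pxf_s - native_s, pxf_s / native_s


def relative_drift(a, b):
    """Relative change between two values, 0.0 when both are zero."""
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale if scale else 0.0


def looks_additive(small, large):
    """Compare two (native_seconds, pxf_seconds) pairs taken at two sizes.

    Returns True when the gap drifts less than the ratio between the two
    measurements, which is what a fixed (additive) overhead would produce.
    """
    gap_small, ratio_small = overhead_summary(*small)
    gap_large, ratio_large = overhead_summary(*large)
    gap_drift = relative_drift(gap_small, gap_large)
    ratio_drift = relative_drift(ratio_small, ratio_large)
    return gap_drift < ratio_drift
```

With the two timings from this thread, overhead_summary(0.0540, 0.434412) gives a gap of about 0.38 s and a ratio of roughly 8x. A single data point cannot distinguish the two models, which is why looks_additive() needs a second pair from a larger table.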

Noa, correct me if I made a mistake :)

Jimmy Da

That’s what people do, they leap, and hoping to God they can fly.


On Thu, Oct 29, 2015 at 6:49 PM, sequoiadb <mailing-list-recv@sequoiadb.com>
wrote:

> Creating a soft link from /usr/phd to /usr/hdp lets pxf-service start
> successfully.
>
> Just curious, what’s the overhead of using PXF?
>
> postgres=# select * from hbase_member;
>  recordkey  | address:city | address:contry | address:province | info:age | info:birthday | info:company
> ------------+--------------+----------------+------------------+----------+---------------+--------------
>  scutshuxue | hangzhou     | china          | zhejiang         |       99 | 1987-06-17    | alibaba
>  xiaofeng   | jieyang      | china          | guangdong        |          | 1987-4-17     | alibaba
> (2 rows)
>
> Time: 434.412 ms
>
> hbase(main):004:0* scan 'member'
> ROW                               COLUMN+CELL
>  scutshuxue                       column=address:city, timestamp=1446104911726, value=hangzhou
>  scutshuxue                       column=address:contry, timestamp=1446104910743, value=china
>  scutshuxue                       column=address:province, timestamp=1446104910775, value=zhejiang
>  scutshuxue                       column=info:age, timestamp=1446104987420, value=99
>  scutshuxue                       column=info:birthday, timestamp=1446104910674, value=1987-06-17
>  scutshuxue                       column=info:company, timestamp=1446104910715, value=alibaba
>  xiaofeng                         column=address:city, timestamp=1446104920523, value=jieyang
>  xiaofeng                         column=address:contry, timestamp=1446104920461, value=china
>  xiaofeng                         column=address:province, timestamp=1446104920486, value=guangdong
>  xiaofeng                         column=address:town, timestamp=1446104921802, value=xianqiao
>  xiaofeng                         column=info:birthday, timestamp=1446104920358, value=1987-4-17
>  xiaofeng                         column=info:company, timestamp=1446104920423, value=alibaba
>  xiaofeng                         column=info:favorite, timestamp=1446104920397, value=movie
> 2 row(s) in 0.0540 seconds
>
> It’s very slow compared with running the same scan in the hbase shell.
>
>
> On Oct 29, 2015, at 8:33 PM, Noa Horn <nhorn@pivotal.io> wrote:
>
> The problem is probably because the jars that are required by PXF are not
> found.
>
> In the attached log file, this error for example shows that
> hadoop-auth.jar is not found:
> 29-Oct-2015 16:37:33.405 WARNING [localhost-startStop-1]
> com.pivotal.pxf.service.utilities.CustomWebappLoader.addRepositories Failed
> to load entry /usr/phd/current/hadoop-client/hadoop-auth.jar:
> java.nio.file.NoSuchFileException: /usr/phd/current/hadoop-client
>
> Have a look at /etc/conf/gphd/pxf (old version) or /etc/conf/pxf (open
> source version), at the file pxf-private.classpath.
> Every source specified there is required by PXF.
> The default paths for these resources are under /usr/phd/... (Pivotal
> distribution), while your system is HDP, so the paths are different.
> Luckily, we also provide the paths for the HDP distribution in
> pxf-privatehdp.classpath. If you copy the content of that file into
> pxf-private.classpath and run init and start again, it should work.
>
> As an aside, it's highly recommended to compile and use the open source
> version, because we made a few changes in the rpms.
> From the pxf directory, run 'make tomcat' to generate a tomcat rpm
> (required by PXF) and 'make rpm' to compile and create PXF rpms.
>
> Noa
>
>
> On Wed, Oct 28, 2015 at 11:38 PM, mailing-list-recv <
> mailing-list-recv@sequoiadb.com> wrote:
>
>> Thanks guys,
>>
>> Not sure if the mailing list supports attachments; let me try anyway.
>>
>> The status command shows the following:
>>
>> [root@cent61 ~]# service pxf-service status
>>
>> Checking if tcServer is up and running...
>>
>> Checking if PXF webapp is up and running...
>>
>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more
>> information
>>
>> I was using the binary version downloaded from the site. I haven't tried
>> to compile from open source yet.
>> Port 51200 is open.
>>
>> [root@cent61 logs]# cat tcserver.pid
>>
>> 8385
>>
>> [root@cent61 logs]# ps -elf | grep 8385
>>
>> 0 S pxf       8385     1  0  80   0 - 312017 futex_ Oct29 ?
>> 00:00:40 /usr/jdk64/jdk1.7.0_67/bin/java
>> -Djava.util.logging.config.file=/var/gphd/pxf/pxf-service/conf/logging.properties
>> -Djava.util.logging.manager=com.springsource.tcserver.serviceability.logging.TcServerLogManager
>> -Xmx512M -Xss256K
>> -Djava.endorsed.dirs=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/endorsed
>> -classpath
>> /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
>> -Dcatalina.base=/var/gphd/pxf/pxf-service
>> -Dcatalina.home=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
>> -Djava.io.tmpdir=/var/gphd/pxf/pxf-service/temp
>> org.apache.catalina.startup.Bootstrap start
>>
>> 4 S root     23247 22386  0  80   0 - 25813 pipe_w 14:35 pts/2
>> 00:00:00 grep 8385
>>
>> [root@cent61 logs]# netstat -anp | grep 8385
>>
>> tcp        0      0 ::ffff:127.0.0.1:6969       :::*
>>     LISTEN      8385/java
>>
>> tcp        0      0 :::51200                    :::*
>>   LISTEN      8385/java
>>
>> unix  2      [ ]         STREAM     CONNECTED     2344585 8385/java
>>
>>
>> unix  2      [ ]         STREAM     CONNECTED     2344417 8385/java
>>
>>
>>
>> Cheers
>>
>>
>>
>> At 2015-10-29 03:22:48, "Jimmy Da" <jd462@cornell.edu> wrote:
>>
>> So it seems that Tomcat server is up, but the pxf servlet has not
>> started. To confirm this, you can run "pxf-service status" to double check
>> that pxf service is running.
>>
>> One guess is that the Java libraries were not loaded correctly. I am
>> looking at this line:
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>>
>> Can you double check that you can find all the jar files at the locations
>> in this file?
>>
>> https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/resources/pxf-privatehdp.classpath
>>
>> Jimmy Da
>>
>> That’s what people do, they leap, and hoping to God they can fly.
>>
>>
>> On Wed, Oct 28, 2015 at 12:03 PM, Ting(Goden) Yao <tyao@pivotal.io>
>> wrote:
>>
>>> Hi sequoiadb,
>>>
>>> Which HAWQ/PXF version are you using (did you compile the open source
>>> version, or is it one of the earlier Pivotal-released HAWQ versions)?
>>>
>>> Can you also attach the PXF logs for investigation?
>>> They are at var/log/gphd/
>>>
>>> -Goden
>>>
>>> On Wed, Oct 28, 2015 at 1:51 AM sequoiadb <
>>> mailing-list-recv@sequoiadb.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I’m trying to setup PXF for HBase and got the following error:
>>>> tpch=# create external table hbase_member ( recordkey bytea,
>>>> "address:city" varchar, "address:contry" varchar, "address:province"
>>>> varchar, "info:age" int, "info:birthday" varchar, "info:company" varchar )
>>>> location ( 'pxf://cent61:50070/member?PROFILE=HBase') FORMAT 'CUSTOM'(
>>>> FORMATTER='pxfwritable_import');
>>>> CREATE EXTERNAL TABLE
>>>> tpch=# select * from hbase_member;
>>>> ERROR:  remote component error (0) from '192.168.31.205:51200':
>>>> couldn't connect to host (libchurl.c:852)
>>>>
>>>> I could successfully create regular tables and run queries, but
>>>> when I try to query PXF tables I keep getting an error connecting to
>>>> port 51200.
>>>>
>>>> So I tried to start pxf-service and got
>>>> [root@cent61 profile.d]# service pxf-service init
>>>> Creating instance 'pxf-service' ...
>>>>   Using separate layout
>>>>   Creating bin/setenv.sh
>>>>   Applying template 'base'
>>>>     Copying template's contents
>>>>     Applying fragment 'context-fragment.xml' to 'conf/context.xml'
>>>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>>>     Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>>>>     Applying fragment 'tomcat-users-fragment.xml'
>>>> to 'conf/tomcat-users.xml'
>>>>     Applying fragment 'catalina-fragment.properties'
>>>> to 'conf/catalina.properties'
>>>>   Applying template 'base-tomcat-7'
>>>>     Copying template's contents
>>>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>>>     Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>>>>     Applying fragment 'catalina-fragment.properties'
>>>> to 'conf/catalina.properties'
>>>>   Applying template 'bio'
>>>>     Copying template's contents
>>>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>>>   Configuring instance 'pxf-service' to use Tomcat version
>>>> 7.0.55.A.RELEASE
>>>>   Setting permissions
>>>> Instance created
>>>> Connector summary
>>>>   Port: 51200   Type: Blocking IO   Secure: false
>>>> [root@cent61 profile.d]# service pxf-service start
>>>> /var/gphd/pxf /
>>>> Creating home directory for pxf.
>>>> Using CATALINA_BASE:   /var/gphd/pxf/pxf-service
>>>> Using
>>>> CATALINA_HOME:   /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
>>>> Using CATALINA_TMPDIR: /var/gphd/pxf/pxf-service/temp
>>>> Using JRE_HOME:        /usr/jdk64/jdk1.7.0_67
>>>> Using CLASSPATH:
>>>>   /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
>>>> Using CATALINA_PID:    /var/gphd/pxf/pxf-service/logs/tcserver.pid
>>>> Tomcat started.
>>>> Status:                RUNNING as PID=8385
>>>> /
>>>> Checking if tcServer is up and running...
>>>> tcServer not responding, re-trying after 1 second (attempt number 1)
>>>> tcServer not responding, re-trying after 1 second (attempt number 2)
>>>> Checking if PXF webapp is up and running...
>>>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for
>>>> more information
>>>>
>>>> Now the select statement shows another error:
>>>> tpch=# select * from base_member;
>>>> ERROR:  GPHD component not found (libchurl.c:1058)
>>>>
>>>> It looks like it hits this error:
>>>> bool handle_special_error(long response)
>>>> {
>>>>     switch (response)
>>>>     {
>>>>         case 404:
>>>>             elog(ERROR, "GPHD component not found");
>>>>             break;
>>>>         default:
>>>>             return false;
>>>>     }
>>>>     return true;
>>>> }
>>>>
>>>> Do I need some sort of web service running to make this work?
>>>> Is it caused by the PXF web app failing to start? Which log am I
>>>> supposed to look at?
>>>> The catalina log shows this, and I’m not sure it’s the right one to
>>>> look at:
>>>> 29-Oct-2015 16:37:34.923 SEVERE
>>>> [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal
>>>> ContainerBase.addChild: start:
>>>>  org.apache.catalina.LifecycleException: Failed to
>>>> start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>>>> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
>>>>
>>>> at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
>>>>
>>>> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>>>> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>>>>
>>>> at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>>>>
>>>> at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>>>>
>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>> Caused by:
>>>> java.lang.NoClassDefFoundError: Lorg/apache/commons/logging/Log;
>>>> at java.lang.Class.getDeclaredFields0(Native Method)
>>>> at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
>>>> at java.lang.Class.getDeclaredFields(Class.java:1806)
>>>>
>>>> at org.apache.catalina.util.Introspection.getDeclaredFields(Introspection.java:106)
>>>>
>>>> at org.apache.catalina.startup.WebAnnotationSet.loadFieldsAnnotation(WebAnnotationSet.java:270)
>>>>
>>>> at org.apache.catalina.startup.WebAnnotationSet.loadApplicationListenerAnnotations(WebAnnotationSet.java:89)
>>>>
>>>> at org.apache.catalina.startup.WebAnnotationSet.loadApplicationAnnotations(WebAnnotationSet.java:63)
>>>>
>>>> at org.apache.catalina.startup.ContextConfig.applicationAnnotationsConfig(ContextConfig.java:403)
>>>>
>>>> at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:879)
>>>>
>>>> at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:374)
>>>>
>>>> at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
>>>>
>>>> at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
>>>>
>>>> at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5378)
>>>> at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>>>> ... 10 more
>>>> Caused by:
>>>> java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>>>>
>>>> at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>>>
>>>> at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>>> ... 24 more
>>>>
>>>> 29-Oct-2015 16:37:34.924 SEVERE
>>>> [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR
>>>> Error deploying web application archive
>>>> /var/gphd/pxf/pxf-service/webapps/pxf.war
>>>>  java.lang.IllegalStateException: ContainerBase.addChild:
>>>> start: org.apache.catalina.LifecycleException: Failed to
>>>> start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>>>>
>>>> at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:904)
>>>>
>>>> at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>>>> at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>>>>
>>>> at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>>>>
>>>> at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>>>>
>>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>
>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>
>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> I’m running on a previously built HDP 2.2.8 and performed a manual HAWQ
>>>> installation. I got most parts done but am stuck at the PXF component;
>>>> any help would be appreciated.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>
>>
>
>
