hawq-user mailing list archives

From sequoiadb <mailing-list-r...@sequoiadb.com>
Subject Re: question about using PXF
Date Fri, 30 Oct 2015 04:47:40 GMT
Thanks Jimmy,

That’s a very helpful explanation.

It looks like backend/access/external and bin/gpfusion are the main code paths where PXF requests
are sent. I’m proposing to create an interface that calls our C API when the given URI indicates
that the data source is located in our own system.

It seems we should override the pxfwritable_export and pxfwritable_import interfaces. Is that correct?
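
To make the proposal concrete, here is a minimal sketch of the routing idea, written in Python only for brevity (the real change would be in the HAWQ C code). The `sdb` scheme, the port in the second example, the `choose_backend` function and the backend labels are all hypothetical names for illustration; only the `pxf://` location format is taken from the CREATE EXTERNAL TABLE statement quoted further down in this thread.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical scheme marking data that lives in our own system (not a HAWQ name).
NATIVE_SCHEME = "sdb"


def choose_backend(location: str) -> dict:
    """Split an external-table LOCATION URI and pick a code path for it."""
    parts = urlsplit(location)
    # Keep the first value of each query option, e.g. PROFILE=HBase.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # Route to the native C API only when the scheme says the data is ours;
    # every other URI keeps going through the existing PXF path.
    backend = "native-c-api" if parts.scheme == NATIVE_SCHEME else "pxf"
    return {
        "backend": backend,
        "host": parts.hostname,
        "port": parts.port,
        "object": parts.path.lstrip("/"),
        "options": options,
    }


if __name__ == "__main__":
    print(choose_backend("pxf://cent61:50070/member?PROFILE=HBase"))
    print(choose_backend("sdb://cent61:11810/member"))  # hypothetical URI
```

The point of the sketch is only that the decision can be made from the URI alone, before any request is sent to the PXF service.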

Thanks

> On Oct 30, 2015, at 12:01 PM, Jimmy Da <jd462@cornell.edu> wrote:
> 
> Great job on linking the right classpaths!
> 
> In terms of resource consumption, the PXF daemon shouldn't use more than 1GB and then some
> (off-heap stack/memory). Cf.
> https://github.com/apache/incubator-hawq/blob/master/pxf/gradlew#L10
> 
> In terms of performance, the slowdown is unavoidable compared with the hbase shell, as the
> two go through different paths to retrieve the data.
>
> In the hbase shell, the client talks to HBaseMaster and RegionServer and gets data in an
> optimal way, where the data could even be warm in the HFile cache (in-memory store).
>
> With PXF, the Java daemon reads the HDFS location from your CREATE EXTERNAL TABLE definition,
> talks to the NameNode to find the block locations containing the HFile (on disk), then
> uses the HBase Java file reader to read the data with some serde, and then sends the results
> to the local HAWQ segments, where query processing happens.
>
> PXF is built to generalize data access across different systems (the previous paragraph
> could also apply to reading HDFS files, Hive files, or name-your-own-system). The additional
> overhead mostly comes from retrieving the initial metadata. I suppose it would be an interesting
> experiment to run with a larger data set and see whether the performance difference
> is additive or multiplicative.
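
One rough way to frame that experiment, as a sketch only: take the two timings reported further down in this thread for the 2-row table and project them under each hypothesis. The linear scaling of the hbase shell scan is a loud assumption made purely for illustration, and the function names are made up; real measurements at several table sizes would replace the projections.

```python
# Timings reported in this thread for the same 2-row table.
PXF_MS = 434.412    # "Time: 434.412 ms" for select * from hbase_member
HBASE_MS = 54.0     # "2 row(s) in 0.0540 seconds" for scan 'member'
ROWS = 2


def hbase_ms(rows: int) -> float:
    """ASSUMPTION: hbase shell scan time grows linearly with row count."""
    return HBASE_MS / ROWS * rows


def pxf_additive_ms(rows: int) -> float:
    """Hypothesis A: PXF pays a fixed extra cost per query (e.g. metadata)."""
    return hbase_ms(rows) + (PXF_MS - HBASE_MS)


def pxf_multiplicative_ms(rows: int) -> float:
    """Hypothesis B: PXF is slower by the same factor at every size."""
    return hbase_ms(rows) * (PXF_MS / HBASE_MS)


if __name__ == "__main__":
    for rows in (2, 2_000, 2_000_000):
        print(rows, round(pxf_additive_ms(rows)), round(pxf_multiplicative_ms(rows)))
```

Both models agree at 2 rows by construction and diverge as the table grows, so timing the same query at a few larger sizes should show which one is closer.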
> 
> Noa, correct me if I made a mistake :)
> 
> Jimmy Da
> That’s what people do, they leap, and hoping to God they can fly.
> 
> On Thu, Oct 29, 2015 at 6:49 PM, sequoiadb <mailing-list-recv@sequoiadb.com> wrote:
> Creating a soft link from /usr/phd to /usr/hdp makes pxf-service start successfully.
> 
> Just curious: what’s the overhead of using PXF?
> 
> postgres=# select * from hbase_member;
>  recordkey  | address:city | address:contry | address:province | info:age | info:birthday | info:company 
> ------------+--------------+----------------+------------------+----------+---------------+--------------
>  scutshuxue | hangzhou     | china          | zhejiang         |       99 | 1987-06-17    | alibaba
>  xiaofeng   | jieyang      | china          | guangdong        |          | 1987-4-17     | alibaba
> (2 rows)
> 
> Time: 434.412 ms
> 
> hbase(main):004:0* scan 'member'
> ROW                               COLUMN+CELL
>  scutshuxue                       column=address:city, timestamp=1446104911726, value=hangzhou
>  scutshuxue                       column=address:contry, timestamp=1446104910743, value=china
>  scutshuxue                       column=address:province, timestamp=1446104910775, value=zhejiang
>  scutshuxue                       column=info:age, timestamp=1446104987420, value=99
>  scutshuxue                       column=info:birthday, timestamp=1446104910674, value=1987-06-17
>  scutshuxue                       column=info:company, timestamp=1446104910715, value=alibaba
>  xiaofeng                         column=address:city, timestamp=1446104920523, value=jieyang
>  xiaofeng                         column=address:contry, timestamp=1446104920461, value=china
>  xiaofeng                         column=address:province, timestamp=1446104920486, value=guangdong
>  xiaofeng                         column=address:town, timestamp=1446104921802, value=xianqiao
>  xiaofeng                         column=info:birthday, timestamp=1446104920358, value=1987-4-17
>  xiaofeng                         column=info:company, timestamp=1446104920423, value=alibaba
>  xiaofeng                         column=info:favorite, timestamp=1446104920397, value=movie
> 2 row(s) in 0.0540 seconds
> 
> It’s very slow compared with running the scan in the hbase shell.
> 
> 
>> On Oct 29, 2015, at 8:33 PM, Noa Horn <nhorn@pivotal.io> wrote:
>> 
>> The problem is probably that the jars required by PXF are not found.
>> 
>> In the attached log file, this error, for example, shows that hadoop-auth.jar is not found:
>> 29-Oct-2015 16:37:33.405 WARNING [localhost-startStop-1] com.pivotal.pxf.service.utilities.CustomWebappLoader.addRepositories Failed to load entry /usr/phd/current/hadoop-client/hadoop-auth.jar: java.nio.file.NoSuchFileException: /usr/phd/current/hadoop-client
>> 
>> Have a look at /etc/conf/gphd/pxf (old version) or /etc/conf/pxf (open source version),
>> at the file pxf-private.classpath.
>> Every source specified there is required by PXF.
>> The default paths for these resources are under /usr/phd/... (Pivotal distribution),
>> while your system is HDP, so the paths are different. Luckily, we also provide the paths for
>> the HDP distribution in pxf-privatehdp.classpath. If you copy the content of that file into
>> pxf-private.classpath and run init and start again, it should work.
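
A quick way to verify this before restarting the service is to walk the classpath file and report entries that match nothing on disk. The sketch below assumes the file holds one path per line, that lines starting with `#` are comments, and that entries may use shell-style wildcards; those format details and the function name are assumptions, not something confirmed in this thread.

```python
import glob
import sys


def missing_classpath_entries(classpath_file: str) -> list:
    """Return the entries of a classpath file that match nothing on disk."""
    missing = []
    with open(classpath_file) as fh:
        for raw in fh:
            entry = raw.strip()
            # ASSUMPTION: blank lines and '#' lines are not classpath entries.
            if not entry or entry.startswith("#"):
                continue
            # glob.glob also handles plain paths: it returns [] when absent.
            if not glob.glob(entry):
                missing.append(entry)
    return missing


# Only runs when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for entry in missing_classpath_entries(sys.argv[1]):
        print("missing:", entry)
```

Run it with the path to pxf-private.classpath as the only argument; an empty result means every listed resource was found.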
>> 
>> As an aside, it's highly recommended to compile and use the open source version,
>> because we made a few changes in the rpms.
>> From the pxf directory, run 'make tomcat' to generate a tomcat rpm (required by PXF)
>> and 'make rpm' to compile and create PXF rpms.
>> 
>> Noa
>> 
>> 
>> On Wed, Oct 28, 2015 at 11:38 PM, mailing-list-recv <mailing-list-recv@sequoiadb.com> wrote:
>> Thanks guys,
>> 
>> Not sure if the mailing list supports attachments; let me try anyway.
>> 
>> The status command shows the following:
>> [root@cent61 ~]# service pxf-service status
>> 
>> Checking if tcServer is up and running...
>> 
>> Checking if PXF webapp is up and running...
>> 
>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more information
>> 
>> I was using the binary version downloaded from the site. I haven't tried to compile
>> from open source yet.
>> 
>> Port 51200 is open:
>> [root@cent61 logs]# cat tcserver.pid
>> 
>> 8385
>> 
>> [root@cent61 logs]# ps -elf | grep 8385
>> 
>> 0 S pxf       8385     1  0  80   0 - 312017 futex_ Oct29 ?       00:00:40 /usr/jdk64/jdk1.7.0_67/bin/java -Djava.util.logging.config.file=/var/gphd/pxf/pxf-service/conf/logging.properties -Djava.util.logging.manager=com.springsource.tcserver.serviceability.logging.TcServerLogManager -Xmx512M -Xss256K -Djava.endorsed.dirs=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/endorsed -classpath /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar -Dcatalina.base=/var/gphd/pxf/pxf-service -Dcatalina.home=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE -Djava.io.tmpdir=/var/gphd/pxf/pxf-service/temp org.apache.catalina.startup.Bootstrap start
>> 
>> 4 S root     23247 22386  0  80   0 - 25813 pipe_w 14:35 pts/2    00:00:00 grep 8385
>> 
>> [root@cent61 logs]# netstat -anp | grep 8385
>> 
>> tcp        0      0 ::ffff:127.0.0.1:6969       :::*                        LISTEN      8385/java
>> 
>> tcp        0      0 :::51200                    :::*                        LISTEN      8385/java
>> 
>> unix  2      [ ]         STREAM     CONNECTED     2344585 8385/java           
>> 
>> unix  2      [ ]         STREAM     CONNECTED     2344417 8385/java           
>> 
>> 
>> 
>> Cheers
>> 
>> 
>> 
>> 
>> At 2015-10-29 03:22:48, "Jimmy Da" <jd462@cornell.edu> wrote:
>> So it seems that the Tomcat server is up, but the PXF servlet has not started. To confirm
>> this, you can run "pxf-service status" to double check that the PXF service is running.
>> 
>> My guess is that the Java libraries were not loaded correctly.
>> I am looking at this line:
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>> 
>> Can you double check that you can find all the jar files at the locations in this file?
>> https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/resources/pxf-privatehdp.classpath
>> 
>> Jimmy Da
>> That’s what people do, they leap, and hoping to God they can fly.
>> 
>> On Wed, Oct 28, 2015 at 12:03 PM, Ting(Goden) Yao <tyao@pivotal.io> wrote:
>> Hi sequoiadb, 
>> 
>> Which HAWQ/PXF version are you using (did you compile the open source version,
>> or is it one of the former Pivotal-released HAWQ versions)?
>> 
>> Can you also attach the PXF logs for investigation?
>> They are at var/log/gphd/
>> 
>> -Goden
>> 
>> On Wed, Oct 28, 2015 at 1:51 AM sequoiadb <mailing-list-recv@sequoiadb.com> wrote:
>> Hi guys,
>> 
>> I’m trying to set up PXF for HBase and got the following error:
>> tpch=# create external table hbase_member ( recordkey bytea, "address:city" varchar,
>> "address:contry" varchar, "address:province" varchar, "info:age" int, "info:birthday" varchar,
>> "info:company" varchar ) location ( 'pxf://cent61:50070/member?PROFILE=HBase' ) FORMAT
>> 'CUSTOM'( FORMATTER='pxfwritable_import');
>> CREATE EXTERNAL TABLE
>> tpch=# select * from hbase_member;
>> ERROR:  remote component error (0) from '192.168.31.205:51200': couldn't connect to host (libchurl.c:852)
>> 
>> I can successfully create regular tables and run queries, but when I try to
>> use PXF tables I keep getting an error connecting to port 51200.
>> 
>> So I tried to start pxf-service and got:
>> [root@cent61 profile.d]# service pxf-service init
>> Creating instance 'pxf-service' ...
>>   Using separate layout
>>   Creating bin/setenv.sh
>>   Applying template 'base'
>>     Copying template's contents
>>     Applying fragment 'context-fragment.xml' to 'conf/context.xml'
>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>     Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>>     Applying fragment 'tomcat-users-fragment.xml' to 'conf/tomcat-users.xml'
>>     Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
>>   Applying template 'base-tomcat-7'
>>     Copying template's contents
>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>     Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>>     Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
>>   Applying template 'bio'
>>     Copying template's contents
>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>   Configuring instance 'pxf-service' to use Tomcat version 7.0.55.A.RELEASE
>>   Setting permissions
>> Instance created
>> Connector summary
>>   Port: 51200   Type: Blocking IO   Secure: false
>> [root@cent61 profile.d]# service pxf-service start
>> /var/gphd/pxf /
>> Creating home directory for pxf.
>> Using CATALINA_BASE:   /var/gphd/pxf/pxf-service
>> Using CATALINA_HOME:   /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
>> Using CATALINA_TMPDIR: /var/gphd/pxf/pxf-service/temp
>> Using JRE_HOME:        /usr/jdk64/jdk1.7.0_67
>> Using CLASSPATH:       /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
>> Using CATALINA_PID:    /var/gphd/pxf/pxf-service/logs/tcserver.pid
>> Tomcat started.
>> Status:                RUNNING as PID=8385
>> /
>> Checking if tcServer is up and running...
>> tcServer not responding, re-trying after 1 second (attempt number 1)
>> tcServer not responding, re-trying after 1 second (attempt number 2)
>> Checking if PXF webapp is up and running...
>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more information
>> 
>> Now the select statement shows another error:
>> tpch=# select * from base_member;
>> ERROR:  GPHD component not found (libchurl.c:1058)
>> 
>> It looks like it hits this code:
>> bool handle_special_error(long response)
>> {
>> 	switch (response)
>> 	{
>> 		case 404:
>> 			elog(ERROR, "GPHD component not found");
>> 			break;
>> 		default:
>> 			return false;
>> 	}
>> 	return true;
>> }
>> 
>> Do I need some sort of web service running in order to make it work?
>> Is it caused by the PXF web app not being able to run? Which log am I supposed to look at?
>> The catalina log shows this, and I’m not sure if it’s the right one to look at:
>> 29-Oct-2015 16:37:34.923 SEVERE [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal ContainerBase.addChild: start: 
>>  org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>> 	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
>> 	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
>> 	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>> 	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>> 	at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>> 	at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> 	at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.NoClassDefFoundError: Lorg/apache/commons/logging/Log;
>> 	at java.lang.Class.getDeclaredFields0(Native Method)
>> 	at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
>> 	at java.lang.Class.getDeclaredFields(Class.java:1806)
>> 	at org.apache.catalina.util.Introspection.getDeclaredFields(Introspection.java:106)
>> 	at org.apache.catalina.startup.WebAnnotationSet.loadFieldsAnnotation(WebAnnotationSet.java:270)
>> 	at org.apache.catalina.startup.WebAnnotationSet.loadApplicationListenerAnnotations(WebAnnotationSet.java:89)
>> 	at org.apache.catalina.startup.WebAnnotationSet.loadApplicationAnnotations(WebAnnotationSet.java:63)
>> 	at org.apache.catalina.startup.ContextConfig.applicationAnnotationsConfig(ContextConfig.java:403)
>> 	at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:879)
>> 	at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:374)
>> 	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
>> 	at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
>> 	at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5378)
>> 	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>> 	... 10 more
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>> 	at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>> 	at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>> 	... 24 more
>> 
>> 29-Oct-2015 16:37:34.924 SEVERE [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Error deploying web application archive /var/gphd/pxf/pxf-service/webapps/pxf.war
>>  java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>> 	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:904)
>> 	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>> 	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>> 	at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>> 	at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> 	at java.lang.Thread.run(Thread.java:745)
>> 
>> I’m running on a previously built HDP 2.2.8 and performed a manual HAWQ installation.
>> I got most parts done but am stuck at the PXF component. Any help would be appreciated.
>> 
>> Thanks
>> 
>>  
>> 
>> 
> 
> 

