From: sequoiadb
To: user@hawq.incubator.apache.org
Subject: Re: question about using PXF
Date: Fri, 30 Oct 2015 12:47:40 +0800

Thanks Jimmy,

That's a very helpful explanation.

It looks like backend/access/external and bin/gpfusion are the main code where PXF requests are sent. I'm proposing to create an interface that calls our C API if the given URI indicates that the data source is located in our own system.

It seems we should override the pxfwritable_export and pxfwritable_import interfaces. Is that correct?
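
To make the proposal above concrete, here is a minimal sketch of the routing decision only, written in Python for brevity rather than in the C that the HAWQ backend uses. The `sdb` scheme, the `native_c_api` label, and the `route_location` helper are all hypothetical names chosen for illustration; nothing here is part of HAWQ or PXF.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical scheme for the native data source; not defined by HAWQ or PXF.
NATIVE_SCHEME = "sdb"


def route_location(uri):
    """Split an external-table LOCATION URI and pick which path would handle it."""
    parts = urlsplit(uri)
    # Keep the first value of each query option, e.g. PROFILE=HBase.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    handler = "native_c_api" if parts.scheme == NATIVE_SCHEME else "pxf"
    return {
        "handler": handler,
        "host": parts.hostname,
        "port": parts.port,
        "resource": parts.path.lstrip("/"),
        "options": options,
    }
```

Under these assumptions, the `pxf://cent61:50070/member?PROFILE=HBase` location from the original CREATE EXTERNAL TABLE statement would keep going through PXF, while a location with the hypothetical native scheme would be handed to the C API instead.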

Thanks

On Oct 30, 2015, at 12:01 PM, Jimmy Da <jd462@cornell.edu> wrote:

Great job on linking the right classpaths!

In terms of resource consumption, the PXF daemon shouldn't use more than 1GB and then some (off-heap stack/memory). C.f.
https://github.com/apache/incubator-hawq/blob/master/pxf/gradlew#L10

In terms of performance, the slowdown is unavoidable compared with hbase shell, as the two go through different paths to retrieve the data.

In hbase shell, the client talks to the HBase Master and RegionServer and gets data in an optimal way, where the data could even be warm in the HFile cache (in-memory store).

With PXF, the Java daemon reads the HDFS location in your CREATE EXTERNAL TABLE definition, talks to the NameNode to find the block locations containing the HFile (on disk), uses the HBase Java file reader to read the data with some serde, and then sends the results to the local HAWQ segments, where query processing happens.

PXF is built in a way that generalizes data access to different systems (the previous paragraph could also apply to reading HDFS files, Hive files, name-your-own-system). The additional overhead mostly comes from retrieving the initial metadata. I suppose it would be an interesting experiment to run with a larger data set and see whether the performance difference is additive or multiplicative.
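
As a rough illustration of the additive versus multiplicative question, the sketch below takes the two timings quoted further down in this thread (434.412 ms through PXF, 0.0540 seconds in hbase shell) and shows what each reading would predict for a larger scan. The `predict_pxf_ms` helper is a hypothetical name, and a single data point cannot distinguish the two models; this only shows how far apart they would be.

```python
# Timings reported in this thread for the same two-row table.
pxf_ms = 434.412     # HAWQ query through PXF
hbase_ms = 54.0      # hbase shell scan (0.0540 seconds)

# Additive reading: PXF adds a fixed cost on top of the native read.
fixed_overhead_ms = pxf_ms - hbase_ms
# Multiplicative reading: PXF is slower by a constant factor.
slowdown_factor = pxf_ms / hbase_ms


def predict_pxf_ms(native_ms):
    """Return (additive, multiplicative) predictions for a scan taking native_ms."""
    return native_ms + fixed_overhead_ms, native_ms * slowdown_factor


additive, multiplicative = predict_pxf_ms(1000.0)
print("fixed overhead: %.3f ms, factor: %.2fx" % (fixed_overhead_ms, slowdown_factor))
print("1 s native scan: additive %.0f ms, multiplicative %.0f ms" % (additive, multiplicative))
```

Measuring the same query on a table where the native scan takes about a second would show which prediction is closer.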

Noa, correct me if I made a mistake :)

Jimmy Da
That's what people do, they leap, and hoping to God they can fly.

On Thu, Oct 29, 2015 at 6:49 PM, sequoiadb <mailing-list-recv@sequoiadb.com> wrote:
Creating a soft link from /usr/phd to /usr/hdp makes pxf-service start successfully.

Just curious, what's the overhead of using PXF?

postgres=# select * from hbase_member;
 recordkey  | address:city | address:contry | address:province | info:age | info:birthday | info:company
------------+--------------+----------------+------------------+----------+---------------+--------------
 scutshuxue | hangzhou     | china          | zhejiang         |       99 | 1987-06-17    | alibaba
 xiaofeng   | jieyang      | china          | guangdong        |          | 1987-4-17     | alibaba
(2 rows)

Time: 434.412 ms

hbase(main):004:0* scan 'member'
ROW                     COLUMN+CELL
 scutshuxue             column=address:city, timestamp=1446104911726, value=hangzhou
 scutshuxue             column=address:contry, timestamp=1446104910743, value=china
 scutshuxue             column=address:province, timestamp=1446104910775, value=zhejiang
 scutshuxue             column=info:age, timestamp=1446104987420, value=99
 scutshuxue             column=info:birthday, timestamp=1446104910674, value=1987-06-17
 scutshuxue             column=info:company, timestamp=1446104910715, value=alibaba
 xiaofeng               column=address:city, timestamp=1446104920523, value=jieyang
 xiaofeng               column=address:contry, timestamp=1446104920461, value=china
 xiaofeng               column=address:province, timestamp=1446104920486, value=guangdong
 xiaofeng               column=address:town, timestamp=1446104921802, value=xianqiao
 xiaofeng               column=info:birthday, timestamp=1446104920358, value=1987-4-17
 xiaofeng               column=info:company, timestamp=1446104920423, value=alibaba
 xiaofeng               column=info:favorite, timestamp=1446104920397, value=movie
2 row(s) in 0.0540 seconds

It's very slow compared with running in hbase shell.

On Oct 29, 2015, at 8:33 PM, Noa Horn <nhorn@pivotal.io> wrote:

The problem is probably that the jars required by PXF are not found.
In the attached log file, this error, for example, shows that hadoop-auth.jar is not found:
29-Oct-2015 16:37:33.405 WARNING [localhost-startStop-1] com.pivotal.pxf.service.utilities.CustomWebappLoader.addRepositories Failed to load entry /usr/phd/current/hadoop-client/hadoop-auth.jar: java.nio.file.NoSuchFileException: /usr/phd/current/hadoop-client

Have a look at /etc/conf/gphd/pxf (old version) or /etc/conf/pxf (open source version), at the file pxf-private.classpath.
Every source specified there is required by PXF.
The default paths for these resources are under /usr/phd/... (Pivotal distribution), while your system is hdp, so the paths are different. Luckily, we also provide the paths for the hdp distribution, in pxf-privatehdp.classpath. If you copy the content of that file into pxf-private.classpath and run init and start again, it should work.
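
Before re-running init and start, it can help to list which classpath entries do not resolve on the local machine. The sketch below is one way to do that; it assumes the classpath file holds one path per line, that entries may contain shell-style wildcards, and that lines starting with `#` are comments. Those format details and the `missing_classpath_entries` name are assumptions, not something stated in this thread.

```python
import glob
from pathlib import Path


def missing_classpath_entries(classpath_file):
    """Return the entries of a classpath file that match nothing on disk."""
    missing = []
    for line in Path(classpath_file).read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comment lines
        # glob.glob also works for plain paths without wildcards.
        if not glob.glob(entry):
            missing.append(entry)
    return missing
```

Calling it on the pxf-private.classpath file in the configuration directory mentioned above would return the paths that still point at a location that does not exist, such as entries under /usr/phd on an hdp system.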

As an aside, it's highly recommended to compile and use the open source version, because we made a few changes in the rpms.
From the pxf directory, run 'make tomcat' to generate a tomcat rpm (required by PXF) and 'make rpm' to compile and create PXF rpms.

Noa


On Wed, Oct 28, 2015 at 11:38 PM, mailing-list-recv <mailing-list-recv@sequoiadb.com> wrote:
Thanks guys,

Not sure if the mailing list supports attachments; let me try anyway.

The status command shows the following:

[root@cent61 ~]# service pxf-service status
Checking if tcServer is up and running...
Checking if PXF webapp is up and running...
ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more information

I was using the binary version downloaded from the site. I haven't tried to compile from open source yet.

Port 51200 is open:

[root@cent61 logs]# cat tcserver.pid
8385
[root@cent61 logs]# ps -elf | grep 8385
0 S pxf       8385     1  0  80   0 - 312017 futex_ Oct29 ?       00:00:40 /usr/jdk64/jdk1.7.0_67/bin/java -Djava.util.logging.config.file=/var/gphd/pxf/pxf-service/conf/logging.properties -Djava.util.logging.manager=com.springsource.tcserver.serviceability.logging.TcServerLogManager -Xmx512M -Xss256K -Djava.endorsed.dirs=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/endorsed -classpath /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar -Dcatalina.base=/var/gphd/pxf/pxf-service -Dcatalina.home=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE -Djava.io.tmpdir=/var/gphd/pxf/pxf-service/temp org.apache.catalina.startup.Bootstrap start
4 S root     23247 22386  0  80   0 - 25813 pipe_w 14:35 pts/2    00:00:00 grep 8385
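
The `-Xmx512M -Xss256K` flags in the `ps` output above set the JVM's maximum heap and per-thread stack size. As a small sketch, the memory flags can be pulled out of such a command line and converted to bytes as follows; `jvm_memory_flags` is a hypothetical helper name, and only the plain `K`/`M`/`G` suffixes are handled.

```python
import re

# Multipliers for the size suffixes the JVM accepts on -Xmx/-Xms/-Xss.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_memory_flags(cmdline):
    """Extract -Xmx/-Xms/-Xss values (in bytes) from a java command line."""
    flags = {}
    for name, number, unit in re.findall(r"-X(mx|ms|ss)(\d+)([kKmMgG]?)", cmdline):
        # A missing suffix means the value is already in bytes.
        flags[name] = int(number) * _UNITS.get(unit.lower(), 1)
    return flags
```

For the process shown here this gives a 512 MiB heap limit and 256 KiB thread stacks.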

[root@cent61 logs]# netstat -anp | grep 8385
tcp        0      0 ::ffff:127.0.0.1:6969       :::*                        LISTEN      8385/java
tcp        0      0 :::51200                    :::*                        LISTEN      8385/java
unix  2      [ ]         STREAM     CONNECTED     2344585 8385/java
unix  2      [ ]         STREAM     CONNECTED     2344417 8385/java
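
The `netstat` output confirms the listener from the server side. A client-side check from the HAWQ host can be sketched as below; `port_is_open` is a hypothetical helper name, and a successful TCP connection only shows that something is listening, not that the PXF webapp itself deployed.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For this setup the call would be `port_is_open("192.168.31.205", 51200)`, using the address and port from the original error message.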


Cheers




At 2015-10-29 03:22:48, "Jimmy Da" <jd462@cornell.edu> wrote:
So it seems that the Tomcat server is up, but the PXF servlet has not started. To confirm this, you can run "pxf-service status" to double-check that the PXF service is running.

One guess is that the Java libraries were not loaded correctly. I am looking at this line:
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log

Can you double-check that you can find all the jar files at the locations in this file?
https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/resources/pxf-privatehdp.classpath

Jimmy Da
That's what people do, they leap, and hoping to God they can fly.

On Wed, Oct 28, 2015 at 12:03 PM, Ting(Goden) Yao <tyao@pivotal.io> wrote:
Hi sequoiadb,

Which HAWQ/PXF version are you using (did you compile the open source version, or is it one of the former Pivotal-released HAWQ versions)?

Can you also attach the PXF logs for investigation?
They are at var/log/gphd/

-Goden

On Wed, Oct 28, 2015 at 1:51 AM sequoiadb <mailing-list-recv@sequoiadb.com> wrote:
Hi guys,

I'm trying to set up PXF for HBase and got the following error:
tpch=# create external table hbase_member ( recordkey bytea, "address:city" varchar, "address:contry" varchar, "address:province" varchar, "info:age" int, "info:birthday" varchar, "info:company" varchar ) location ( 'pxf://cent61:50070/member?PROFILE=HBase') FORMAT 'CUSTOM'( FORMATTER='pxfwritable_import');
CREATE EXTERNAL TABLE
tpch=# select * from hbase_member;
ERROR:  remote component error (0) from '192.168.31.205:51200': couldn't connect to host (libchurl.c:852)

I could successfully create regular tables and perform queries, but when I try to create PXF tables I keep getting an error connecting to port 51200.

So I tried to start pxf-service and got:
[root@cent61 profile.d]# service pxf-service init
Creating instance 'pxf-service' ...
  Using separate layout
  Creating bin/setenv.sh
  Applying template 'base'
    Copying template's contents
    Applying fragment 'context-fragment.xml' to 'conf/context.xml'
    Applying fragment 'server-fragment.xml' to 'conf/server.xml'
    Applying fragment 'web-fragment.xml' to 'conf/web.xml'
    Applying fragment 'tomcat-users-fragment.xml' to 'conf/tomcat-users.xml'
    Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
  Applying template 'base-tomcat-7'
    Copying template's contents
    Applying fragment 'server-fragment.xml' to 'conf/server.xml'
    Applying fragment 'web-fragment.xml' to 'conf/web.xml'
    Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
  Applying template 'bio'
    Copying template's contents
    Applying fragment 'server-fragment.xml' to 'conf/server.xml'
  Configuring instance 'pxf-service' to use Tomcat version 7.0.55.A.RELEASE
  Setting permissions
Instance created
Connector summary
  Port: 51200   Type: Blocking IO   Secure: false
[root@cent61 profile.d]# service pxf-service start
/var/gphd/pxf /
Creating home directory for pxf.
Using CATALINA_BASE:   /var/gphd/pxf/pxf-service
Using CATALINA_HOME:   /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
Using CATALINA_TMPDIR: /var/gphd/pxf/pxf-service/temp
Using JRE_HOME:        /usr/jdk64/jdk1.7.0_67
Using CLASSPATH:       /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
Using CATALINA_PID:    /var/gphd/pxf/pxf-service/logs/tcserver.pid
Tomcat started.
Status:                RUNNING as PID=8385
/
Checking if tcServer is up and running...
tcServer not responding, re-trying after 1 second (attempt number 1)
tcServer not responding, re-trying after 1 second (attempt number 2)
Checking if PXF webapp is up and running...
ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more information

Now the select statement shows another error:
tpch=# select * from base_member;
ERROR:  GPHD component not found (libchurl.c:1058)

It looks like it hits this error path:
bool handle_special_error(long response)
{
    switch (response)
    {
        case 404:
            elog(ERROR, "GPHD component not found");
            break;
        default:
            return false;
    }
    return true;
}
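
The 404 branch above only fires when an HTTP server answers but has nothing deployed at the requested path, which matches "PXF webapp is inaccessible but tcServer is up". A minimal sketch for telling that case apart from a server that is not listening at all; the function name probe and its return labels are hypothetical, and only the Python standard library is used:

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Classify an HTTP endpoint as 'ok', 'not_found', 'http_error' or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok" if resp.status < 400 else "http_error"
    except urllib.error.HTTPError as exc:
        # The server answered, so the container is up. A 404 means nothing
        # is deployed at this path, the case the C snippet reports.
        return "not_found" if exc.code == 404 else "http_error"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: nothing is listening.
        return "unreachable"
```

For example, probe("http://localhost:51200/pxf/") uses the port from the connector summary and the /pxf context from the catalina log; the exact path to request is an assumption. A "not_found" result would point at the webapp deployment rather than at Tomcat itself.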

Do I need some sort of web service running in order to make this work?
Is this caused by the PXF web app failing to start? Which log am I supposed to look at?
The catalina log shows the following, but I'm not sure it's the right one to look at:
29-Oct-2015 16:37:34.923 SEVERE [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal ContainerBase.addChild: start: 
 org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/commons/logging/Log;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
    at java.lang.Class.getDeclaredFields(Class.java:1806)
    at org.apache.catalina.util.Introspection.getDeclaredFields(Introspection.java:106)
    at org.apache.catalina.startup.WebAnnotationSet.loadFieldsAnnotation(WebAnnotationSet.java:270)
    at org.apache.catalina.startup.WebAnnotationSet.loadApplicationListenerAnnotations(WebAnnotationSet.java:89)
    at org.apache.catalina.startup.WebAnnotationSet.loadApplicationAnnotations(WebAnnotationSet.java:63)
    at org.apache.catalina.startup.ContextConfig.applicationAnnotationsConfig(ContextConfig.java:403)
    at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:879)
    at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:374)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
    at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5378)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
    at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
    ... 24 more
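
The innermost cause in this trace is that the webapp class loader cannot find org.apache.commons.logging.Log. One way to check whether any jar on the host actually provides that class is to scan directories of jars for the matching entry. This is a minimal sketch rather than a PXF tool: the helper names class_entry and jars_providing are hypothetical, and which directories to scan is an assumption, since the thread does not say where the PXF classpath points.

```python
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Turn 'org.apache.commons.logging.Log' into its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_providing(class_name, roots):
    """Return the jar files under the given directories that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for root in roots:
        for jar in sorted(Path(root).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(jar)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives instead of aborting the scan.
                continue
    return hits
```

Calling jars_providing("org.apache.commons.logging.Log", [some_directory]) returns a list of matching jar paths. An empty list for every directory on the webapp's classpath would be consistent with the ClassNotFoundException above.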

29-Oct-2015 16:37:34.924 SEVERE [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Error deploying web application archive /var/gphd/pxf/pxf-service/webapps/pxf.war
 java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:904)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I'm running on a previously built HDP 2.2.8 cluster and performed a manual HAWQ installation. I got most parts done but am stuck at the PXF component. Any help would be appreciated.

Thanks

 





= --Apple-Mail=_854D0EE3-4A75-4A68-85F5-60FE3F7C6C83--