hawq-user mailing list archives

From Noa Horn <nh...@pivotal.io>
Subject Re: Re: Failed to write into WRITABLE EXTERNAL TABLE
Date Mon, 09 Nov 2015 18:24:00 GMT
The file name is constructed by the HAWQ segments, each one using its own
unique id:
Check out build_file_name_for_write() in src/bin/gpfusion/gpbridgeapi.c
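
For reference, the path you saw in your debugger ("/foo.main/1365_0", alongside
x-gp-data-dir=foo.main, x-gp-xid=1365 and x-gp-segment-id=0) is consistent with
a layout along these lines; this is only a sketch inferred from that debug
output, not the actual C code:

    // A sketch inferred from the debug output quoted below, not taken from
    // build_file_name_for_write() itself: the per-segment write path looks
    // like "/<x-gp-data-dir>/<x-gp-xid>_<x-gp-segment-id>".
    static String expectedWritePath(String dataDir, String xid, int segmentId) {
        return "/" + dataDir + "/" + xid + "_" + segmentId;
    }
    // e.g. expectedWritePath("foo.main", "1365", 0) -> "/foo.main/1365_0"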


On Fri, Nov 6, 2015 at 5:03 PM, hawqstudy <hawqstudy@163.com> wrote:

>
> Thanks Noa,
>
> So is it safe to assume it always prepends a slash at the beginning, and
> appends a slash followed by other stuff?
> Can you show me the code where it constructs the path? I couldn't find it,
> and I'd like to confirm the logic.
>
> I used the following code to extract the data source, and it worked in my
> environment. I'm just not sure whether getDataSource() always returns a
> value in that format.
>
>       StringTokenizer st = new StringTokenizer(wds, "/", false);
>       if (st.countTokens() == 0) {
>          throw new RuntimeException("Invalid data source: " + wds);
>       }
>       return st.nextToken();
>
>
>
>
> At 2015-11-07 02:36:26, "Noa Horn" <nhorn@pivotal.io> wrote:
>
> Hi,
>
> 1. Regarding the permissions issue - PXF runs as the pxf user, so any
> operation on Hadoop needs to be done on files or directories that the pxf
> user is allowed to read or write.
> You mentioned changing the pxf user to be part of hdfs, but I am not sure
> that was necessary. The PXF RPM already adds the pxf user to the hadoop
> group.
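>
> As an illustration only (a sketch, not something from this thread: the
> directory name is made up, and setOwner() needs HDFS superuser rights, so
> it would be run as the hdfs user), a target directory can be prepared so
> that members of the hadoop group, pxf included, may write to it:
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.fs.permission.FsPermission;
>
>     public class PreparePxfWriteDir {
>         public static void main(String[] args) throws Exception {
>             // connect using the default client-side Hadoop configuration
>             FileSystem fs = FileSystem.get(new Configuration());
>             Path dir = new Path("/data/pxf_out");   // example directory only
>             fs.mkdirs(dir);
>             // hand the directory to the hadoop group and give the group rwx,
>             // so the pxf user (a member of that group) can create files in it
>             fs.setOwner(dir, "hdfs", "hadoop");
>             fs.setPermission(dir, new FsPermission((short) 0775));
>         }
>     }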
>
> 2. Regarding writable tables. The way to use them is to define a
> *directory* where the data will be written. When the SQL executes, each
> segment writes its own data to the same directory, as defined in the
> external table, but in a separate file. That's why setDataSource() is
> needed when writing: each segment creates its own unique file name. The
> change you saw in the path is expected; it should be
> "<directory>/<unique_file_name>".
>
> Regards,
> Noa
>
>
> On Fri, Nov 6, 2015 at 12:11 AM, hawqstudy <hawqstudy@163.com> wrote:
>
>>
>> I tried setting the pxf user to hdfs in /etc/init.d/pxf-service and fixed
>> the file owners for several directories.
>> Now I have a problem: getDataSource() returns something strange.
>> My DDL is:
>>
>> pxf://localhost:51200/foo.main?PROFILE=XXXX
>> In the read accessor, getDataSource() successfully returns foo.main as the
>> data source name.
>> However, in the write accessor the InputData.getDataSource() call
>> returns /foo.main/1365_0.
>> Tracing back through the code, I found that
>> com.pivotal.pxf.service.rest.WritableResource.stream has:
>>
>>     public Response stream(@Context final ServletContext servletContext,
>>                            @Context HttpHeaders headers,
>>                            @QueryParam("path") String path,
>>                            InputStream inputStream) throws Exception {
>>
>>         /* Convert headers into a case-insensitive regular map */
>>         Map<String, String> params =
>>                 convertToCaseInsensitiveMap(headers.getRequestHeaders());
>>
>>         if (LOG.isDebugEnabled()) {
>>             LOG.debug("WritableResource started with parameters: " +
>>                     params + " and write path: " + path);
>>         }
>>
>>         ProtocolData protData = new ProtocolData(params);
>>         protData.setDataSource(path);
>>
>>         SecuredHDFS.verifyToken(protData, servletContext);
>>         Bridge bridge = new WriteBridge(protData);
>>
>>         // THREAD-SAFE parameter has precedence
>>         boolean isThreadSafe = protData.isThreadSafe() && bridge.isThreadSafe();
>>         LOG.debug("Request for " + path + " handled " +
>>                 (isThreadSafe ? "without" : "with") + " synchronization");
>>
>>         return isThreadSafe ?
>>                 writeResponse(bridge, path, inputStream) :
>>                 synchronizedWriteResponse(bridge, path, inputStream);
>>     }
>> The call to protData.setDataSource(path); changes the data source from the
>> expected value into the strange one.
>> So I kept looking for where the path comes from; jdb shows:
>> tomcat-http--18[1] print path
>>  path = "/foo.main/1365_0"
>> tomcat-http--18[1] where
>>   [1] com.pivotal.pxf.service.rest.WritableResource.stream
>> (WritableResource.java:102)
>>   [2] sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>>   [3] sun.reflect.NativeMethodAccessorImpl.invoke
>> (NativeMethodAccessorImpl.java:57)
>> ...
>>
>> tomcat-http--18[1] print params
>>
>>  params = "{accept=*/*, content-type=application/octet-stream,
>> expect=100-continue, host=127.0.0.1:51200, transfer-encoding=chunked,
>> X-GP-ACCESSOR=com.xxxx.pxf.plugins.xxxx.XXXXAccessor, x-gp-alignment=8,
>> x-gp-attr-name0=id, x-gp-attr-name1=total, x-gp-attr-name2=comments,
>> x-gp-attr-typecode0=23, x-gp-attr-typecode1=23, x-gp-attr-typecode2=1043,
>> x-gp-attr-typename0=int4, x-gp-attr-typename1=int4,
>> x-gp-attr-typename2=varchar, x-gp-attrs=3, x-gp-data-dir=foo.main,
>> x-gp-format=GPDBWritable,
>> X-GP-FRAGMENTER=com.xxxx.pxf.plugins.xxxx.XXXXFragmenter,
>> x-gp-has-filter=0, x-gp-profile=XXXX,
>> X-GP-RESOLVER=com.xxxx.pxf.plugins.xxxx.XXXXResolver, x-gp-segment-count=1,
>> x-gp-segment-id=0, x-gp-uri=pxf://localhost:51200/foo.main?PROFILE=XXXX,
>> x-gp-url-host=localhost, x-gp-url-port=51200, x-gp-xid=1365}"
>> So stream() is called from NativeMethodAccessorImpl.invoke0, which is
>> something I couldn't follow any further. Does it make sense that "path"
>> shows something strange? Should I get rid of protData.setDataSource(path)
>> here? What is this code used for? Where does the "path" come from? Is it
>> constructed from X-GP-DATA-DIR, X-GP-XID and X-GP-SEGMENT-ID?
>>
>> I'd expect to get "foo.main" instead of "/foo.main/1365_0" from
>> InputData.getDataSource(), just as I do in the read accessor.
>>
>>
>>
>> At 2015-11-06 11:49:08, "hawqstudy" <hawqstudy@163.com> wrote:
>>
>> Hi Guys,
>>
>> I've developed a PXF plugin and was able to make it read from our data
>> source.
>> I also implemented WriteResolver and WriteAccessor; however, when I tried
>> to insert into the table I got the following exception:
>>
>> postgres=# CREATE EXTERNAL TABLE t3 (id int, total int, comments varchar)
>> LOCATION ('pxf://localhost:51200/foo.bar?PROFILE=XXXX')
>> FORMAT 'custom' (formatter='pxfwritable_import') ;
>> CREATE EXTERNAL TABLE
>> postgres=# select * from t3;
>>  id  | total | comments
>> -----+-------+----------
>>  100 |   500 |
>>  100 |  5000 | abcdfe
>>      |  5000 | 100
>> (3 rows)
>> postgres=# drop external table t3;
>> DROP EXTERNAL TABLE
>> postgres=# CREATE WRITABLE EXTERNAL TABLE t3 (id int, total int, comments varchar)
>> LOCATION ('pxf://localhost:51200/foo.bar?PROFILE=XXXX')
>> FORMAT 'custom' (formatter='pxfwritable_export') ;
>> CREATE EXTERNAL TABLE
>> postgres=# insert into t3 values ( 1, 2, 'hello');
>>
>> ERROR:  remote component error (500) from '127.0.0.1:51200':  type
>> Exception report   message
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>> Access denied for user pxf. Superuser privilege is required    description
>>   The server encountered an internal error that prevented it from
>> fulfilling this request.    exception   javax.servlet.ServletException:
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>> Access denied for user pxf. Superuser privilege is required
>> (libchurl.c:852)  (seg6 localhost.localdomain:40000 pid=19701)
>> (dispatcher.c:1681)
>> Nov 07, 2015 11:40:08 AM com.sun.jersey.spi.container.ContainerResponse
>> mapMappableContainerException
>>
>> The log shows:
>>
>> SEVERE: The exception contained within MappableContainerException could
>> not be mapped to a response, re-throwing to the HTTP container
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
>> Access denied for user pxf. Superuser privilege is required
>>   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:122)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:5906)
>>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.datanodeReport(FSNamesystem.java:4941)
>>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDatanodeReport(NameNodeRpcServer.java:1033)
>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDatanodeReport(ClientNamenodeProtocolServerSideTranslatorPB.java:698)
>>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>>
>>   at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>>   at org.apache.hadoop.ipc.Client.call(Client.java:1407)
>>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>>   at com.sun.proxy.$Proxy63.getDatanodeReport(Unknown Source)
>>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:626)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>   at com.sun.proxy.$Proxy64.getDatanodeReport(Unknown Source)
>>   at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:2562)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getDataNodeStats(DistributedFileSystem.java:1196)
>>   at com.pivotal.pxf.service.rest.ClusterNodesResource.read(ClusterNodesResource.java:62)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>>   at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>>   at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>>   at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>>   at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>   at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>>   at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>>   at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>>   at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
>>   at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
>>   at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
>>   at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
>>   at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
>>   at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
>>   at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
>>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
>>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
>>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>>   at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
>>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
>>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
>>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
>>   at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
>>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
>>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
>>   at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:957)
>>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
>>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
>>   at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
>>   at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
>>   at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>   at java.lang.Thread.run(Thread.java:745)
>> Since our data source is totally independent of HDFS, I'm not sure why it
>> is still trying to access HDFS and requiring superuser access.
>> Please let me know if there is anything missing here.
>> Cheers
>>
