incubator-hcatalog-user mailing list archives

From Timothy Potter <thelabd...@gmail.com>
Subject Re: Pig partition filter using operator other than ==
Date Tue, 20 Nov 2012 22:51:37 GMT
Unfortunately, looks like the same issue with Pig 0.11:

2012-11-20 22:48:20,408 [main] WARN org.apache.pig.newplan.PColFilterExtractor - No partition filter push down: Internal error while processing any partition filter conditions in the filter after the load
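
To recap the behavior (relation and column names are from my earlier message, quoted below): a lone range comparison on the partition column fails pushdown, while adding a clause on a non-partition column lets it through:

```pig
-- Fails pushdown on 0.10 and 0.11: single range comparison on the partition column (day).
signals_for_day = filter signals by day >= '2012-10-31_2000';

-- Pushed down fine on 0.10: same predicate plus a clause on a non-partition column (service).
signals_for_day = filter signals by (day >= '2012-10-31_2000' AND service IS NOT NULL);
```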


On Tue, Nov 20, 2012 at 10:38 AM, Travis Crawford <traviscrawford@gmail.com> wrote:

> Can you try your query with pig 0.11:
>
>     http://svn.apache.org/repos/asf/pig/branches/branch-0.11/
>
> and let us know if it works?
>
> --travis
>
>
>
> On Tue, Nov 20, 2012 at 6:35 AM, Timothy Potter <thelabdude@gmail.com> wrote:
> > Thanks for your help, Travis and Aniket. I ended up applying the patch from
> > HIVE-2084 and also making the FCOMMENT to COMMENT change to package.jdo
> > suggested by Travis. It seems to be working now, except I have a new
> > problem. If my Pig filter contains only a single clause that filters on the
> > partition field, the filter fails to get "pushed" into the load, i.e.
> >
> > signals_for_day = filter signals by day >= '2012-10-31_2000';
> >
> > This fails at the following place in the 0.4.0 code (note the line numbers
> > might be slightly off, as I've added some debug statements here and there):
> >
> > at org.apache.pig.newplan.PColFilterExtractor.logInternalErrorAndSetFlag(PColFilterExtractor.java:482)
> > at org.apache.pig.newplan.PColFilterExtractor.getExpression(PColFilterExtractor.java:434)
> > at org.apache.pig.newplan.PColFilterExtractor.getExpression(PColFilterExtractor.java:473)
> > at org.apache.pig.newplan.PColFilterExtractor.getExpression(PColFilterExtractor.java:461)
> > at org.apache.pig.newplan.PColFilterExtractor.getExpression(PColFilterExtractor.java:473)
> > at org.apache.pig.newplan.PColFilterExtractor.getExpression(PColFilterExtractor.java:449)
> > at org.apache.pig.newplan.PColFilterExtractor.visit(PColFilterExtractor.java:115)
> > at org.apache.pig.newplan.logical.rules.PartitionFilterOptimizer$PartitionFilterPushDownTransformer.transform(PartitionFilterOptimizer.java:160)
> > at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:110)
> >
> >
> > However, if I add another clause to the filter, then it works fine, i.e.
> >
> > signals_for_day = filter signals by (day >= '2012-10-31_2000' AND service IS NOT NULL);
> >
> > Note that service is not a partition field. This filter works fine, and the
> > partition filter seems to get pushed into the load, based on the number of
> > input paths reported by the MR job.
> >
> > I'm using Pig 0.10 on CDH4. Seems like a Pig bug to me ...
> >
> > Cheers,
> > Tim
> >
> >
> > On Mon, Nov 19, 2012 at 5:17 PM, Travis Crawford <traviscrawford@gmail.com> wrote:
> >>
> >> Quick fix is forcing these jar versions in your build:
> >>
> >> datanucleus-core 2.2.5
> >> datanucleus-rdbms 2.2.4
> >>
> >> If you do a regular "ant package" and then update these jars in the dist
> >> dir, it works fine. Note that if you're using the metastore thrift service
> >> you only need to do this on the server side, as clients will not be using
> >> datanucleus at all.
> >>
> >> I agree this is a major issue; I just added fix version 0.5 to
> >> https://issues.apache.org/jira/browse/HCATALOG-209 so we don't forget
> >> about this in the next release.
> >>
> >> --travis
> >>
> >>
> >> On Mon, Nov 19, 2012 at 3:50 PM, Aniket Mokashi <aniket486@gmail.com> wrote:
> >> > There is an easy way to fix this. You need to recompile the fix
> >> > suggested in HIVE-2609 and jar it up into the datanucleus-rdbms jar
> >> > along with the other class files.
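
A sketch of that repack step, assuming you have already compiled the patched class files and have the jar on hand (all paths and the jar version here are hypothetical):

```shell
#!/bin/sh
# Hypothetical inputs: the original jar, and a directory of recompiled classes
# laid out in their package structure (org/datanucleus/...).
JAR=${JAR:-datanucleus-rdbms-2.2.3.jar}
PATCHED_CLASSES=${PATCHED_CLASSES:-build/patched-classes}

if [ -f "$JAR" ] && [ -d "$PATCHED_CLASSES" ]; then
  # "jar uf" updates the archive in place: entries that already exist are
  # replaced, new ones are added, all rooted at the -C directory.
  jar uf "$JAR" -C "$PATCHED_CLASSES" .
  echo "updated $JAR"
else
  echo "missing $JAR or $PATCHED_CLASSES; set JAR / PATCHED_CLASSES" >&2
fi
```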
> >> >
> >> > ~Aniket
> >> >
> >> >
> >> > On Mon, Nov 19, 2012 at 12:51 PM, Timothy Potter <thelabdude@gmail.com> wrote:
> >> >>
> >> >> Ok, never mind - looks like a known issue with Hive's DataNucleus
> >> >> dependency: https://issues.apache.org/jira/browse/PIG-2339
> >> >>
> >> >> Will move to Postgres!
> >> >>
> >> >>
> >> >> On Mon, Nov 19, 2012 at 1:30 PM, Timothy Potter <thelabdude@gmail.com> wrote:
> >> >>>
> >> >>> More to this ... finally tracked down the hive server log and am seeing this:
> >> >>>
> >> >>> 2012-11-19 19:42:53,700 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
> >> >>> java.lang.NullPointerException
> >> >>> at org.datanucleus.store.mapped.mapping.MappingHelper.getMappingIndices(MappingHelper.java:35)
> >> >>> at org.datanucleus.store.mapped.expression.StatementText.applyParametersToStatement(StatementText.java:194)
> >> >>> at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getPreparedStatementForQuery(RDBMSQueryUtils.java:233)
> >> >>> at org.datanucleus.store.rdbms.query.legacy.SQLEvaluator.evaluate(SQLEvaluator.java:115)
> >> >>> at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.performExecute(JDOQLQuery.java:288)
> >> >>> at org.datanucleus.store.query.Query.executeQuery(Query.java:1657)
> >> >>> at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
> >> >>> at org.datanucleus.store.query.Query.executeWithMap(Query.java:1526)
> >> >>> at org.datanucleus.jdo.JDOQuery.executeWithMap(JDOQuery.java:334)
> >> >>> at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitionsByFilter(ObjectStore.java:1711)
> >> >>> at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:1581)
> >> >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >>> at java.lang.reflect.Method.invoke(Method.java:597)
> >> >>> at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
> >> >>> at $Proxy4.getPartitionsByFilter(Unknown Source)
> >> >>> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:2466)
> >> >>> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:5817)
> >> >>> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:5805)
> >> >>> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> >> >>> at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:115)
> >> >>> at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:112)
> >> >>> at java.security.AccessController.doPrivileged(Native Method)
> >> >>> at javax.security.auth.Subject.doAs(Subject.java:396)
> >> >>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> >> >>> at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:520)
> >> >>> at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:123)
> >> >>> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
> >> >>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >>> at java.lang.Thread.run(Thread.java:662)
> >> >>>
> >> >>>
> >> >>> On Mon, Nov 19, 2012 at 12:53 PM, Timothy Potter <thelabdude@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> I'm using HCatalog 0.4.0 with Pig 0.10 and am not having success
> >> >>>> using an operator other than (==) with my partition field.
> >> >>>>
> >> >>>> For example, the following works (day is my partition field):
> >> >>>>
> >> >>>> signals = load 'signals' using org.apache.hcatalog.pig.HCatLoader();
> >> >>>>
> >> >>>> signals_for_day = filter signals by (day == '2012-10-30_1200' AND service IS NOT NULL);
> >> >>>>
> >> >>>> samp1 = sample signals_for_day 0.01;
> >> >>>>
> >> >>>> dump samp1;
> >> >>>>
> >> >>>>
> >> >>>> but, if I change my filter to: signals_for_day = filter signals by (day >= '2012-10-30_1200' AND service IS NOT NULL);
> >> >>>>
> >> >>>> Then I get the following error:
> >> >>>>
> >> >>>> Caused by: java.io.IOException: org.apache.thrift.transport.TTransportException
> >> >>>> at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:42)
> >> >>>> at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:90)
> >> >>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:380)
> >> >>>> ... 19 more
> >> >>>> Caused by: org.apache.thrift.transport.TTransportException
> >> >>>> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> >> >>>> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> >> >>>> at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> >> >>>> at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> >> >>>> at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> >> >>>> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> >> >>>> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partitions_by_filter(ThriftHiveMetastore.java:1511)
> >> >>>> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions_by_filter(ThriftHiveMetastore.java:1495)
> >> >>>> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitionsByFilter(HiveMetaStoreClient.java:691)
> >> >>>> at org.apache.hcatalog.mapreduce.InitializeInput.getSerializedHcatKeyJobInfo(InitializeInput.java:98)
> >> >>>> at org.apache.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:73)
> >> >>>> at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:40)
> >> >>>> ... 21 more
> >> >>>>
> >> >>>> I can start debugging, but I'd like to know: is HCatalog supposed to
> >> >>>> support this type of filtering on partition fields?
> >> >>>>
> >> >>>> Thanks.
> >> >>>> Tim
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > "...:::Aniket:::... Quetzalco@tl"
> >
> >
>
