hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nathanial Thelen <n...@natethelen.com>
Subject Re: Problems with 0.11, count(DISTINCT), and NPE
Date Tue, 03 Sep 2013 21:18:06 GMT
Is there a way to run a patch on EMR?

Thanks,
Nate

On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan <hashutosh@apache.org> wrote:

> Fix in very related area has been checked in trunk today : https://issues.apache.org/jira/browse/HIVE-5129
Likely that will fix your issue. 
> Can you try latest trunk?
> 
> Ashutosh
> 
> 
> On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen <nate@natethelen.com> wrote:
> I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I have been getting
NullPointerExceptions (NPE) for certain queries in our staging environment.  Only difference
between stage and production is the amount of traffic we get so the data set is much smaller.
 We are not using any custom code.
> 
> I have greatly simplified the query down to the bare minimum that will cause the error:
> 
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid
> 
> This will return the following before any Map Reduce jobs start:
> 
> FAILED: NullPointerException null
> 
> Looking in the hive log at /mnt/var/log/apps/hive_0110.log and scanning, I see this error:
> 
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889))
- Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=29
length: 94324 file count: 20 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889))
- Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=30
length: 142609 file count: 21 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889))
- Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adgroup/year=2013/month=08/day=30
length: 65519 file count: 21 directory count: 1
> 2013-09-03 18:09:19,796 INFO  org.apache.hadoop.hive.ql.exec.Utilities (Utilities.java:getInputSummary(1889))
- Cache Content Summary for s3://{ourS3Bucket}/hive/data/stage/adview/year=2013/month=08/day=29
length: 205096 file count: 20 directory count: 1
> 2013-09-03 18:09:19,800 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
(MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is
applicable
> 2013-09-03 18:09:19,801 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
(MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table scans
> 2013-09-03 18:09:19,801 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
(MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans where optimization is
applicable
> 2013-09-03 18:09:19,801 INFO  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer
(MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table scans
> 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver (SessionState.java:printError(386))
- FAILED: NullPointerException null
> java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$MetadataOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:87)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:124)
> 	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:101)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve(MetadataOnlyOptimizer.java:175)
> 	at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:79)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAnalyzer.java:8426)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8789)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:278)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
> 	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
> 
> The same error also happens if I do an INNER JOIN to adclick, FYI.
> 
> I have checked that there are not any null values for any of the columns referenced in
the query. 
> 
> Making almost any changes to the query results it in successfully running.  Here are
some I have tried:
> 
> Removed JOIN to adgroup:
> 
> SELECT
>     count(DISTINCT av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adview av
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
> 
> WORKS:
> 561	6
> 
> Removed JOIN to adclick:
> 
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid;
> 
> WORKS:
> 543	561
> 
> Removing DISTINCT from any of the 3 counts
> 
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads,
>     count(ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
> 
> WORKS:
> 543	561	7
> 
> 
> SELECT
>     count(ag.adGroupGuid) as groups,
>     count(DISTINCT av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
> 
> WORKS:
> 562	561	6
> 
> 
> SELECT
>     count(DISTINCT ag.adGroupGuid) as groups,
>     count(av.adViewGuid) as ads,
>     count(DISTINCT ac.adViewGuid) as uniqueClicks
> FROM
>     adgroup ag
>     INNER JOIN adview av ON av.adGroupGuid = ag.adGroupGuid
>     LEFT OUTER JOIN adclick ac ON ac.adViewGuid = av.adViewGuid;
> 
> WORKS:
> 543	562	6 
> 
> I am not exactly sure what to do next.  Thoughts?
> 
> Nate
> 


Mime
View raw message