Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id BFD0D101C5 for ; Tue, 3 Sep 2013 21:18:55 +0000 (UTC) Received: (qmail 3979 invoked by uid 500); 3 Sep 2013 21:18:54 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 3913 invoked by uid 500); 3 Sep 2013 21:18:54 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 3902 invoked by uid 99); 3 Sep 2013 21:18:54 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 Sep 2013 21:18:54 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW X-Spam-Check-By: apache.org Received-SPF: error (athena.apache.org: local policy) Received: from [209.85.220.47] (HELO mail-pa0-f47.google.com) (209.85.220.47) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 Sep 2013 21:18:49 +0000 Received: by mail-pa0-f47.google.com with SMTP id kl13so6894564pab.20 for ; Tue, 03 Sep 2013 14:18:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=natethelen.com; s=google; h=content-type:mime-version:subject:from:in-reply-to:date:message-id :references:to; bh=4Guh9t16JZgH/mBLJMTUdug8eu7+Cakf1mgccaax2sA=; b=eX/8Kk+iTwnqAocLhxQgF7R/MYhbTPCVb60MzhFTxI8/dncbQXq/K/3co8tls7VGkR 3IsrF8X+fhupXQ6Qk7qxcqbqP6qJO/rLzYeME4ugiF3E42GCRE0B+LqSXnYpd3V+0Xtx vv6OHP1R0pnhARI7Gv4eHGqI0BFORt0Y4pTcs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:content-type:mime-version:subject:from :in-reply-to:date:message-id:references:to; bh=4Guh9t16JZgH/mBLJMTUdug8eu7+Cakf1mgccaax2sA=; b=kdkf/WFDLw0S7R4ee497oDcaQANhWYcs62zVdbOdDaI7F6oM/nZ86TkpZO1n7czvho lMh8UAf6XzKLz27fPVPNOZgN38GoA3MgbPJ+MCkFPIlG5SsctP/t8VdiB+xBIy7rVc14 toAizRcj+Y9CRadxfo66lbL628V5wKyXTqaL7TPnxMlKxvi5am+qUNSy0Jfyyn0D3VkO VGNr2GkRug7jHfqYgRq4rKyaK/lJXEiEyeZDUr59ULiSgKqTEeXCkJ3XZHCTWCGQVsvC IZO6pp2q6vAYPp28kKMqAU1EjAUS8zD0W9UVNvgqcYXa3r8ICZKcSGhYAxLnFxzxf5mL nVDg== X-Gm-Message-State: ALoCoQnY+V0KGx+Ly9SKMTOsPRy4n12ON8GHPlh+Io4m9AR4VrYeoKAv+AkbJMB/a3/WUw3lnxny X-Received: by 10.68.252.135 with SMTP id zs7mr5100785pbc.194.1378243088409; Tue, 03 Sep 2013 14:18:08 -0700 (PDT) Received: from [10.0.135.116] (75-145-75-117-Oregon.hfc.comcastbusiness.net. [75.145.75.117]) by mx.google.com with ESMTPSA id bg3sm24501053pbb.44.1969.12.31.16.00.00 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 03 Sep 2013 14:18:07 -0700 (PDT) Content-Type: multipart/alternative; boundary="Apple-Mail=_AA514B49-E742-4A6A-A752-80721FE88547" Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\)) Subject: Re: Problems with 0.11, count(DISTINCT), and NPE From: Nathanial Thelen In-Reply-To: Date: Tue, 3 Sep 2013 14:18:06 -0700 Message-Id: References: To: user@hive.apache.org X-Mailer: Apple Mail (2.1508) X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail=_AA514B49-E742-4A6A-A752-80721FE88547 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=iso-8859-1 Is there a way to run a patch on EMR? Thanks, Nate On Sep 3, 2013, at 2:14 PM, Ashutosh Chauhan = wrote: > Fix in very related area has been checked in trunk today : = https://issues.apache.org/jira/browse/HIVE-5129 Likely that will fix = your issue.=20 > Can you try latest trunk? >=20 > Ashutosh >=20 >=20 > On Tue, Sep 3, 2013 at 2:03 PM, Nathanial Thelen = wrote: > I am running Hive in EMR and since upgrading to 0.11 from 0.8.1.8 I = have been getting NullPointerExceptions (NPE) for certain queries in our = staging environment. Only difference between stage and production is = the amount of traffic we get so the data set is much smaller. We are = not using any custom code. >=20 > I have greatly simplified the query down to the bare minimum that will = cause the error: >=20 > SELECT > count(DISTINCT ag.adGroupGuid) as groups, > count(DISTINCT av.adViewGuid) as ads, > count(DISTINCT ac.adViewGuid) as uniqueClicks > FROM > adgroup ag > INNER JOIN adview av ON av.adGroupGuid =3D ag.adGroupGuid > LEFT OUTER JOIN adclick ac ON ac.adViewGuid =3D av.adViewGuid >=20 > This will return the following before any Map Reduce jobs start: >=20 > FAILED: NullPointerException null >=20 > Looking in the hive log at /mnt/var/log/apps/hive_0110.log and = scanning, I see this error: >=20 > 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adgroup/year=3D2013/month=3D08/day=3D29= length: 94324 file count: 20 directory count: 1 > 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adview/year=3D2013/month=3D08/day=3D30 = length: 142609 file count: 21 directory count: 1 > 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adgroup/year=3D2013/month=3D08/day=3D30= length: 65519 file count: 21 directory count: 1 > 2013-09-03 18:09:19,796 INFO org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adview/year=3D2013/month=3D08/day=3D29 = length: 205096 file count: 20 directory count: 1 > 2013-09-03 18:09:19,800 INFO = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans = where optimization is applicable > 2013-09-03 18:09:19,801 INFO = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table = scans > 2013-09-03 18:09:19,801 INFO = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans = where optimization is applicable > 2013-09-03 18:09:19,801 INFO = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table = scans > 2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver = (SessionState.java:printError(386)) - FAILED: NullPointerException null > java.lang.NullPointerException > at = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$Metadat= aOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308) > at = org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalk= er.java:87) > at = org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.j= ava:124) > at = org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraph= Walker.java:101) > at = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve= (MetadataOnlyOptimizer.java:175) > at = org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(Ph= ysicalOptimizer.java:79) > at = org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAn= alyzer.java:8426) > at = org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticA= nalyzer.java:8789) > at = org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticA= nalyzer.java:278) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) > at = org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310) > at = org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231) > at = org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 57) > at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:187) >=20 > The same error also happens if I do an INNER JOIN to adclick, FYI. >=20 > I have checked that there are not any null values for any of the = columns referenced in the query.=20 >=20 > Making almost any changes to the query results it in successfully = running. Here are some I have tried: >=20 > Removed JOIN to adgroup: >=20 > SELECT > count(DISTINCT av.adViewGuid) as ads, > count(DISTINCT ac.adViewGuid) as uniqueClicks > FROM > adview av > LEFT OUTER JOIN adclick ac ON ac.adViewGuid =3D av.adViewGuid; >=20 > WORKS: > 561 6 >=20 > Removed JOIN to adclick: >=20 > SELECT > count(DISTINCT ag.adGroupGuid) as groups, > count(DISTINCT av.adViewGuid) as ads > FROM > adgroup ag > INNER JOIN adview av ON av.adGroupGuid =3D ag.adGroupGuid; >=20 > WORKS: > 543 561 >=20 > Removing DISTINCT from any of the 3 counts >=20 > SELECT > count(DISTINCT ag.adGroupGuid) as groups, > count(DISTINCT av.adViewGuid) as ads, > count(ac.adViewGuid) as uniqueClicks > FROM > adgroup ag > INNER JOIN adview av ON av.adGroupGuid =3D ag.adGroupGuid > LEFT OUTER JOIN adclick ac ON ac.adViewGuid =3D av.adViewGuid; >=20 > WORKS: > 543 561 7 >=20 >=20 > SELECT > count(ag.adGroupGuid) as groups, > count(DISTINCT av.adViewGuid) as ads, > count(DISTINCT ac.adViewGuid) as uniqueClicks > FROM > adgroup ag > INNER JOIN adview av ON av.adGroupGuid =3D ag.adGroupGuid > LEFT OUTER JOIN adclick ac ON ac.adViewGuid =3D av.adViewGuid; >=20 > WORKS: > 562 561 6 >=20 >=20 > SELECT > count(DISTINCT ag.adGroupGuid) as groups, > count(av.adViewGuid) as ads, > count(DISTINCT ac.adViewGuid) as uniqueClicks > FROM > adgroup ag > INNER JOIN adview av ON av.adGroupGuid =3D ag.adGroupGuid > LEFT OUTER JOIN adclick ac ON ac.adViewGuid =3D av.adViewGuid; >=20 > WORKS: > 543 562 6=20 >=20 > I am not exactly sure what to do next. Thoughts? >=20 > Nate >=20 --Apple-Mail=_AA514B49-E742-4A6A-A752-80721FE88547 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=iso-8859-1 Is = there a way to run a patch on EMR?

Thanks,
Nate
hashutosh@apache.org> = wrote:

Fix in very related area has been checked = in trunk today : https://issues.ap= ache.org/jira/browse/HIVE-5129 Likely that will fix your = issue. 
Can you try latest = trunk?

Ashutosh


On Tue, Sep 3, = 2013 at 2:03 PM, Nathanial Thelen <nate@natethelen.com> wrote:
I am running Hive in EMR and since = upgrading to 0.11 from 0.8.1.8 I have been getting = NullPointerExceptions (NPE) for certain queries in our staging = environment.  Only difference between stage and production is the = amount of traffic we get so the data set is much smaller.  We are = not using any custom code.

I have greatly simplified the query down to the bare = minimum that will cause the = error:

SELECT
    = count(DISTINCT ag.adGroupGuid) as groups,
    = count(DISTINCT av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as = uniqueClicks
FROM
    adgroup = ag
    INNER JOIN adview av ON av.adGroupGuid =3D = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON = ac.adViewGuid =3D av.adViewGuid

This will return the following before any Map = Reduce jobs start:

FAILED: NullPointerException = null

Looking in the hive log = at /mnt/var/log/apps/hive_0110.log and scanning, I see this = error:

2013-09-03 18:09:19,796 INFO =  org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adgroup/year=3D2013/month=3D08/day=3D29= length: 94324 file count: 20 directory count: 1
2013-09-03 18:09:19,796 INFO =  org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adview/year=3D2013/month=3D08/day=3D30 = length: 142609 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO =  org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adgroup/year=3D2013/month=3D08/day=3D30= length: 65519 file count: 21 directory count: 1
2013-09-03 18:09:19,796 INFO =  org.apache.hadoop.hive.ql.exec.Utilities = (Utilities.java:getInputSummary(1889)) - Cache Content Summary for = s3://{ourS3Bucket}/hive/data/stage/adview/year=3D2013/month=3D08/day=3D29 = length: 205096 file count: 20 directory count: 1
2013-09-03 18:09:19,800 INFO =  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans = where optimization is applicable
2013-09-03 18:09:19,801 INFO =  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(301)) - Found 0 metadata only table = scans
2013-09-03 18:09:19,801 INFO =  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(267)) - Looking for table scans = where optimization is applicable
2013-09-03 18:09:19,801 INFO =  org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer = (MetadataOnlyOptimizer.java:dispatch(301)) - Found 1 metadata only table = scans
2013-09-03 18:09:19,801 ERROR org.apache.hadoop.hive.ql.Driver = (SessionState.java:printError(386)) - FAILED: NullPointerException = null
java.lang.NullPointerException
at = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer$Metadat= aOnlyTaskDispatcher.dispatch(MetadataOnlyOptimizer.java:308)
at = org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalk= er.java:87)
= at = org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.j= ava:124)
at = org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraph= Walker.java:101)
= at = org.apache.hadoop.hive.ql.optimizer.physical.MetadataOnlyOptimizer.resolve= (MetadataOnlyOptimizer.java:175)
at = org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(Ph= ysicalOptimizer.java:79)
= at = org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genMapRedTasks(SemanticAn= alyzer.java:8426)
at = org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticA= nalyzer.java:8789)
= at = org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticA= nalyzer.java:278)
at = org.apache.hadoop.hive.ql.Driver.compile(Driver.java:433)
at = org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
at = org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at = org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:310)
at = org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:231)
<= div> at = org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:466)
=
at = org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:819)
at = org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:674)
at = sun.reflect.NativeMethodAccessorImpl.invoke0(Native = Method)
at = sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:= 57)
at = sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm= pl.java:43)
= at java.lang.reflect.Method.invoke(Method.java:606)
at = org.apache.hadoop.util.RunJar.main(RunJar.java:187)

The same error also happens if I do an INNER JOIN to adclick, = FYI.

I have checked that there are not any null values for any of the columns = referenced in the query. 

Making almost any changes to the query results it in = successfully running.  Here are some I have = tried:

Removed JOIN to = adgroup:

SELECT
    = count(DISTINCT av.adViewGuid) as ads,
    count(DISTINCT ac.adViewGuid) as = uniqueClicks
FROM
    adview = av
    LEFT OUTER JOIN adclick ac ON ac.adViewGuid =3D= av.adViewGuid;

WORKS:
561 6

Removed JOIN to = adclick:

SELECT
    = count(DISTINCT ag.adGroupGuid) as groups,
    = count(DISTINCT av.adViewGuid) as ads
FROM
  =   adgroup ag
    INNER JOIN adview av ON av.adGroupGuid =3D = ag.adGroupGuid;

WORKS:
543 = 561

Removing DISTINCT from any of the 3 = counts

SELECT
    count(DISTINCT = ag.adGroupGuid) as groups,
    count(DISTINCT = av.adViewGuid) as ads,
    count(ac.adViewGuid) as = uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview av ON av.adGroupGuid =3D = ag.adGroupGuid
    LEFT OUTER JOIN adclick ac ON = ac.adViewGuid =3D = av.adViewGuid;

WORKS:
543 561 7


SELECT
    = count(ag.adGroupGuid) as groups,
    count(DISTINCT = av.adViewGuid) as ads,
    count(DISTINCT = ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview = av ON av.adGroupGuid =3D ag.adGroupGuid
    LEFT = OUTER JOIN adclick ac ON ac.adViewGuid =3D = av.adViewGuid;

WORKS:
562 561 6


SELECT
    = count(DISTINCT ag.adGroupGuid) as groups,
    = count(av.adViewGuid) as ads,
    count(DISTINCT = ac.adViewGuid) as uniqueClicks
FROM
    adgroup ag
    INNER JOIN adview = av ON av.adGroupGuid =3D ag.adGroupGuid
    LEFT = OUTER JOIN adclick ac ON ac.adViewGuid =3D = av.adViewGuid;

WORKS:
543 562

I am not exactly sure what to do next.  Thoughts?

Nate


= --Apple-Mail=_AA514B49-E742-4A6A-A752-80721FE88547--