hive-issues mailing list archives

From "Hengyu Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18441) NullPointerException due to Hadoop23Shims doesn't compatible with Hadoop 2.2
Date Fri, 12 Jan 2018 08:16:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323684#comment-16323684
] 

Hengyu Dai commented on HIVE-18441:
-----------------------------------

[~lirui], you are right. The following picture shows that the job config property "mapreduce.input.fileinputformat.inputdir"
is correctly set to the "nullscan" path, but the listStatus() method in org.apache.hadoop.mapreduce.lib.input.FileInputFormat
returns a path without any scheme. This looks more like a Hadoop issue, and it is fixed in
Hadoop 2.9 (the version I tested, maybe earlier). Hive should still tolerate this behavior.

!debug.jpg|thumbnail!

> NullPointerException due to Hadoop23Shims doesn't compatible with Hadoop 2.2
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-18441
>                 URL: https://issues.apache.org/jira/browse/HIVE-18441
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.1.1, 2.2.0, 2.3.0
>            Reporter: Hengyu Dai
>         Attachments: HIVE-18441.01.patch, HIVE-18441.02.patch, HIVE-18441.patch, debug.jpg,
hadoop2.2.jpg, hadoop2.9.jpg
>
>
> Hive 2.x is not compatible with Hadoop 2.2 (other Hadoop versions may have the same problem) when a "nullscan" path exists.
> Here is the listStatus() method in Hadoop23Shims.java:
> {code:java}
> protected List<FileStatus> listStatus(JobContext job) throws IOException {
>   List<FileStatus> result = super.listStatus(job);
>   Iterator<FileStatus> it = result.iterator();
>   while (it.hasNext()) {
>     FileStatus stat = it.next();
>     if (!stat.isFile() || (stat.getLen() == 0 && !stat.getPath().toUri().getScheme().equals("nullscan"))) {
>       it.remove();
>     }
>   }
>   return result;
> }
> {code}
> The first line, "super.listStatus(job)", returns different FileStatus objects on Hadoop 2.2 and Hadoop 2.9.
> I have tested Hive 2.1 with Hadoop 2.2 and Hive 2.1 with Hadoop 2.9; the NPE occurs in Hive 2.1 with Hadoop 2.2.
> My test SQL is:
> {code:java}
> select * from (select key from src where false) a
> left outer join (select key from srcpart limit 0) b on a.key=b.key;
> {code}
> This query is from optimize_nullscan.q; the tables src and srcpart are created by q_test_init.sql.
> The problem is that in Hadoop 2.2, super.listStatus(job) returns a FileStatus object whose "Path" field has no scheme for the "nullscan" path. As a result, "stat.getPath().toUri().getScheme()" in the if statement returns null, and calling equals("nullscan") on that null value throws an NPE.
> In contrast, in Hadoop 2.9, super.listStatus(job) returns a valid Path whose scheme is "nullscan".
> The attached debug pictures from Hadoop 2.2 and Hadoop 2.9 show that the lists returned by super.listStatus(job) differ: Hadoop 2.2 returns "/default.srcpart/part..." and Hadoop 2.9 returns "nullscan://null/default.srcpart/part...".
> (This bug does not occur with normal paths such as "hdfs://...".)
> We should handle the case where stat.getPath().toUri().getScheme() returns null.
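
The null-scheme case described above can be illustrated with a minimal, self-contained sketch. It uses only java.net.URI (the type that Hadoop's Path.toUri() returns), so it runs without Hadoop on the classpath. The class name, the helper names, and the example paths are hypothetical and chosen only for illustration; this is not necessarily what the attached patches do. The idea is to put the string literal on the left-hand side of equals(), so that a null scheme compares as "not nullscan" instead of throwing.

```java
import java.net.URI;

public class NullScanSchemeCheck {

    // Hypothetical helper (not part of Hive): null-safe test for the "nullscan" scheme.
    // "nullscan".equals(null) is simply false, so a scheme-less URI cannot cause an NPE here.
    static boolean isNullScan(URI uri) {
        return "nullscan".equals(uri.getScheme());
    }

    // Mirrors the filter condition from listStatus() above, with the null-safe check swapped in.
    // isFile and len stand in for FileStatus.isFile() and FileStatus.getLen().
    static boolean shouldRemove(boolean isFile, long len, URI uri) {
        return !isFile || (len == 0 && !isNullScan(uri));
    }

    public static void main(String[] args) {
        // Made-up paths: one with the scheme, one without (as reported for Hadoop 2.2).
        URI withScheme = URI.create("nullscan://null/example/part-0");
        URI withoutScheme = URI.create("/example/part-0");

        System.out.println("scheme present: " + withScheme.getScheme());
        System.out.println("scheme absent:  " + withoutScheme.getScheme());

        // Neither call throws, even though the second scheme is null.
        System.out.println("remove (with scheme):    " + shouldRemove(true, 0, withScheme));
        System.out.println("remove (without scheme): " + shouldRemove(true, 0, withoutScheme));
    }
}
```

Note that this sketch only avoids the NPE. A zero-length file whose path comes back without a scheme is treated as a non-nullscan file and would therefore be filtered out; whether that is the desired result on Hadoop 2.2 is a separate question.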



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
