hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-22936) NPE in SymbolicInputFormat
Date Wed, 26 Feb 2020 23:53:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-22936?focusedWorklogId=393853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393853 ]

ASF GitHub Bot logged work on HIVE-22936:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Feb/20 23:52
            Start Date: 26/Feb/20 23:52
    Worklog Time Spent: 10m 
      Work Description: mingzhu-abb commented on pull request #925: [HIVE-22936] NPE in SymbolicInputFormat
URL: https://github.com/apache/hive/pull/925
 
 
   Fix a bug that causes a NullPointerException in SymbolicInputFormat when the symlink file
contains a URI whose scheme differs from that of the default file system.
   
   Jira: https://issues.apache.org/jira/browse/HIVE-22936
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 393853)
    Remaining Estimate: 0h
            Time Spent: 10m

> NPE in SymbolicInputFormat
> --------------------------
>
>                 Key: HIVE-22936
>                 URL: https://issues.apache.org/jira/browse/HIVE-22936
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 3.1.2
>            Reporter: Redis Liu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: npe-symbolic-inputformat.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h2. Symptom
> I was running Hive over an AWS S3 Inventory report, which uses SymlinkTextInputFormat. The
> symlink file contains the fully qualified S3 URL of each data file, for example:
> {code:java}
> s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
> s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD{code}
> When I have the following setting:
> {code:java}
> set hive.rework.mapredwork=true;  
> {code}
> The job fails with a *NullPointerException* and no stack trace.
> h2. Cause
> The symlink file may contain arbitrary fully qualified FS paths, but SymbolicInputFormat
> uses the default FS instance to get the status of the data files. That lookup fails (and returns
> null) when the scheme of a data file differs from that of Hive's default FS.
> Code point:
> [https://github.com/apache/hive/blob/cfc12f05f0c034f9aad149960e58d40902e0dcfe/ql/src/java/org/apache/hadoop/hive/ql/io/SymbolicInputFormat.java#L78]
> {code:java}
>               // "fileSystem" may not be able to list status for given file uri.
>               FileStatus[] matches = fileSystem.globStatus(new Path(line));{code}
> h2. Fix
> Please see the attached npe-symbolic-inputformat.patch.
>  
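The scheme mismatch described under Cause can be illustrated with a minimal, standard-library-only sketch. It is not taken from the attached patch, whose contents are not shown in this message. The class name, the method name, and the `hdfs://namenode:8020` default FS are hypothetical and chosen only for illustration. The sketch flags symlink entries whose URI scheme differs from the default file system's scheme, which are the entries that a lookup through the default FS instance would not be able to resolve.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hive.
class SymlinkSchemeCheck {

    /** Returns the symlink entries whose URI scheme differs from the default FS scheme. */
    public static List<String> entriesOnOtherFileSystems(URI defaultFs, List<String> symlinkLines) {
        List<String> mismatched = new ArrayList<>();
        for (String line : symlinkLines) {
            String scheme = URI.create(line.trim()).getScheme();
            // A null scheme means a scheme-less path, which resolves against the default FS.
            if (scheme != null && !scheme.equalsIgnoreCase(defaultFs.getScheme())) {
                mismatched.add(line);
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        // Assumed default FS; a real deployment would read this from its configuration.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        List<String> lines = List.of(
            "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE",
            "/warehouse/local/part-00000");
        // Only the s3:// entry is reported as living on a different file system.
        System.out.println(entriesOnOtherFileSystems(defaultFs, lines));
    }
}
```

In Hadoop code, a common way to avoid this kind of mismatch is to obtain the file system from the path itself via `Path.getFileSystem(Configuration)` instead of reusing the default instance, and to null-check the result of `globStatus`. Whether the attached patch takes that route is not stated in this message.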



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
