hive-issues mailing list archives

From "Chaoyu Tang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result
Date Fri, 11 Dec 2015 03:27:10 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052092#comment-15052092 ]

Chaoyu Tang commented on HIVE-12541:
------------------------------------

So basically the regex rules used for symbolic paths are the same as those documented in
FileSystem.globStatus. Could you add more test cases with symbolic paths that use different
regexes, including ones at different path levels? Your case is ../data/files/T*, which means
the files starting with 0 or more Ts, right?
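
For reference, the two readings of `T*` select different files. Under glob rules, which is how FileSystem.globStatus documents `*` (zero or more characters), `T*` means "a T followed by anything". Under regular-expression rules it means "zero or more Ts and nothing else". The sketch below is only an illustration: it uses Python's `fnmatch` as a stand-in for Hadoop's glob matching, and the file names are hypothetical.

```python
import fnmatch
import re

# Hypothetical file names, chosen only to separate the two readings.
names = ["T1", "T2.txt", "TT", "", "kv1.txt"]

# Glob reading: a literal "T" followed by any run of characters.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "T*")]

# Regex reading: zero or more "T" characters and nothing else.
regex_hits = [n for n in names if re.fullmatch(r"T*", n)]

print(glob_hits)   # ['T1', 'T2.txt', 'TT']
print(regex_hits)  # ['TT', '']
```

Test cases that include names such as `TT` and `T1` side by side would make it clear which reading the code under test follows.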

> Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat, it will get a wrong result
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12541
>                 URL: https://issues.apache.org/jira/browse/HIVE-12541
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 1.2.0, 1.2.1
>            Reporter: Xiaowei Wang
>            Assignee: Xiaowei Wang
>             Fix For: 1.2.1
>
>         Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory '/user/hive/warehouse/symlink_text_input_format'; the content of the link file is:
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex!
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns the wrong result: 0
> I have also added a test case in the patch.
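
The expected behaviour described in the report can be sketched as follows: each line of the link file is a path, possibly containing a pattern; the pattern is expanded to the matching data files, and the row count is taken over those files. This is a minimal illustration on the local filesystem, using Python's `glob` module as a stand-in for FileSystem.globStatus. It is not Hive's implementation, and the helper names `expand_link_file`, `count_rows`, and `demo`, along with the file names, are hypothetical.

```python
import glob
import os
import tempfile


def expand_link_file(link_file):
    """Return the data files named by a link file, expanding glob patterns."""
    files = []
    with open(link_file) as fh:
        for line in fh:
            pattern = line.strip()
            if pattern:
                files.extend(sorted(glob.glob(pattern)))
    return files


def count_rows(paths):
    """Count lines across all files, as count(*) would over text rows."""
    total = 0
    for path in paths:
        with open(path) as fh:
            total += sum(1 for _ in fh)
    return total


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Two files match "symlink*"; "other" must be ignored.
        for name, rows in [("symlink1", 2), ("symlink2", 3), ("other", 5)]:
            with open(os.path.join(root, name), "w") as fh:
                fh.write("k\tv\n" * rows)

        # The table directory holds only the link file.
        table_dir = os.path.join(root, "table")
        os.mkdir(table_dir)
        link_file = os.path.join(table_dir, "link")
        with open(link_file, "w") as fh:
            # Escape the directory part so only "symlink*" acts as a pattern.
            fh.write(os.path.join(glob.escape(root), "symlink*") + "\n")

        paths = expand_link_file(link_file)
        return [os.path.basename(p) for p in paths], count_rows(paths)


names, total = demo()
print(names, total)  # ['symlink1', 'symlink2'] 5
```

In this sketch a count of 0 would only be correct if the pattern matched no files, which is the contrast with the result reported above.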



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
