hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongzhi Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9201) Lazy functions do not handle newlines and carriage returns properly
Date Mon, 05 Jan 2015 19:33:37 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264965#comment-14264965
] 

Yongzhi Chen commented on HIVE-9201:
------------------------------------

[~ashutoshgupta17@gmail.com],
Are you trying to say we start to Implement "LINES TERMINATED BY" for hive? It is treated
as not fixable by 
https://issues.apache.org/jira/browse/HIVE-302
In current hive code, it seems we just error out the line terminator other than \n, and many
places just assume the \n is the only line terminator. 
        case HiveParser.TOK_TABLEROWFORMATLINES:
          String lineDelim = unescapeSQLString(rowChild.getChild(0).getText());
          tblDesc.getProperties().setProperty(serdeConstants.LINE_DELIM, lineDelim);
          if (!lineDelim.equals("\n") && !lineDelim.equals("10")) {
            throw new SemanticException(generateErrorMessage(rowChild,
                ErrorMsg.LINES_TERMINATED_BY_NON_NEWLINE.getMsg()));
          }
          break;
But with MAPREDUCE-2602  fixed, it is possible for hive to support changing the line terminator.
Just wonder it may not be a easy change.

Thanks.

> Lazy functions do not handle newlines and carriage returns properly
> -------------------------------------------------------------------
>
>                 Key: HIVE-9201
>                 URL: https://issues.apache.org/jira/browse/HIVE-9201
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>         Attachments: HIVE-9201.1.patch
>
>
> Hive returns wrong result when returning string has char \r or \n in it.  This happens
when the query can trigger mapreduce jobs. 
> For example, for a table named strsim with only one row:
> As shown following, query 1 returns 1 row while query 2 returns 3 rows.
> Query 1:
> select "abc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
> Query 2:
> select "a\rb\nc", narray from strsim LATERAL VIEW explode(array(1)) C AS narray;
> select "abc", narray from strsim LATERAL VIEW e 
> xplode(array(1)) C AS narray;
> INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2014-12-23 15:00:08,958 Stage-1 map = 0%,  reduce = 0%
> INFO  : Ended Job = job_local1178499218_0015
> +------+---------+--+
> 1 row selected (1.283 seconds)
> | _c0  | narray  |
> +------+---------+--+
> | abc  | 1       |
> +------+---------+--+
> select "a\rb\nc", narray from strsim LATERAL VI 
> EW explode(array(1)) C AS narray;
> INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2014-12-23 15:04:35,441 Stage-1 map = 0%,  reduce = 0%
> INFO  : Ended Job = job_local1816711099_0016
> +------+---------+--+
> 3 rows selected (1.135 seconds)
> | _c0  | narray  |
> +------+---------+--+
> | a    | NULL    |
> | b    | NULL    |
> | c    | 1       |
> +------+---------+--+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message