hive-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2303) files with control-A,B are not delimited correctly.
Date Fri, 29 Jul 2011 07:28:09 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072719#comment-13072719 ]

Amareshwari Sriramadasu commented on HIVE-2303:
-----------------------------------------------

This problem occurs because FileSinkOperator generates a TableDesc with default properties
for storing the output. The solution is to escape the delimiters for the output table.

Shouldn't LazySimpleSerDe always escape delimiters?
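
To illustrate the failure mode, here is a minimal Python sketch. It is not Hive code: the serialize and deserialize helpers are hypothetical stand-ins for what a text SerDe does, and the backslash escape character is an assumption chosen for the example. It only assumes that the output is written with Control-A (\x01) as the field delimiter, which is Hive's default for text data.

```python
FIELD_DELIM = "\x01"  # Control-A, assumed to be the delimiter of the output table
ESCAPE = "\\"         # hypothetical escape character chosen for this sketch


def serialize(fields, escape=False):
    """Join fields with Control-A, optionally escaping embedded delimiters."""
    if escape:
        fields = [
            f.replace(ESCAPE, ESCAPE + ESCAPE).replace(FIELD_DELIM, ESCAPE + FIELD_DELIM)
            for f in fields
        ]
    return FIELD_DELIM.join(fields)


def deserialize(line, escape=False):
    """Split a line on Control-A, honoring the escape character if requested."""
    if not escape:
        return line.split(FIELD_DELIM)
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == ESCAPE and i + 1 < len(line):
            # An escaped character is taken literally, even if it is a delimiter.
            current.append(line[i + 1])
            i += 2
        elif ch == FIELD_DELIM:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# A two-column row whose msg value contains an internal Control-A.
row = ["imp-1", "part1\x01part2"]

broken = deserialize(serialize(row))                           # no escaping
fixed = deserialize(serialize(row, escape=True), escape=True)  # with escaping
```

Without escaping, the two-column row reads back as three fields because the embedded Control-A is indistinguishable from a field separator. With escaping on both the write and read side, the row round-trips unchanged.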

> files with control-A,B are not delimited correctly.
> ---------------------------------------------------
>
>                 Key: HIVE-2303
>                 URL: https://issues.apache.org/jira/browse/HIVE-2303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> The following is from one of our users:
>  
> create external table impressions (imp string, msg string)
>   row format delimited
>     fields terminated by '\t'
>     lines terminated by '\n'
>   stored as textfile                 
>   location '/xxx';
>  
> Some strings in my data contain Control-A, Control-B, etc. as internal delimiters.  If I do a
>  
> Select * from impressions limit 10;
>  
> all fields are printed correctly.  However, if I do a
>  
> Select * from impressions where msg regexp '.*' limit 10;
>  
> The fields were broken by the control characters.  The difference between the 2 commands is that the latter requires a map-reduce job.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
