impala-issues mailing list archives

From "Tim Armstrong (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-658) Impala Incorrectly Handles Newlines
Date Tue, 13 Jun 2017 15:24:00 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-658.
----------------------------------
    Resolution: Not A Problem

By default, Hive text tables have no escape character, so there is no way to escape newlines,
which leads to this (questionable) behaviour. If you define an escape character, this works better:

{code}
[localhost:21000] > create table try (text string) row format delimited escaped by '\2';
Query: create table try (text string) row format delimited escaped by '\2'

Fetched 0 row(s) in 0.04s
[localhost:21000] > insert into try values ('foo
bar
baz');
Query: insert into try values ('foo
bar
baz')
Query submitted at: 2017-06-13 08:21:23 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=3241517e3d9674ee:88a04ccc00000000
Modified 1 row(s) in 0.11s
[localhost:21000] > select * from try;
Query: select * from try
Query submitted at: 2017-06-13 08:21:25 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=1f4ffc5608f2ab62:c0ef821d00000000
+------+
| text |
+------+
| foo  |
| bar  |
| baz  |
+------+
Fetched 3 row(s) in 0.12s
{code}
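
To make the mechanism concrete, here is a rough sketch of how a newline-delimited text format behaves with and without an escape character. This is a toy model in Python, not Impala's or Hive's actual writer or scanner code; the function names and the use of "\x02" as the escape character are assumptions made purely for illustration.

```python
# Toy model of a newline-delimited, single-column text table.
# Hypothetical helpers: NOT Impala's or Hive's real implementation.
LINE_DELIM = "\n"


def write_rows(rows, escape_char=None):
    """Serialize string rows into the contents of a text file."""
    out = []
    for value in rows:
        if escape_char is not None:
            # Escape the escape character itself first, then the row delimiter.
            value = value.replace(escape_char, escape_char + escape_char)
            value = value.replace(LINE_DELIM, escape_char + LINE_DELIM)
        out.append(value + LINE_DELIM)
    return "".join(out)


def read_rows(data, escape_char=None):
    """Split file contents back into rows, honouring the escape character."""
    rows, current, i = [], [], 0
    while i < len(data):
        ch = data[i]
        if escape_char is not None and ch == escape_char and i + 1 < len(data):
            # The character after the escape is taken literally.
            current.append(data[i + 1])
            i += 2
            continue
        if ch == LINE_DELIM:
            rows.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        rows.append("".join(current))
    return rows


value = "foo\nbar\nbaz"

# No escape character: the embedded newlines look like row delimiters.
print(read_rows(write_rows([value])))

# With an escape character (assumed here to be "\x02"): the value survives.
print(read_rows(write_rows([value], "\x02"), "\x02"))
```

In this model, the unescaped file is indistinguishable from one holding three separate rows, which matches the file contents shown in the original report.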

> Impala Incorrectly Handles Newlines
> -----------------------------------
>
>                 Key: IMPALA-658
>                 URL: https://issues.apache.org/jira/browse/IMPALA-658
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.1.1
>            Reporter: David E. Wheeler
>            Priority: Minor
>              Labels: correctness, downgraded, text
>
> Impala incorrectly stores a string value with newlines as multiple rows, rather than
> a single row with newlines:
> {code}
> [example.com:21000] > create table try (text string);
> Query: create table try (text string)
> [example.com:21000] > insert into try values ('foo
>                     > bar
>                     > baz');
> Query: insert into try values ('foo
> bar
> baz')
> Inserted 1 rows in 2.46s
> [example.com:21000] > select * from try;
> Query: select * from try
> Query finished, fetching results ...
> +------+
> | text |
> +------+
> | foo  |
> | bar  |
> | baz  |
> +------+
> Returned 3 row(s) in 0.42s
> {code}
> As you can see, it thinks it inserted one row, but when you select from the table, it
> returns three rows. I had a look at the text file generated for this table, and it looks like
> this:
> {code}
> foo
> bar
> baz
> {code}
> So I think newlines are not properly escaped for storage in the text file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
