hive-user mailing list archives

From Navis류승우 <navis....@nexr.com>
Subject Re: Hive splits/adds rows when outputting dataset with new lines
Date Tue, 07 Oct 2014 06:13:54 GMT
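Try setting the default file format to SequenceFile before running the
query. The default TextFile format uses the newline character as the row
delimiter, so a newline embedded in a value is read back as a row break.

set hive.default.fileformat=SequenceFile;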
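
Applied to the CTAS from the steps quoted below, this could look like the
following. It is an untested sketch: the table name fixed_ctas is made up
for illustration, and it assumes the setting is issued in the same session
before the CREATE TABLE.

```sql
-- Assumption: issued in the same Hive session, before the CTAS below.
-- It changes the storage format used by CREATE TABLE statements that
-- carry no explicit STORED AS clause.
SET hive.default.fileformat=SequenceFile;

-- Same query as step 3 of the report; fixed_ctas is a hypothetical name.
CREATE TABLE fixed_ctas AS
SELECT regexp_replace(wordsmerged, 'of', "\nof\n") wordsseparate,
       wordsmerged
FROM singlerow;

-- Alternative that does not depend on the session default:
--   CREATE TABLE fixed_ctas STORED AS SEQUENCEFILE AS SELECT ...;

-- Check how many rows were materialized.
SELECT COUNT(*) FROM fixed_ctas;
```

If the format change takes effect, the count should come back as one row
instead of three. The CLI may still print the value across several lines,
because the value itself contains newlines.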

Thanks,
Navis

2014-10-06 20:51 GMT+09:00 Maciek <maciek@sonra.io>:

> Hello,
>
> I've encountered a situation where values containing newlines corrupt
> (multiply the rows of) the returned dataset.
> This seems similar to HIVE-3012
> <https://issues.apache.org/jira/browse/HIVE-3012> (fixed in 0.11), but it
> still happens for me on Hive 0.13.
> Here are the steps to reproduce:
>
> 1. First, let's create a table with one row and one column by selecting
> from any existing table (substitute ANYTABLE accordingly):
>
> CREATE TABLE singlerow AS SELECT 'worldofhostels' wordsmerged FROM
> ANYTABLE LIMIT 1;
>
> and verify:
>
> SELECT * FROM singlerow;
>
> OK-----------
> worldofhostels
>
> Time taken: 0.028 seconds, Fetched: 1 row(s)
>
> All good so far.
> 2. Now let's introduce newlines into the value:
>
> SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate FROM
> singlerow;
>
> OK----------
>
> world
> of
> hostels
>
> Time taken: 6.404 seconds, Fetched: 3 row(s)
>
> Suddenly I'm getting 3 rows.
> 3. This is not just a CLI display issue: a CTAS materializes the same
> corrupted result set:
>
> CREATE TABLE corrupted AS
> SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate,
> wordsmerged FROM singlerow;
>
> hive> select * from corrupted;
>
> OK
>
> world NULL
> of NULL
> hostels worldofhostels
>
> Time taken: 0.029 seconds, Fetched: 3 row(s)
>
> The same thing happens here: the new table is split into multiple rows,
> and the columns following the one in question (like wordsmerged) become
> NULL in all but the last row.
> Am I doing something wrong here?
>
> Regards,
> Maciek
>
