hive-user mailing list archives

From Maciek <mac...@sonra.io>
Subject Re: Hive splits/adds rows when outputting dataset with new lines
Date Tue, 07 Oct 2014 20:58:31 GMT
This …works!
I'm quite surprised, though: in the steps I outlined, the issue showed up even
without CTAS (with a regular SELECT).
I still don't see how the two could be related. Or are these two separate issues?
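
My best guess at the mechanism, as a rough sketch only (plain Python, not Hive code, and not taken from the Hive source): a text-format table stores one row per line, so a newline inside a value looks like a row boundary when the file is read back. The sketch assumes Hive's default text delimiters (Ctrl-A, i.e. \x01, between fields and \n between rows). The function names are made up for illustration. Whether a plain SELECT also stages its result as text is an assumption on my side, but it would explain why both cases behave the same.

```python
# Hypothetical simulation of a line-delimited text table; not Hive code.
FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables (Ctrl-A)
ROW_DELIM = "\n"      # Hive's default row delimiter for text tables


def write_text_rows(rows):
    """Serialize rows the way a naive line-delimited text writer would."""
    return "".join(FIELD_DELIM.join(row) + ROW_DELIM for row in rows)


def read_text_rows(data, n_cols):
    """Parse the text back: one row per line, missing columns become None."""
    parsed = []
    # Drop the empty string that follows the final row delimiter.
    for line in data.split(ROW_DELIM)[:-1]:
        fields = line.split(FIELD_DELIM)
        fields += [None] * (n_cols - len(fields))
        parsed.append(fields)
    return parsed


# One logical row whose first column contains embedded newlines.
original = [["world\nof\nhostels", "worldofhostels"]]
corrupted = read_text_rows(write_text_rows(original), n_cols=2)

for row in corrupted:
    print(row)
# ['world', None]
# ['of', None]
# ['hostels', 'worldofhostels']
```

That is the same shape as the "corrupted" table in step 3 below: three rows, with the trailing column NULL everywhere except the last one. A container format such as SequenceFile does not rely on \n to mark row boundaries, which would be consistent with the setting fixing it.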

Also, do you happen to know whether there is any way to make it work for TextFile?
Thank you,
Maciek

On Tue, Oct 7, 2014 at 7:13 AM, Navis류승우 <navis.ryu@nexr.com> wrote:

> Try with set hive.default.fileformat=SequenceFile;
>
> Thanks,
> Navis
>
> 2014-10-06 20:51 GMT+09:00 Maciek <maciek@sonra.io>:
>
>> Hello,
>>
>> I've encountered a situation where outputting newlines corrupts
>> (multiplies) the returned dataset.
>> This seems to be similar to HIVE-3012
>> <https://issues.apache.org/jira/browse/HIVE-3012> (fixed in 0.11), but
>> I'm on Hive 0.13 and still see it.
>> Here are the steps to illustrate/reproduce:
>>
>> 1. First, let's create a table with one row and one column by selecting from
>> any existing table (substitute ANYTABLE accordingly):
>>
>> CREATE TABLE singlerow AS SELECT 'worldofhostels' wordsmerged FROM
>> ANYTABLE LIMIT 1;
>>
>> and verify:
>>
>> SELECT * FROM singlerow;
>>
>> OK-----------
>> worldofhostels
>>
>> Time taken: 0.028 seconds, Fetched: 1 row(s)
>>
>> All good so far.
>> 2. Now let's introduce newlines with:
>>
>> SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate FROM
>> singlerow;
>>
>> OK----------
>>
>> world
>> of
>> hostels
>>
>> Time taken: 6.404 seconds, Fetched: 3 row(s)
>> and suddenly I'm getting 3 rows.
>> 3. This is not limited to CLI output: a CTAS materializes the same
>> corrupted result set:
>>
>> CREATE TABLE corrupted AS
>> SELECT regexp_replace(wordsmerged,'of',"\nof\n") wordsseparate,
>> wordsmerged FROM singlerow;
>>
>> hive> select * from corrupted;
>>
>> OK
>>
>> world NULL
>> of NULL
>> hostels worldofhostels
>>
>> Time taken: 0.029 seconds, Fetched: 3 row(s)
>> Apparently the same happens here: the new table is split into multiple rows,
>> and the columns following the one in question (like wordsmerged) become NULL.
>> Am I doing something wrong here?
>>
>> Regards,
>> Maciek
>>
>
>


-- 
Kind Regards
Maciek Kocon
