hive-user mailing list archives

From Alexander Pivovarov <apivova...@gmail.com>
Subject Re: Loading data containing newlines
Date Tue, 12 Jan 2016 17:51:44 GMT
Try the CSV SerDe. It should correctly parse quoted field values that contain
newlines.
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
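To illustrate what "a quoted field value that contains a newline" means for a parser, here is a minimal sketch using Python's standard-library `csv` module (not Hive's CSV SerDe, which is a Java class). A quote-aware reader keeps the quoted newline inside the field instead of starting a new record. The sample data and the `parse_records` helper are hypothetical. The sketch covers only the parsing rule; it does not show how Hive splits the input into records before a SerDe sees them.

```python
import csv
import io

# Hypothetical sample: the second data record's "comment" field contains a
# newline inside double quotes, so that record spans two physical lines.
SAMPLE = 'id,comment\n1,"plain value"\n2,"first line\nsecond line"\n'


def parse_records(text):
    """Parse CSV text with a quote-aware reader (hypothetical helper).

    csv.reader treats a newline inside a quoted field as part of the field,
    so one logical record may span several physical lines. newline="" lets
    the csv module handle line endings itself.
    """
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    for row in parse_records(SAMPLE):
        print(row)
```

Three rows come back, not four: the two physical lines of the quoted comment stay together as one field.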

Hadoop should read the bz2 files automatically, so there is no need to
decompress them first.
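As a rough analogue of that transparent decompression (Python's standard `bz2` module, not Hadoop's codec machinery), this sketch reads a bz2-compressed CSV file as a text stream and passes it directly to a CSV parser, without first writing an uncompressed copy to disk. The file name, the sample contents, and both helper functions are hypothetical.

```python
import bz2
import csv
import os
import tempfile

# Hypothetical sample data, including a quoted field with an embedded newline.
SAMPLE = 'id,comment\n1,"plain value"\n2,"first line\nsecond line"\n'


def write_sample_bz2(path):
    # Hypothetical setup step: write the sample CSV as a bz2-compressed file.
    with bz2.open(path, "wt", encoding="utf-8", newline="") as f:
        f.write(SAMPLE)


def read_bz2_csv(path):
    # bz2.open in text mode ("rt") decompresses on the fly while reading,
    # so the uncompressed data is never written to disk.
    with bz2.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv.bz2")
        write_sample_bz2(path)
        for row in read_bz2_csv(path):
            print(row)
```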


On Tue, Jan 12, 2016 at 9:40 AM, Gerber, Bryan W <Bryan.Gerber@pnnl.gov>
wrote:

> We are attempting to load CSV text files (compressed to bz2) containing
> newlines in fields, using EXTERNAL tables and INSERT/SELECT into ORC-format
> tables. Data volume is ~1 TB/day, so we are really trying to avoid unpacking
> the files to condition the data.
>
> A few days of research has us ready to implement custom input/output
> formats to handle the ingest. Are there any other suggestions that would take
> less effort and have a low impact on load times?
>
> Thanks,
>
> Bryan G.
>
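The core of the custom input format Bryan describes would be a record reader that keeps appending physical lines until the quotes balance. Below is a minimal Python sketch of just that joining step. It assumes RFC 4180-style quoting, where fields are wrapped in double quotes and a literal quote is written as two quotes (""). The `logical_records` function is hypothetical. A real Hadoop InputFormat would be written in Java and would also have to handle input splits that begin in the middle of a quoted field, which this sketch ignores.

```python
from typing import Iterable, Iterator


def logical_records(lines: Iterable[str]) -> Iterator[str]:
    """Join physical lines into logical CSV records (hypothetical helper).

    Assumes RFC 4180-style quoting. A doubled quote ("") toggles the state
    twice, so counting every '"' character is enough to tell whether a line
    ends inside an open quoted field.
    """
    buffer = []
    in_quotes = False
    for line in lines:
        buffer.append(line)
        # An odd number of quote characters on this line flips the state.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        if not in_quotes:
            # Quotes are balanced: the buffered lines form one complete record.
            yield "".join(buffer)
            buffer = []
    if buffer:
        # Unterminated quote at end of input: emit what we have, don't drop it.
        yield "".join(buffer)


if __name__ == "__main__":
    physical = [
        '1,"plain value"\n',
        '2,"first line\n',
        'second line"\n',
        '3,"say ""hi"""\n',
    ]
    for record in logical_records(physical):
        print(repr(record))
```

Because the reader only tracks quote parity, it can run over a decompressed stream line by line without buffering whole files. The cost is that it must start at a known record boundary.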
