hive-user mailing list archives

From Gopal Vijayaraghavan <>
Subject Re: "Create external table" nulling data from source table
Date Thu, 28 Jan 2016 21:22:54 GMT
> And again: the same row is correct if I export a small set of data, and
>incorrect if I export a large set - so I think that file/data size has
>something to do with this.

My Phoenix vs LLAP benchmark hit size-related issues in ETL as well.

In my case, the tipping point was >1 HDFS block per CSV file.

Compressing the generated CSV files with Snappy was how I prevented the
old-style MapReduce splitters from arbitrarily chopping those files up on
block boundaries while loading - Snappy-compressed text is not splittable,
so each file is read as a single split.

>I just tested and if I take the orc table, copy it to a sequence file,
>and then copy to a csv "file", everything looks good.
> So, my (not-very-educated) guess is that this has to do with ORC files.

Yes, though somewhat indirectly. Check the output file sizes between the two paths:

ORC -> SequenceFile -> Text

will produce smaller text files (more of them) than

ORC -> Text.
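
A quick way to compare (warehouse paths here are hypothetical - substitute
the actual output directories of the two exports):

    hdfs dfs -du -h /apps/hive/warehouse/mydb.db/text_via_seqfile/
    hdfs dfs -du -h /apps/hive/warehouse/mydb.db/text_direct/

If the direct export produces files larger than one HDFS block, that is
where the splitters start chopping.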
