hive-user mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: ColumnarSerDe and LazyBinaryColumnarSerDe
Date Tue, 06 Mar 2012 19:42:24 GMT
I guess LazyBinaryColumnarSerDe does not save space, but it is more CPU efficient.
Your tests align with our internal tests from a long time ago.
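
One plausible contributor, as a rough sketch rather than a measured result: if the numeric lineitem columns are declared as DOUBLE (an assumption, since the schema is not shown in this thread), a binary encoding spends a fixed 8 bytes per value, while short decimals such as 0.04 need fewer bytes as text. The sample values below are hypothetical, and per-field length bookkeeping and other RCFile overhead are ignored.

```python
import struct


def text_bytes(value: float) -> int:
    """Bytes needed to store the value as decimal text, e.g. 0.04 -> "0.04"."""
    return len(str(value).encode("ascii"))


def binary_bytes(value: float) -> int:
    """Bytes needed to store the value as a fixed-width IEEE 754 double."""
    return len(struct.pack(">d", value))


# Hypothetical values in the style of lineitem's quantity, discount,
# tax and extended price columns (not taken from the real data set).
samples = [17.0, 0.04, 0.02, 21168.23]

text_total = sum(text_bytes(v) for v in samples)
binary_total = sum(binary_bytes(v) for v in samples)

print("text bytes:  ", text_total)
print("binary bytes:", binary_total)
```

For values like these the text form is the shorter one, which would be consistent with the binary table coming out slightly larger. It does not say anything about CPU cost, where skipping text parsing is the expected benefit.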

On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai <huaiyin.thu@gmail.com> wrote:
> Hi,
>
> Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> general?
>
> Let me make my question more specific.
>
> I generated two tables from the table lineitem of TPC-H
> using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
> CREATE TABLE lineitem_rcfile_lazybinary
> ROW FORMAT SERDE
> "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
> STORED AS RCFile AS
> SELECT * from lineitem;
>
> CREATE TABLE lineitem_rcfile_lazy
> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
> STORED AS RCFile AS
> SELECT * from lineitem;
>
> Since LazyBinaryColumnarSerDe serializes to a binary format and
> ColumnarSerDe serializes to text, I expected
> lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy.
> However, regardless of whether compression is
> enabled, lineitem_rcfile_lazybinary is a little bit larger
> than lineitem_rcfile_lazy. Did I use LazyBinaryColumnarSerDe in the wrong way?
>
> By the way, the RCFile row group size is 32 MB.
>
> Thanks,
>
> Yin
