hive-user mailing list archives

From Yin Huai <huaiyin....@gmail.com>
Subject Re: ColumnarSerDe and LazyBinaryColumnarSerDe
Date Wed, 07 Mar 2012 18:35:57 GMT
Thanks.

I forgot to consider the DOUBLE data type in the table. In the case of
lineitem, ColumnarSerDe can use fewer bytes to store a double than
LazyBinaryColumnarSerDe (8 bytes), because it stores the value as text,
which is often shorter than the fixed-width binary form.
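
To illustrate the point, here is a rough Python sketch, not Hive's actual
serialization code. It assumes the text form of a double is close to
Python's repr() and that the binary form is a fixed 8-byte IEEE 754 value.
The sample values are hypothetical, chosen only to resemble lineitem's
numeric columns; the helper names text_size and binary_size are made up
for this example.

```python
import struct

# Hypothetical sample values shaped like lineitem's numeric columns
# (quantity, extended price, discount, tax); not taken from real data.
samples = [17.0, 21168.23, 0.04, 0.02]


def text_size(value: float) -> int:
    """Bytes used by the decimal text form (stand-in for a text-based SerDe)."""
    return len(repr(value).encode("utf-8"))


def binary_size(value: float) -> int:
    """Bytes used by a fixed-width IEEE 754 double (stand-in for a binary SerDe)."""
    return len(struct.pack(">d", value))


text_total = sum(text_size(v) for v in samples)
binary_total = sum(binary_size(v) for v in samples)

for v in samples:
    print(f"{v!r}: text={text_size(v)} bytes, binary={binary_size(v)} bytes")
print(f"total: text={text_total} bytes, binary={binary_total} bytes")
```

Short values take fewer than 8 bytes as text, while the binary form always
takes 8. Doubles with many significant digits would reverse the comparison.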

Yin

On Tue, Mar 6, 2012 at 2:42 PM, yongqiang he <heyongqiangict@gmail.com> wrote:

> I guess LazyBinaryColumnarSerDe does not save space, but it is CPU efficient.
> Your tests align with internal tests we ran a long time ago.
>
> On Tue, Mar 6, 2012 at 8:58 AM, Yin Huai <huaiyin.thu@gmail.com> wrote:
> > Hi,
> >
> > Is LazyBinaryColumnarSerDe more space efficient than ColumnarSerDe in
> > general?
> >
> > Let me make my question more specific.
> >
> > I generated two tables from the table lineitem of TPC-H
> > using ColumnarSerDe and LazyBinaryColumnarSerDe as follows...
> > CREATE TABLE lineitem_rcfile_lazybinary
> > ROW FORMAT SERDE
> > "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe"
> > STORED AS RCFile AS
> > SELECT * from lineitem;
> >
> > CREATE TABLE lineitem_rcfile_lazy
> > ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
> > STORED AS RCFile AS
> > SELECT * from lineitem;
> >
> > Since LazyBinaryColumnarSerDe serialization is binary-based and
> > ColumnarSerDe serialization is text-based, I expected
> > lineitem_rcfile_lazybinary to be smaller than lineitem_rcfile_lazy.
> > However, whether or not compression is enabled,
> > lineitem_rcfile_lazybinary is a little larger than lineitem_rcfile_lazy.
> > Did I use LazyBinaryColumnarSerDe incorrectly?
> >
> > btw, the row group size of RCFile is 32MB.
> >
> > Thanks,
> >
> > Yin
>
