hbase-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: Storing JSON in HBase value cell, which serialization format is most compact?
Date Fri, 14 Nov 2014 05:05:03 GMT
Oh, that article, I've read it before. I'm using the approach of having
a single KV hold all my columns (mostly read-only).

So the conclusion: the saving in disk space is not that huge.

one HBase column per column:

    1,350,483   1000   SNAPPY   DIFF

vs. one HBase column for all columns:

    1,119,330   1000   SNAPPY   DIFF

Only about 17% smaller.

However, the article suggested that the saving over the network wire is
huge.

    6,293,670   1000   NONE   NONE

vs.

    1,374,465   1000   NONE   NONE

Thanks again for the help!

Jianshi

On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> w.r.t. the effect of data block encoding on HFile size, take a look at Doug
> Meil's blog 'The Effect of ColumnFamily, RowKey and KeyValue Design on
> HFile Size':
> http://blogs.apache.org/hbase/
>
> Cheers
>
> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <jianshi.huang@gmail.com>
> wrote:
>
> > Thanks Ram,
> >
> > How about Prefix Tree based encoding then? HBASE-4676
> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also possible
> > to do suffix tries? Then it could be a nice fit for JSON String (or any
> > long value where changes are small).
> >
> > Maybe I should just flatten JSON to columns, hmm... what's the overhead for
> > a column?
> >
> > Jianshi
> >
> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > >> So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > > >> value cell?
> > > No, that is not possible now. All the encoding is per KV only.
> > > But what you say is definitely worth trying.
> > >
> > > >> So would you recommend storing JSON flattened as many columns?
> > > Maybe yes. But I have practically not used JSON formats, so I may not be
> > > the best person to comment on this.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > > wrote:
> > >
> > > > Thanks Ram,
> > > >
> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > > > value cell?
> > > >
> > > > So would you recommend storing JSON flattened as many columns?
> > > >
> > > > Jianshi
> > > >
> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > >> Since I'm storing historical data (snapshot data) and changes
> > > > > >> between adjacent value cells are relatively small.
> > > > >
> > > > > If the values change, even by a small amount, FASTDIFF will rewrite
> > > > > the value part. Only when there is an exact match does it skip the
> > > > > value part. JFYI.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I thought FASTDIFF was only for rowkey and columns; great if it also
> > > > > > works in value cells.
> > > > > >
> > > > > > And thanks for the bjson link!
> > > > > >
> > > > > > Jianshi
> > > > > >
> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > There is FASTDIFF data block encoding.
> > > > > > >
> > > > > > > See also http://bjson.org/
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm currently saving JSON in pure String format in the value cell
> > > > > > > > and depending on HBase's block compression to reduce the overhead
> > > > > > > > of JSON.
> > > > > > > >
> > > > > > > > I'm wondering if there's a more space-efficient way to store JSON?
> > > > > > > > (there are lots of 0s and 1s; JSON String is actually an OK format)
> > > > > > > >
> > > > > > > > I want to keep the value as a Map since the schema of the source
> > > > > > > > data might change over time.
> > > > > > > >
> > > > > > > > Also, is there a DIFF-based encoding for values? I'm storing
> > > > > > > > historical data (snapshot data), and changes between adjacent value
> > > > > > > > cells are relatively small.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > --
> > > > > > > > Jianshi Huang
> > > > > > > >
> > > > > > > > LinkedIn: jianshi
> > > > > > > > Twitter: @jshuang
> > > > > > > > Github & Blog: http://huangjs.github.com/



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
