hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Storing JSON in HBase value cell, which serialization format is most compact?
Date Fri, 14 Nov 2014 02:27:48 GMT
You can use HBase from HDP 2.2 on HDFS 2.5.

If you have further questions, let's take them offline.

Cheers

On Thu, Nov 13, 2014 at 6:12 PM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> But HDP 2.2 uses HDFS 2.6.0... very hard to convince our admins to upgrade.
>
> Would you recommend that we upgrade to 2.6.0? I'll ask them to consult HWX if
> you say yes. :)
>
> Jianshi
>
> On Fri, Nov 14, 2014 at 9:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > No.
> > The upcoming HDP 2.2 does have that fix.
> >
> > Cheers
> >
> > On Thu, Nov 13, 2014 at 5:38 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > wrote:
> >
> > > Oh, btw, does the latest HDP 2.1 (0.98.0.2.1.7.0-784-hadoop2) have this fix?
> > >
> > > Jianshi
> > >
> > > On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > wrote:
> > >
> > > > Thanks Ted.
> > > >
> > > > I think the fix you mentioned is HBASE-12078
> > > > <https://issues.apache.org/jira/browse/HBASE-12078>.
> > > >
> > > > Not sure when our Hadoop admin would upgrade it, ahhh....
> > > >
> > > > Jianshi
> > > >
> > > > On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > >> Keep in mind that Prefix Tree encoding has higher overhead in the write
> > > >> path compared to other data block encoding methods.
> > > >>
> > > >> Please use 0.98.7, which has the latest fixes for Prefix Tree encoding.
> > > >>
> > > >> Cheers
> > > >>
> > > >> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Thanks Ram,
> > > >> >
> > > >> > How about Prefix Tree based encoding then? HBASE-4676
> > > >> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also
> > > >> > possible to do suffix tries. Then it could be a nice fit for JSON
> > > >> > Strings (or any long value where changes are small).
> > > >> >
> > > >> > Maybe I should just flatten the JSON into columns, hmm... what's the
> > > >> > overhead for a column?
> > > >> >
> > > >> > Jianshi
> > > >> >
> > > >> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
> > > >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >> >
> > > >> > > >>So is it possible to specify FASTDIFF for rowkey/column and DIFF
> > > >> > > >>for value cell?
> > > >> > > No, that is not possible now. All the encoding is per KV only.
> > > >> > > But what you suggest is definitely worth trying.
> > > >> > >
> > > >> > > >>So would you recommend storing JSON flattened as many columns?
> > > >> > > Maybe, yes. But I have not used JSON formats in practice, so I may
> > > >> > > not be the best person to comment on this.
> > > >> > >
> > > >> > > Regards
> > > >> > > Ram
> > > >> > >
> > > >> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Thanks Ram,
> > > >> > > >
> > > >> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF
> > > >> > > > for value cell?
> > > >> > > >
> > > >> > > > So would you recommend storing JSON flattened as many columns?
> > > >> > > >
> > > >> > > > Jianshi
> > > >> > > >
> > > >> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> > > >> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > Hi
> > > >> > > > >
> > > >> > > > > >> Since I'm storing
> > > >> > > > > historical data (snapshot data) and changes between adjacent
> > > >> > > > > value cells are relatively small.
> > > >> > > > >
> > > >> > > > > If the values change at all, even slightly, FASTDIFF will
> > > >> > > > > rewrite the value part. Only if there is an exact match will
> > > >> > > > > it skip the value part. JFYI.
> > > >> > > > >
> > > >> > > > > Regards
> > > >> > > > > Ram
> > > >> > > > >
> > > >> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <jianshi.huang@gmail.com>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > I thought FASTDIFF was only for rowkey and columns; great if it
> > > >> > > > > > also works in the value cell.
> > > >> > > > > >
> > > >> > > > > > And thanks for the bjson link!
> > > >> > > > > >
> > > >> > > > > > Jianshi
> > > >> > > > > >
> > > >> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > > > >
> > > >> > > > > > > There is FASTDIFF data block encoding.
> > > >> > > > > > >
> > > >> > > > > > > See also http://bjson.org/
> > > >> > > > > > >
> > > >> > > > > > > Cheers
> > > >> > > > > > >
> > > >> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <jianshi.huang@gmail.com>
> > > >> > > > > > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hi,
> > > >> > > > > > > >
> > > >> > > > > > > > I'm currently saving JSON in pure String format in the value
> > > >> > > > > > > > cell and depending on HBase's block compression to reduce the
> > > >> > > > > > > > overhead of JSON.
> > > >> > > > > > > >
> > > >> > > > > > > > I'm wondering if there's a more space-efficient way to store
> > > >> > > > > > > > JSON? (There are lots of 0s and 1s; a JSON String is actually
> > > >> > > > > > > > an OK format.)
> > > >> > > > > > > >
> > > >> > > > > > > > I want to keep the value as a Map since the schema of the
> > > >> > > > > > > > source data might change over time.
> > > >> > > > > > > >
> > > >> > > > > > > > Also, is there a DIFF-based encoding for values? I'm storing
> > > >> > > > > > > > historical data (snapshot data), and changes between adjacent
> > > >> > > > > > > > value cells are relatively small.
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > Thanks,
> > > >> > > > > > > > --
> > > >> > > > > > > > Jianshi Huang
> > > >> > > > > > > >
> > > >> > > > > > > > LinkedIn: jianshi
> > > >> > > > > > > > Twitter: @jshuang
> > > >> > > > > > > > Github & Blog: http://huangjs.github.com/
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > Jianshi Huang
> > > >> > > > > >
> > > >> > > > > > LinkedIn: jianshi
> > > >> > > > > > Twitter: @jshuang
> > > >> > > > > > Github & Blog: http://huangjs.github.com/
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > Jianshi Huang
> > > >> > > >
> > > >> > > > LinkedIn: jianshi
> > > >> > > > Twitter: @jshuang
> > > >> > > > Github & Blog: http://huangjs.github.com/
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Jianshi Huang
> > > >> >
> > > >> > LinkedIn: jianshi
> > > >> > Twitter: @jshuang
> > > >> > Github & Blog: http://huangjs.github.com/
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/
> > > >
> > >
> > >
> > >
> > > --
> > > Jianshi Huang
> > >
> > > LinkedIn: jianshi
> > > Twitter: @jshuang
> > > Github & Blog: http://huangjs.github.com/
> > >
> >
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
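
The "flatten JSON to columns" idea discussed in the thread can be sketched
roughly as follows. This is only an illustration, not anything posted by the
participants: the function name flatten_json, the "." separator, and the
sample document are all assumptions, and the actual write to HBase (one
column qualifier per flattened key) is left out. Nested objects become dotted
qualifiers; leaf values, including lists, are kept as compact JSON text.

```python
import json


def flatten_json(doc, prefix="", sep="."):
    """Flatten a parsed JSON object into {qualifier: value_bytes} pairs.

    Hypothetical helper: nested objects are joined with `sep`; every leaf
    (number, string, list, null) is serialized as compact JSON bytes.
    """
    columns = {}
    for key, value in doc.items():
        qualifier = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the qualifier path.
            columns.update(flatten_json(value, qualifier, sep))
        else:
            # Store the leaf as compact JSON so its type survives a round trip.
            columns[qualifier] = json.dumps(
                value, separators=(",", ":")
            ).encode("utf-8")
    return columns


# Made-up sample snapshot, for illustration only.
snapshot = json.loads(
    '{"id": 7, "flags": {"active": 1, "deleted": 0}, "tags": ["a", "b"]}'
)
columns = flatten_json(snapshot)
for qualifier in sorted(columns):
    print(qualifier, columns[qualifier])
```

One trade-off to keep in mind: each flattened column is stored as its own
KeyValue, which carries the row key, column family, qualifier, and timestamp
alongside the value, so many small columns add per-cell key overhead. Whether
that is acceptable would need to be measured on real data.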
