hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Storing JSON in HBase value cell, which serialization format is most compact?
Date Fri, 14 Nov 2014 01:42:36 GMT
No.
The upcoming HDP 2.2 does have that fix.

Cheers

On Thu, Nov 13, 2014 at 5:38 PM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Oh, btw, does the latest HDP 2.1 (0.98.0.2.1.7.0-784-hadoop2) have this fix?
>
> Jianshi
>
> On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang <jianshi.huang@gmail.com>
> wrote:
>
> > Thanks Ted.
> >
> > I think the fix you mentioned is this one: HBASE-12078
> > <https://issues.apache.org/jira/browse/HBASE-12078>.
> >
> > Not sure when our Hadoop admin would upgrade it, ahhh....
> >
> > Jianshi
> >
> > On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Keep in mind that Prefix Tree encoding has higher overhead in the write
> >> path compared to other data block encoding methods.
> >>
> >> Please use 0.98.7 which has the latest fixes for Prefix Tree encoding.
> >>
> >> Cheers
> >>
> >> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <jianshi.huang@gmail.com>
> >> wrote:
> >>
> >> > Thanks Ram,
> >> >
> >> > How about Prefix Tree based encoding then? HBASE-4676
> >> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also
> >> > possible to do suffix tries? Then it could be a nice fit for JSON
> >> > String (or any long value where changes are small).
> >> >
> >> > Maybe I should just flatten JSON to columns, hmm... what's the overhead
> >> > for a column?
> >> >
> >> > Jianshi
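
For the flatten-to-columns idea above, one rough way to reason about per-column overhead is to count the bytes of an unencoded KeyValue: every cell carries its row key, family, qualifier, timestamp and type next to the value. The Python sketch below is only an illustration under stated assumptions. The helper names `flatten` and `keyvalue_size`, the dotted qualifier scheme, the row key `row-0001` and the family `d` are all hypothetical, and the size covers only the plain KeyValue layout, before any data block encoding or compression and without tags.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {qualifier: value_bytes}.

    Qualifiers are dotted paths (a hypothetical naming scheme).
    Empty dicts and lists produce no cells, so they are lost.
    """
    cells = {}
    if isinstance(obj, dict):
        for key, val in obj.items():
            cells.update(flatten(val, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            cells.update(flatten(val, f"{prefix}{i}."))
    else:
        # Leaf: drop the trailing dot and store the JSON-encoded scalar.
        cells[prefix[:-1]] = json.dumps(obj).encode("utf-8")
    return cells


def keyvalue_size(row, family, qualifier, value):
    """Bytes of one unencoded KeyValue (no tags, no block encoding)."""
    key_len = (
        2 + len(row)       # row length prefix + row key
        + 1 + len(family)  # family length prefix + family
        + len(qualifier)   # qualifier (its length is implied)
        + 8                # timestamp
        + 1                # key type
    )
    # 4-byte key length + 4-byte value length + key + value
    return 4 + 4 + key_len + len(value)


if __name__ == "__main__":
    snapshot = {"id": 7, "flags": {"a": 0, "b": 1}}
    row, family = b"row-0001", b"d"  # hypothetical row key and family

    # Option 1: the whole document as one JSON string in one cell.
    as_string = json.dumps(snapshot, separators=(",", ":")).encode("utf-8")
    print("one cell :", keyvalue_size(row, family, b"json", as_string))

    # Option 2: one cell per leaf value.
    cells = flatten(snapshot)
    total = sum(
        keyvalue_size(row, family, q.encode("utf-8"), v)
        for q, v in cells.items()
    )
    print("flattened:", total, "bytes in", len(cells), "cells")
```

With small values like 0s and 1s, the repeated key parts can outweigh the values themselves, so the raw comparison may favor a single cell. Data block encoding and compression change that picture, so it is worth measuring on real data.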
> >> >
> >> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
> >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> >
> >> > > >>So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> >> > > >>value cell?
> >> > > No, that is not possible now. All the encoding is per KV only.
> >> > > But what you say is definitely worth trying.
> >> > >
> >> > > >>So would you recommend storing JSON flattened as many columns?
> >> > > Maybe yes. But I have not used JSON formats in practice, so I may not
> >> > > be the best person to comment on this.
> >> > >
> >> > > Regards
> >> > > Ram
> >> > >
> >> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <jianshi.huang@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Thanks Ram,
> >> > > >
> >> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> >> > > > value cell?
> >> > > >
> >> > > > So would you recommend storing JSON flattened as many columns?
> >> > > >
> >> > > > Jianshi
> >> > > >
> >> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> >> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > >
> >> > > > > Hi
> >> > > > >
> >> > > > > >> Since I'm storing historical data (snapshot data) and changes
> >> > > > > >> between adjacent value cells are relatively small.
> >> > > > >
> >> > > > > Even if the change in a value is small, FASTDIFF will rewrite the
> >> > > > > whole value part. It skips the value part only when the value
> >> > > > > exactly matches the previous one. JFYI.
> >> > > > >
> >> > > > > Regards
> >> > > > > Ram
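
Ram's point above can be pictured with a toy model: a value is stored in full unless it is byte-for-byte identical to the value of the previous cell. The Python sketch below is only a simplified illustration of that behavior and not HBase's actual FASTDIFF on-disk format. The function names and the `"same"`/`"full"` markers are hypothetical.

```python
def encode_values(values):
    """Toy model: skip a value only if it equals the previous one exactly."""
    encoded = []
    prev = None
    for value in values:
        if value == prev:
            encoded.append(("same", b""))    # exact match: no value bytes
        else:
            encoded.append(("full", value))  # any difference: whole value
        prev = value
    return encoded


def decode_values(encoded):
    """Rebuild the original values from the toy encoding."""
    values = []
    for kind, payload in encoded:
        values.append(values[-1] if kind == "same" else payload)
    return values


def stored_value_bytes(encoded):
    """Value bytes that the toy encoding actually stores."""
    return sum(len(payload) for _, payload in encoded)


if __name__ == "__main__":
    snapshots = [
        b'{"a":0,"b":1}',
        b'{"a":0,"b":1}',  # identical to the previous snapshot
        b'{"a":0,"b":2}',  # differs by a single byte
    ]
    encoded = encode_values(snapshots)
    print([kind for kind, _ in encoded])
    print(stored_value_bytes(encoded), "of", sum(map(len, snapshots)), "bytes")
    assert decode_values(encoded) == snapshots
```

In this model the third snapshot is stored in full even though only one byte changed, which is why small changes between adjacent JSON values gain nothing from this kind of value skipping.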
> >> > > > >
> >> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <jianshi.huang@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > I thought FASTDIFF was only for rowkey and columns, great if it
> >> > > > > > also works in value cell.
> >> > > > > >
> >> > > > > > And thanks for the bjson link!
> >> > > > > >
> >> > > > > > Jianshi
> >> > > > > >
> >> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > > > >
> >> > > > > > > There is FASTDIFF data block encoding.
> >> > > > > > >
> >> > > > > > > See also http://bjson.org/
> >> > > > > > >
> >> > > > > > > Cheers
> >> > > > > > >
> >> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <jianshi.huang@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi,
> >> > > > > > > >
> >> > > > > > > > I'm currently saving JSON in pure String format in the value
> >> > > > > > > > cell and depending on HBase's block compression to reduce the
> >> > > > > > > > overhead of JSON.
> >> > > > > > > >
> >> > > > > > > > I'm wondering if there's a more space-efficient way to store
> >> > > > > > > > JSON? (There are lots of 0s and 1s, so a JSON String is
> >> > > > > > > > actually an OK format.)
> >> > > > > > > >
> >> > > > > > > > I want to keep the value as a Map since the schema of the
> >> > > > > > > > source data might change over time.
> >> > > > > > > >
> >> > > > > > > > Also, is there a DIFF-based encoding for values, since I'm
> >> > > > > > > > storing historical data (snapshot data) and changes between
> >> > > > > > > > adjacent value cells are relatively small?
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Thanks,
> >> > > > > > > > --
> >> > > > > > > > Jianshi Huang
> >> > > > > > > >
> >> > > > > > > > LinkedIn: jianshi
> >> > > > > > > > Twitter: @jshuang
> >> > > > > > > > Github & Blog: http://huangjs.github.com/
