hbase-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: Storing JSON in HBase value cell, which serialization format is most compact?
Date Fri, 14 Nov 2014 01:38:22 GMT
Oh, btw, does the latest HDP 2.1 (0.98.0.2.1.7.0-784-hadoop2) have this fix?

Jianshi

On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Thanks Ted.
>
> I think the fix you mentioned is this one HBASE-12078
> <https://issues.apache.org/jira/browse/HBASE-12078>.
>
> Not sure when our Hadoop admin would upgrade it, ahhh....
>
> Jianshi
>
> On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Keep in mind that Prefix Tree encoding has higher overhead in the write path
>> compared to other data block encoding methods.
>>
>> Please use 0.98.7 which has the latest fixes for Prefix Tree encoding.
>>
>> Cheers
>>
>> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <jianshi.huang@gmail.com>
>> wrote:
>>
>> > Thanks Ram,
>> >
>> > How about Prefix Tree based encoding then? HBASE-4676
>> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also possible
>> > to do suffix tries? Then it could be a nice fit for JSON String (or any
>> > long value where changes are small).
>> >
>> > Maybe I should just flatten JSON to columns, hmm...what's the overhead for
>> > a column?
>> >
>> > Jianshi
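
On the idea of flattening JSON to columns raised above: a minimal Python sketch of what the flattening could look like on the client side, before any Put is built. The function names `flatten` and `changed_cells` and the dotted/indexed qualifier naming are assumptions made for illustration, not an HBase API or convention. Each resulting pair would become its own cell, which carries its own row key, column family, qualifier and timestamp, so the per-column overhead depends largely on how well the data block encoding compresses those repeated keys.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON-like data into {qualifier: bytes} pairs.

    Qualifier naming ("a.b", "a[0]") is a hypothetical convention.
    """
    cells = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            cells.update(flatten(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            cells.update(flatten(value, f"{prefix}[{i}]"))
    else:
        # Leaf value: keep its JSON text so the type can be recovered later.
        cells[prefix] = json.dumps(obj).encode("utf-8")
    return cells


def changed_cells(previous, current):
    """Return only the qualifiers whose value differs from the previous snapshot."""
    return {q: v for q, v in current.items() if previous.get(q) != v}
```

With snapshots flattened this way, a writer could store only `changed_cells(previous, current)` for each new snapshot. Deleted fields and empty containers are not handled in this sketch.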
>> >
>> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
>> > ramkrishna.s.vasudevan@gmail.com> wrote:
>> >
>> > > >>So is it possible to specify FASTDIFF for rowkey/column and DIFF for
>> > > value cell?
>> > > No, that is not possible now. All the encoding is per KV only.
>> > > But what you say is definitely worth trying.
>> > >
>> > > >>So would you recommend storing JSON flattened as many columns?
>> > > Maybe yes. But I have not used JSON formats in practice, so I may not be
>> > > the best person to comment on this.
>> > >
>> > > Regards
>> > > Ram
>> > >
>> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <jianshi.huang@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Ram,
>> > > >
>> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
>> > > > value cell?
>> > > >
>> > > > So would you recommend storing JSON flattened as many columns?
>> > > >
>> > > > Jianshi
>> > > >
>> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
>> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
>> > > >
>> > > > > Hi
>> > > > >
>> > > > > >> Since I'm storing historical data (snapshot data) and changes between
>> > > > > adjacent value cells are relatively small.
>> > > > >
>> > > > > If the values are changing, even if the change is small, FASTDIFF will
>> > > > > rewrite the value part. Only if there is an exact match would it skip the
>> > > > > value part. JFYI.
>> > > > >
>> > > > > Regards
>> > > > > Ram
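
Ram's point about exact matches can be illustrated with a toy model in Python. The sketch below is not the FASTDIFF on-disk format; the one-byte flag and four-byte length prefix are assumptions chosen only to show that a value is stored again in full unless it is byte-for-byte identical to the previous one.

```python
def encode_values(values):
    """Toy encoder: store a value in full unless it equals the previous value."""
    out = bytearray()
    previous = None
    for value in values:
        if value == previous:
            out.append(1)  # flag 1: same as previous value, nothing else stored
        else:
            out.append(0)  # flag 0: full value follows, even for a 1-byte change
            out += len(value).to_bytes(4, "big")
            out += value
        previous = value
    return bytes(out)


def decode_values(data):
    """Inverse of encode_values."""
    values = []
    previous = None
    pos = 0
    while pos < len(data):
        flag = data[pos]
        pos += 1
        if flag == 0:
            length = int.from_bytes(data[pos:pos + 4], "big")
            pos += 4
            previous = data[pos:pos + length]
            pos += length
        values.append(previous)
    return values
```

In this model three identical JSON snapshots cost one full copy plus two flag bytes, while snapshots that differ by a single character each cost a full copy, which is why a small change between adjacent values gains nothing from this kind of scheme.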
>> > > > >
>> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <jianshi.huang@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I thought FASTDIFF was only for rowkey and columns, great if it also works
>> > > > > > in value cell.
>> > > > > >
>> > > > > > And thanks for the bjson link!
>> > > > > >
>> > > > > > Jianshi
>> > > > > >
>> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > >
>> > > > > > > There is FASTDIFF data block encoding.
>> > > > > > >
>> > > > > > > See also http://bjson.org/
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <jianshi.huang@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I'm currently saving JSON in pure String format in the value cell and
>> > > > > > > > depend on HBase's block compression to reduce the overhead of JSON.
>> > > > > > > >
>> > > > > > > > I'm wondering if there's a more space-efficient way to store JSON?
>> > > > > > > > (There are lots of 0s and 1s; JSON String actually is an OK format.)
>> > > > > > > >
>> > > > > > > > I want to keep the value as a Map since the schema of the source data
>> > > > > > > > might change over time.
>> > > > > > > >
>> > > > > > > > Also, is there a DIFF-based encoding for values? Since I'm storing
>> > > > > > > > historical data (snapshot data) and changes between adjacent value cells
>> > > > > > > > are relatively small.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > --
>> > > > > > > > Jianshi Huang
>> > > > > > > >
>> > > > > > > > LinkedIn: jianshi
>> > > > > > > > Twitter: @jshuang
>> > > > > > > > Github & Blog: http://huangjs.github.com/



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
