hbase-user mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: What is the best hbase table schema for following json data?
Date Thu, 30 May 2013 06:13:24 GMT
Hi Ted,

@You can utilize MultipleColumnPrefixFilter or ColumnPrefixFilter to speed
up scan.
[Anil] Thanks for the info. But I am storing all the key/value pairs of one
click in a single column. Will ColumnPrefixFilter still work in this case?
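ColumnPrefixFilter and MultipleColumnPrefixFilter compare prefixes against column qualifiers, not against the bytes stored inside a cell. With one serialized JSON value per click they can therefore select whole clicks by qualifier (for example "event1"), but not individual keys inside a click. The minimal, stdlib-only Java sketch below models a row of one column family as a sorted qualifier-to-value map (HBase keeps qualifiers sorted) and applies the same prefix rule to both layouts. It is a conceptual model rather than the HBase client API, and the "event:key" qualifier naming is a hypothetical choice.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrefixFilterModel {

    // Keep only the columns whose qualifier starts with at least one of the
    // given prefixes: the matching rule of ColumnPrefixFilter (one prefix)
    // and MultipleColumnPrefixFilter (several prefixes), modelled in plain Java.
    static TreeMap<String, String> selectByPrefixes(Map<String, String> row, List<String> prefixes) {
        TreeMap<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            for (String prefix : prefixes) {
                if (column.getKey().startsWith(prefix)) {
                    result.put(column.getKey(), column.getValue());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Layout A: one cell per click; the JSON is an opaque value.
        TreeMap<String, String> blobRow = new TreeMap<>();
        blobRow.put("event1", "{\"key1\":\"value1\",\"key2\":\"value2\"}");
        blobRow.put("event2", "{\"key1\":\"value1\",\"key3\":\"value3\"}");

        // Layout B: one column per key/value pair (hypothetical "event:key" qualifiers).
        TreeMap<String, String> flatRow = new TreeMap<>();
        flatRow.put("event1:key1", "value1");
        flatRow.put("event1:key2", "value2");
        flatRow.put("event2:key1", "value1");
        flatRow.put("event2:key3", "value3");

        // Layout A: prefix "key1" never matches, because the key lives inside the value.
        System.out.println(selectByPrefixes(blobRow, List.of("key1")));
        // prints {}

        // Layout B: prefixes can pick one key of a click, or all keys of a click.
        System.out.println(selectByPrefixes(flatRow, List.of("event1:key1", "event2:")));
        // prints {event1:key1=value1, event2:key1=value1, event2:key3=value3}
    }
}
```

Because a prefix only matches the start of a qualifier, the order of the parts decides which lookups are cheap: "event1:key1" groups columns by click, while a hypothetical "key1:event1" would let one prefix select key1 across all clicks of a row.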

@How many key/value pairs does each 'click' have?
[Anil] The number of key/value pairs is not fixed; it can vary from 20 to 200.

@Among these pairs, are you going to search for a subset of keys?
[Anil] Yes.



In my schema, I am storing each click (a set of key/value pairs) in one cell,
say "clicks:event1". Is this OK, or do I need to change the schema design so
that each key/value pair is its own column? What is the better way to store
JSON data?
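If each key/value pair becomes its own column, the click id and the key can both be encoded in the qualifier. The sketch below assumes a hypothetical "clickId:key" qualifier format (and that click ids never contain the ":" separator). It shows the flattening step and the reverse grouping a reader would do after a Get or Scan, modelled with plain Java maps rather than HBase types.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class ClickColumns {
    // Hypothetical separator between click id and key inside a qualifier.
    static final String SEP = ":";

    // Turn one click (its key/value pairs) into qualifier -> value entries,
    // e.g. ("event1", {key1=value1}) -> {event1:key1=value1}.
    static TreeMap<String, String> toColumns(String clickId, Map<String, String> click) {
        TreeMap<String, String> columns = new TreeMap<>();
        for (Map.Entry<String, String> e : click.entrySet()) {
            columns.put(clickId + SEP + e.getKey(), e.getValue());
        }
        return columns;
    }

    // Rebuild the clicks of a row from its flattened columns, grouped by click id.
    // Splits at the first separator, so click ids must not contain SEP.
    static Map<String, Map<String, String>> fromColumns(Map<String, String> columns) {
        Map<String, Map<String, String>> clicks = new TreeMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            int cut = e.getKey().indexOf(SEP);
            String clickId = e.getKey().substring(0, cut);
            String key = e.getKey().substring(cut + SEP.length());
            clicks.computeIfAbsent(clickId, id -> new TreeMap<>()).put(key, e.getValue());
        }
        return clicks;
    }

    public static void main(String[] args) {
        Map<String, String> click = new LinkedHashMap<>();
        click.put("key1", "value1");
        click.put("key2", "value2");

        TreeMap<String, String> row = toColumns("event1", click);
        row.putAll(toColumns("event2", Map.of("key1", "value1")));

        System.out.println(row);
        // prints {event1:key1=value1, event1:key2=value2, event2:key1=value1}
        System.out.println(fromColumns(row));
        // prints {event1={key1=value1, key2=value2}, event2={key1=value1}}
    }
}
```

The trade-off: with 20 to 200 pairs per click, the flattened layout produces many more cells, and each cell carries its own copy of the row key, family and qualifier next to its value, so storage grows. In exchange, individual keys become addressable by qualifier and visible to column filters, which a single JSON cell per click cannot offer.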


Thanks,
B Anil Kumar.


On Thu, May 30, 2013 at 9:42 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. 1) Suppose If I want search on key of click, It will be full scan
>
> You can utilize MultipleColumnPrefixFilter or ColumnPrefixFilter to speed
> up scan.
>
> How many key/value pairs does each 'click' have? Among these pairs, are
> you going to search for a subset of keys?
>
> Cheers
>
> On Wed, May 29, 2013 at 8:47 PM, AnilKumar B <akumarb2010@gmail.com> wrote:
>
> > Hi,
> >
> > What is the best HBase table schema for the following JSON data?
> > I need to store the following JSON data in HBase.
> > {"Session": {"Header":
> >     {"key1":"value1", "key2":"value2", "key3":"value3", "key4":"value4", ....},
> >   "clicks": [{"click": {"key1":"value1", "key2":"value2", "key3":"value3", ....}},
> >              {"click": {"key1":"value1", "key2":"value2", ....}}]}}
> >
> > I have created the schema below, but there seem to be some issues.
> > rowkey -> composite key of session fields
> > ColumnFamily 1 -> "Header", which consists of the following columns:
> > 1) Header:HeaderFields, which stores
> > {"Header": {"key1":"value1", "key2":"value2", "key3":"value3", "key4":"value4", ....}}
> > in one cell
> > 2) other columns
> >
> > ColumnFamily 2 -> "clicks" and each "click" will be one column
> >
> > The problems are:
> > 1) If I want to search on a key of a click, it will be a full scan. How can
> > I optimize my schema for such a search requirement?
> > 2) If I want to provide a secondary index for the keys of clicks, how can I
> > implement it?
> >
> > Thanks,
> > B Anil Kumar.
> >
>
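For the secondary-index question, one common pattern is a separate index table maintained by the application, whose row key starts with the indexed click key and value and ends with the session row key. A lookup then becomes a short range scan over sorted index rows instead of a full scan of the data table. The stdlib-only sketch below models the index table as a sorted map; the "key|value|sessionRowKey" layout, the "|" separator and the class and method names are hypothetical, and in real code the index rows would be written with the HBase client alongside each data write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ClickKeyIndex {
    // Hypothetical separator; assumes it never appears inside keys or values.
    static final String SEP = "|";

    // Index table modelled as a sorted map, the way HBase sorts row keys.
    // Only the row key matters here; the stored value is left empty.
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    // Record that session row `sessionRowKey` has a click with key=value.
    // Index row key layout: key|value|sessionRowKey
    void addEntry(String key, String value, String sessionRowKey) {
        indexTable.put(key + SEP + value + SEP + sessionRowKey, "");
    }

    // Find the session row keys that have a click with key=value by visiting
    // only index rows starting with "key|value|" (a range scan, not a full scan).
    List<String> lookup(String key, String value) {
        String prefix = key + SEP + value + SEP;
        List<String> sessions = new ArrayList<>();
        for (String indexRow : indexTable.tailMap(prefix, true).keySet()) {
            if (!indexRow.startsWith(prefix)) {
                break; // rows are sorted, so everything after this is out of range
            }
            sessions.add(indexRow.substring(prefix.length()));
        }
        return sessions;
    }

    public static void main(String[] args) {
        ClickKeyIndex index = new ClickKeyIndex();
        index.addEntry("key1", "value1", "session-001");
        index.addEntry("key1", "value1", "session-042");
        index.addEntry("key1", "value9", "session-007");
        index.addEntry("key2", "value1", "session-003");

        System.out.println(index.lookup("key1", "value1"));
        // prints [session-001, session-042]
    }
}
```

HBase only guarantees atomicity within a single row, so the data row and its index rows are written separately; the application has to tolerate, or repair, an index that briefly disagrees with the data.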
