hbase-user mailing list archives

From Lukáš Drbal <lukas.dr...@gmail.com>
Subject Re: Rowkey design and presplit table
Date Mon, 04 Mar 2013 21:27:30 GMT
Hi Ted,

Thanks a lot for this. It's exactly what I need.

Lukas

2013/3/4 Ted Yu <yuzhihong@gmail.com>

> What HBase version are you planning to use?
>
> In 0.94, you can refer to:
>
> src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
>
> You can write a policy which splits along category boundaries.
>
> There are other split policies, in case you're interested:
>
>
> ./src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
>
> ./src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
>
> ./src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
>
> Cheers
>
> On Mon, Mar 4, 2013 at 12:55 PM, Lukáš Drbal <lukas.drbal@gmail.com>
> wrote:
>
> > Hi Jilal,
> > thanks for the response, but can you please give me a link or explain it
> > more?
> > I don't know what you mean by regular expression splitting. My data is
> > not fixed and will grow over time.
> >
> > Thanks.
> >
> > Regards
> >
> > Lukas Drbal
> >
> >
> > 2013/3/4 Jilal Oussama <jilal.oussama@gmail.com>
> >
> > > You can split in your application using a regular expression on the
> > > underscore char, if the language supports them (like splitting the data
> > > of a CSV file).
> > >
> > >
> > > 2013/3/4 Lukáš Drbal <lukas.drbal@gmail.com>
> > >
> > > > Hi,
> > > >
> > > > I have one question about rowkey design and presplit tables.
> > > >
> > > > My use case:
> > > > I need to store a lot of comments, where each comment belongs to one
> > > > article and each article has one category.
> > > >
> > > > What I need:
> > > > 1) read one comment by ID (where I know commentId, articleId and
> > > > categoryId)
> > > > 2) read all comments for an article (I know categoryId and articleId)
> > > > 3) read all comments for a category (I know categoryId)
> > > >
> > > > From these read patterns I see one good rowkey:
> > > > <categoryId>_<articleId>_<commentId>
> > > >
> > > > But here I don't have a fixed-size rowkey, so I don't know how to
> > > > define the split pattern. How can this be solved?
> > > > These IDs come from an external system and grow very fast, so adding
> > > > something like padding to each part is hard.
> > > >
> > > > Maybe I can use a hash function for each part,
> > > > md5(<categoryId>)_md5(<articleId>)_md5(<commentId>),
> > > > but this rowkey is very long (3*32+2 bytes), and I don't have
> > > > experience with rowkeys this long.
> > > >
> > > > Can someone give me some suggestions, please?
> > > >
> > > > Regards
> > > >
> > > > Lukas Drbal
> > > >
> > >
> >
> >
> >
> > --
> > Save The World - http://www.worldcommunitygrid.org/
> > http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
> >
> > LesTR
> >
>
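
For readers of the archive, the three read patterns from the original question can be sketched as one point lookup plus two prefix range scans over the <categoryId>_<articleId>_<commentId> key. This is a minimal illustration in plain Python, not HBase client code. The helper names row_key and prefix_range are hypothetical, and it assumes the IDs never contain the underscore delimiter.

```python
DELIM = b"_"  # delimiter from the proposed rowkey; assumed never to occur inside an ID


def row_key(category_id, article_id, comment_id):
    """Build the composite key <categoryId>_<articleId>_<commentId> as bytes."""
    parts = (category_id, article_id, comment_id)
    return DELIM.join(str(p).encode("utf-8") for p in parts)


def prefix_range(*parts):
    """Return (start_row, stop_row) covering every key that begins with the
    given leading parts followed by the delimiter.

    The stop row is the prefix with its last byte incremented, which makes it
    the smallest byte string that sorts after every key with that prefix.
    """
    prefix = DELIM.join(str(p).encode("utf-8") for p in parts) + DELIM
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop


# Pattern 1: one comment (all three IDs known) is a direct lookup by key.
one_comment = row_key(12, 7, 3)

# Pattern 2: all comments of an article is a scan over the "12_7_" prefix.
article_start, article_stop = prefix_range(12, 7)

# Pattern 3: all comments of a category is a scan over the "12_" prefix.
category_start, category_stop = prefix_range(12)
```

Because the trailing delimiter is part of the prefix, a scan for category 12 does not pick up keys of category 123 even though the IDs have different lengths. Note that keys sort as bytes, not as numbers, so articles within a category are not returned in numeric order.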



-- 
Save The World - http://www.worldcommunitygrid.org/
http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR

LesTR
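
The idea of splitting "along category boundaries" from Ted's reply can be pictured as follows: take whatever row key a region would otherwise split at and cut it back to its category prefix, so that no category ends up on both sides of the split. The sketch below is plain Python, not the HBase API. The function name category_split_point is hypothetical, and whether a given HBase split policy behaves exactly this way should be checked against the source for the version in use.

```python
def category_split_point(candidate_key: bytes, delimiter: bytes = b"_") -> bytes:
    """Truncate a candidate split key to the part before the first delimiter.

    If the key contains no delimiter, it is returned unchanged.
    """
    idx = candidate_key.find(delimiter)
    if idx < 0:
        return candidate_key
    return candidate_key[:idx]


# A region that would split in the middle of category 12 ...
candidate = b"12_70_1"
# ... is split at the category prefix instead.
split = category_split_point(candidate)

# Every key of category 12 sorts at or after the split point,
# so the whole category lands in the same (upper) daughter region.
category_12 = [b"12_1_1", b"12_70_1", b"12_7_3"]
same_side = all(key >= split for key in category_12)
```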
