hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Hbase RowKey design schema
Date Thu, 29 Aug 2013 14:18:48 GMT
What advantage would you gain by compressing? Less space? But then it
adds compression/decompression performance overhead. It is a trade-off, but
not an especially worthwhile one, as space is cheap and redundancy is OK
with such data stores.
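
To make the trade-off concrete, here is a rough sketch of one possible
"compression" scheme. It is only an assumption for illustration, not the
algorithm from the original question: hashed_row_key is a hypothetical
helper that replaces each '/'-separated segment with a truncated SHA-256
digest.

```python
import hashlib

def hashed_row_key(reversed_key, digest_chars=4):
    """Replace each '/'-separated segment with a short hex digest.

    Hypothetical scheme for illustration only. Truncated digests can
    collide, and the original URL cannot be recovered from the key
    without a separate lookup table.
    """
    segments = reversed_key.split("/")
    return "/".join(
        hashlib.sha256(seg.encode("utf-8")).hexdigest()[:digest_chars]
        for seg in segments
    )

plain = "com.cnn.www/news/sports/index.html"
short = hashed_row_key(plain)
# Four segments of 4 hex characters plus 3 separators: 19 characters
# instead of 34.
print(len(plain), len(short))
```

Keys for the same host still share a prefix, but the keys are no longer
human-readable, short digests may collide, and every read and write pays
for the hashing.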

Having said that, more importantly, what are your read use-cases or access
patterns? That should drive your decision about row key design.
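
For example, if the main access pattern is "scan every page of one site",
the reversed-host key from the Bigtable paper already supports it. A
minimal Python sketch, assuming plain ASCII URLs; reversed_row_key and
prefix_scan_range are hypothetical helper names, not part of HBase or the
Thrift API:

```python
from urllib.parse import urlsplit

def reversed_row_key(url):
    """Build a Bigtable-style row key: reversed host labels plus the path."""
    # urlsplit only recognises the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path

def prefix_scan_range(prefix):
    """Return (start_row, stop_row) covering every key that starts with prefix.

    The stop row is the prefix with its last byte incremented, used as an
    exclusive upper bound. Assumes the prefix is non-empty and does not
    end in byte 0xff.
    """
    raw = prefix.encode("utf-8")
    return raw, raw[:-1] + bytes([raw[-1] + 1])

print(reversed_row_key("www.cnn.com/news/sports/index.html"))
print(prefix_scan_range("com.cnn.www/"))
```

The (start_row, stop_row) pair can then be passed to whatever scan call
your client exposes, so that all rows of one site are read in a single
contiguous scan.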


On Thu, Aug 29, 2013 at 5:21 AM, Wasim Karani <wasim@userworkstech.com> wrote:

> I am using HBase to store webtable content, the way Google uses
> Bigtable.
> For reference: Google Bigtable
> My question is about the RowKey and how we should form it.
> What Google does is save the URL in reverse order, as you can see in
> the PDF document ("com.cnn.www"), so that all the links associated with
> cnn.com
> are managed in the same block of GFS, which is a lot easier to scan.
> I can do the same thing Google does, but wouldn't it be cool if I
> used
> some algorithm to compress the URL?
> For example:
> RowKey                               | Google Bigtable                      | Algorithm output
> www.cnn.com/index.php                | com.cnn.www/index.php                | 12as/435
> www.cnn.com/news/business/index.html | com.cnn.www/news/business/index.html | 12as/2as/dcx/asd
> www.cnn.com/news/sports/index.html   | com.cnn.www/news/sports/index.html   | 12as/2as/eds/scf
> The reason for doing this is that the rowkey will be shorter, as per the
> HBase schema design (mentioned in the topic Rowkey Length).
> So what I need from you guys is to know whether I am correct here.
> Also, if I am correct, which algorithm should I be using? I am using Python
> over Thrift as a programming language so code will be overwhelming for me...
