hbase-user mailing list archives

From "Goel, Ankur" <Ankur.G...@corp.aol.com>
Subject Row-key in HBase
Date Mon, 28 Apr 2008 11:01:59 GMT
Hi folks,
I am using an HBase table to store my crawled data, with the
MD5 signature of the canonicalized URL as the row key. The
Bigtable paper suggests choosing row keys so that URLs from the
same domain are stored close to each other and domain analysis can be
carried out efficiently.
For example, the page maps.google.com/index.html would be stored under
the row key com.google.maps/index.html.

My question is: will using the MD5 signature of the canonicalized URL
hurt the data locality of URLs from the same domain?
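
To make the question concrete, here is a minimal sketch in Python of the
two key schemes and the row order each one produces. It is illustrative
only: the helper names reversed_domain_key and md5_key and the sample
URLs are assumptions made for this example, not part of HBase, and
sorted() only approximates HBase's byte-wise ordering of row keys.

```python
import hashlib
from urllib.parse import urlsplit


def reversed_domain_key(url: str) -> str:
    """Build a row key with the host labels reversed, e.g. com.google.maps/index.html."""
    # urlsplit only recognises the host when a scheme is present, so add one if missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path


def md5_key(url: str) -> str:
    """Build a row key from the MD5 hex digest of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Hypothetical sample URLs: two from google.com, one from another domain.
urls = [
    "maps.google.com/index.html",
    "www.example.org/about.html",
    "mail.google.com/inbox.html",
]

# Rows are stored sorted by key, so sorting by key approximates the storage order.
by_domain = sorted(urls, key=reversed_domain_key)
by_md5 = sorted(urls, key=md5_key)

print(by_domain)
print(by_md5)
```

With the reversed-domain keys, the two google.com pages sort next to each
other. An MD5 digest bears no relation to the domain of the URL it was
computed from, so the second ordering is not expected to group pages by
domain.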

Thanks
-Ankur
