hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Speeding up Scans
Date Wed, 25 Jan 2012 13:13:10 GMT
I'm confused...
You mention that you are hashing your key, and you want to do a scan with a start and stop value?

Could you elaborate?

With respect to hashing, if you use a SHA-1 hash, your values will be unique for all practical purposes.
(you talked about rehashing ...)
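
To make the SHA-1 suggestion concrete, here is a minimal sketch of deriving a fixed-width key prefix from the language string. The class and method names (`LanguagePrefix`, `sha1Prefix`) are hypothetical and do not come from the thread; the sketch only assumes the standard `java.security.MessageDigest` API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original poster's code.
final class LanguagePrefix {
    // Returns the 20-byte SHA-1 digest of the language string.
    // The digest has a fixed width, so it can serve as a row-key prefix.
    static byte[] sha1Prefix(String language) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(language.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

With a prefix like this, two different language strings colliding is not a realistic concern, so the extra column check after the scan would mostly be a safety net. The trade-off is a longer key: 20 bytes instead of the 4 bytes of `String.hashCode()`.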

Sent from my iPhone

On Jan 25, 2012, at 7:56 AM, "Peter Wolf" <opus111@gmail.com> wrote:

> Hello all,
> 
> I am looking for advice on speeding up my Scanning.
> 
> I want to iterate over all rows where a particular column (language) equals a particular value ("JA").
> 
> I am already creating my row keys using that column in the first bytes.  And I do my scans using partial row matching, like this...
> 
>    public static byte[] calculateStartRowKey(String language) {
>        int languageHash = language.length() > 0 ? language.hashCode() : 0;
>        byte[] language2 = Bytes.toBytes(languageHash);
>        byte[] accountID2 = Bytes.toBytes(0);
>        byte[] timestamp2 = Bytes.toBytes(0);
>        return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>    }
> 
>    public static byte[] calculateEndRowKey(String language) {
>        int languageHash = language.length() > 0 ? language.hashCode() : 0;
>        byte[] language2 = Bytes.toBytes(languageHash + 1);
>        byte[] accountID2 = Bytes.toBytes(0);
>        byte[] timestamp2 = Bytes.toBytes(0);
>        return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>    }
> 
>    Scan scan = new Scan(calculateStartRowKey(language), calculateEndRowKey(language));
> 
> 
> Since I am using a hash value for the string, I need to re-check the column to make sure that some other string does not get the same hash value.
> 
>    Filter filter = new SingleColumnValueFilter(resultFamily, languageCol,
>            CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
>    scan.setFilter(filter);
> 
> I am using the Cloudera 0.09.4 release, and a cluster of 3 machines on EC2.
> 
> I think that this should be really fast, but it is not.  Any advice on how to debug/speed it up?
> 
> Thanks
> Peter
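
The start/stop computation in the quoted message can be sketched without the HBase classes, which makes one edge case easier to see. HBase orders row keys as unsigned bytes, and `Bytes.toBytes(int)` produces a 4-byte big-endian encoding, so `languageHash + 1` wraps to 0 when the hash is -1 and the stop key then sorts before the start key. The sketch below increments the prefix bytes directly instead. The names `PrefixRange`, `intToBytes`, and `stopKeyForPrefix` are hypothetical, and this is one possible approach, not the poster's code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper illustrating a prefix scan range; not from the thread.
final class PrefixRange {
    // 4-byte big-endian encoding of an int (ByteBuffer is big-endian by
    // default), matching the layout of HBase's Bytes.toBytes(int).
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Smallest key that sorts after every key beginning with `prefix`,
    // treating bytes as unsigned. Returns an empty array when the prefix
    // is all 0xFF; HBase treats an empty stop row as "end of table".
    static byte[] stopKeyForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;                          // bump the last non-0xFF byte
                return Arrays.copyOf(stop, i + 1);  // drop the trailing 0xFF bytes
            }
        }
        return new byte[0];
    }
}
```

Under these assumptions the start row is simply the prefix itself, for example `intToBytes(language.hashCode())`, and the stop row is `stopKeyForPrefix` applied to that same prefix. There is no need to pad either key with zero-valued account ID and timestamp fields.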
