hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Speeding up Scans
Date Wed, 25 Jan 2012 13:13:40 GMT

Hi there-

Quick sanity check:  what scanner caching level are you using
(Scan.setCaching)?  The default is 1, which means one round trip to the
RegionServer per row.  I know this is basic, but it's always good to
double-check.
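
To put rough numbers on that, here is a minimal sketch of how the caching
value changes the number of fetch round trips for a scan.  The class and
method names (ScanCachingEstimate, estimateNextCalls) and the row count are
made up for illustration, and the arithmetic ignores the calls that open and
close the scanner.  In client code the setting itself is applied with
Scan.setCaching(int).

```java
public class ScanCachingEstimate {

    /**
     * Approximate number of "next" round trips needed to pull `rows` rows
     * when the scanner fetches `caching` rows per call.
     * Hypothetical helper for illustration; ignores scanner open/close calls.
     */
    public static long estimateNextCalls(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed row count, not taken from the thread
        System.out.println("caching=1   -> " + estimateNextCalls(rows, 1) + " calls");
        System.out.println("caching=500 -> " + estimateNextCalls(rows, 500) + " calls");
    }
}
```

A larger value trades client and RegionServer memory for fewer round trips,
so the right number depends on row size.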

If "language" is already in the lead position of the rowkey, why use the
filter?
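
For reference, a prefix-bounded scan only needs a start row and an exclusive
stop row.  The sketch below builds both from the 4-byte prefix in plain Java
(no HBase classes) and assumes the same big-endian layout that
Bytes.toBytes(int) produces.  PrefixRange and its methods are hypothetical
names.  Incrementing the prefix as unsigned bytes, rather than computing
languageHash + 1 as an int, also covers a hash of -1, where the int version
wraps to 0 and the stop row would sort before the start row.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PrefixRange {

    /** Start row: the 4-byte big-endian prefix for the language hash. */
    public static byte[] startRow(int languageHash) {
        return ByteBuffer.allocate(4).putInt(languageHash).array();
    }

    /**
     * Exclusive stop row: the prefix incremented as an unsigned byte string.
     * Returns an empty array when the prefix is all 0xFF; HBase treats an
     * empty stop row as "scan to the end of the table".
     */
    public static byte[] stopRow(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1); // drop trailing 0xFF bytes
            }
        }
        return new byte[0];
    }

    /** Unsigned lexicographic comparison, the order used for row keys. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] start = startRow("JA".hashCode());
        byte[] stop = stopRow(start);
        System.out.println(Arrays.toString(start) + " .. " + Arrays.toString(stop));
    }
}
```

With the hashed prefix from the quoted message, a range like this still
returns rows for any other string that hashes to the same value, so the range
alone does not replace the collision check.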

As for EC2, that's a wildcard.





On 1/25/12 7:56 AM, "Peter Wolf" <opus111@gmail.com> wrote:

>Hello all,
>
>I am looking for advice on speeding up my Scanning.
>
>I want to iterate over all rows where a particular column (language)
>equals a particular value ("JA").
>
>I am already creating my row keys using that column in the first bytes.
>And I do my scans using partial row matching, like this...
>
>     public static byte[] calculateStartRowKey(String language) {
>         int languageHash = language.length() > 0 ? language.hashCode() : 0;
>         byte[] language2 = Bytes.toBytes(languageHash);
>         byte[] accountID2 = Bytes.toBytes(0);
>         byte[] timestamp2 = Bytes.toBytes(0);
>         return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>     }
>
>     public static byte[] calculateEndRowKey(String language) {
>         int languageHash = language.length() > 0 ? language.hashCode() : 0;
>         byte[] language2 = Bytes.toBytes(languageHash + 1);
>         byte[] accountID2 = Bytes.toBytes(0);
>         byte[] timestamp2 = Bytes.toBytes(0);
>         return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>     }
>
>     Scan scan = new Scan(calculateStartRowKey(language),
>                          calculateEndRowKey(language));
>
>
>Since I am using a hash value for the string, I need to re-check the
>column to make sure that some other string does not get the same hash
>value:
>
>     Filter filter = new SingleColumnValueFilter(resultFamily, languageCol,
>             CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
>     scan.setFilter(filter);
>
>I am using the Cloudera 0.90.4 release, and a cluster of 3 machines on
>EC2.
>
>I think that this should be really fast, but it is not.  Any advice on
>how to debug/speed it up?
>
>Thanks
>Peter
>
>
>
>
>


