hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Scan vs Put vs Get
Date Thu, 28 Jun 2012 16:49:27 GMT
Oh, sorry. You're right. You already said that and I forgot to update
it. It works fine when I add this parameter. And as you say, I can get
the response time I want by playing with the chance.

I get 34758 lines/second with 0.99 as the chance, and only 7564
lines/second with 0.09. But that's still better than the gets.
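
To illustrate why the chance changes the throughput, here is a minimal
sketch in plain Java. It is not HBase code: it only models a filter that
keeps each row with probability 'chance' and counts how many rows have to
be examined before a fixed number of rows is returned. The class name, the
rowsExamined helper and the seed are made up for this illustration, and the
model ignores RPC, caching and disk costs, so it will not reproduce the
measured numbers above.

```java
import java.util.Random;

public class RandomRowFilterSketch {

    /**
     * Counts how many rows must be examined before `wanted` rows pass a
     * filter that keeps each row with probability `chance`.
     * Hypothetical helper for illustration; not part of the HBase API.
     */
    static long rowsExamined(int wanted, float chance, long seed) {
        if (chance <= 0f) {
            // With a chance of 0 no row would ever pass, so the loop would not end.
            throw new IllegalArgumentException("chance must be > 0");
        }
        Random random = new Random(seed);
        long examined = 0;
        int kept = 0;
        while (kept < wanted) {
            examined++;
            // Keep the row with probability `chance`.
            if (random.nextFloat() < chance) {
                kept++;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        int wanted = 10000;
        for (float chance : new float[] {0.99f, 0.5f, 0.09f}) {
            long examined = rowsExamined(wanted, chance, 42L);
            System.out.printf("chance=%.2f rows examined=%d (about wanted/chance = %.0f)%n",
                    chance, examined, wanted / chance);
        }
    }
}
```

The lower the chance, the more rows are read and discarded for each row
returned, which is consistent with the scan getting slower at 0.09.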

I just retried the gets to see whether the performance changes after
many table accesses, but the results are still almost the same.

I also tried to read 100 000 consecutive rows from a random start key,
and the performance is close to the random filter (35273
lines/second). So it's really the gets that are giving me a
headache...

2012/6/28, N Keywal <nkeywal@gmail.com>:
> For the filter list, my guess is that you're filtering out all rows
> because RandomRowFilter#chance is not initialized (it should be
> something like RandomRowFilter rrf = new RandomRowFilter(0.5f);).
> But note that this test will never be comparable to the test with a
> list of gets. You can make it as slow/fast as you want by playing with
> the 'chance' parameter.
>
> The results with gets and bloom filter are also in the interesting
> category, hopefully an expert will get in the loop...
>
>
>
> On Thu, Jun 28, 2012 at 6:04 PM, Jean-Marc Spaggiari
> <jean-marc@spaggiari.org> wrote:
>> Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I
>> mean, bad that I did not figure that out. Thanks for pointing that
>> out. That definitely explains the difference in performance.
>>
>> I have activated the bloomfilters with this code:
>> HBaseAdmin admin = new HBaseAdmin(config);
>> HTable table = new HTable(config, "test3");
>> System.out.println (table.getTableDescriptor().getColumnFamilies()[0]);
>> HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0];
>> cd.setBloomFilterType(BloomType.ROW);
>> admin.disableTable("test3");
>> admin.modifyColumn("test3", cd);
>> admin.enableTable("test3");
>> System.out.println (table.getTableDescriptor().getColumnFamilies()[0]);
>>
>> And here is the result for the first attempt (using gets):
>> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
>> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
>> 'true', BLOCKCACHE => 'true'}
>> {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>> MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
>> 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
>> 'true', BLOCKCACHE => 'true'}
>> Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0...
>> Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds)
>>
>> 2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds)
>> 3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds)
>> After a few more iterations (about 30), I'm between 200 and 250
>> lines/second, like before.
>>
>> Regarding the filterList, I tried, but now I'm getting this error from
>> the servers:
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>> '-6376193724680783311' does not exist
>> Here is the code:
>>        final int linesToRead = 10000;
>>        System.out.println(new java.util.Date() + " Processing iteration "
>>                + iteration + "... ");
>>        RandomRowFilter rrf = new RandomRowFilter();
>>        KeyOnlyFilter kof = new KeyOnlyFilter();
>>        Scan scan = new Scan();
>>        List<Filter> filters = new ArrayList<Filter>();
>>        filters.add(rrf);
>>        filters.add(kof);
>>        FilterList filterList = new FilterList(filters);
>>        scan.setFilter(filterList);
>>        scan.setBatch(Math.min(linesToRead, 1000));
>>        scan.setCaching(Math.min(linesToRead, 1000));
>>        ResultScanner scanner = table.getScanner(scan);
>>        processed = 0;
>>        long timeBefore = System.currentTimeMillis();
>>        for (Result result : scanner.next(linesToRead))
>>        {
>>                System.out.println("Result: " + result); //
>>                if (result != null)
>>                        processed++;
>>        }
>>        scanner.close();
>>
>> It's failing when I try to do for (Result result :
>> scanner.next(linesToRead)). I tried with linesToRead=1000, 100, 10 and
>> 1 with the same result :(
>>
>> I will try to find the root cause, but if you have any hints, they
>> are welcome.
>>
>> JM
>
