hbase-user mailing list archives

From N Keywal <nkey...@gmail.com>
Subject Re: Scan vs Put vs Get
Date Thu, 28 Jun 2012 08:30:17 GMT
Hi Jean-Marc,

Interesting.... :-)

In addition to Anoop's questions:

What's the hbase version you're using?

Is it repeatable? That is, if you run the same "gets" twice with the
same client, do you get the same results? I'm asking because the
client caches the region locations.

If the locations are wrong (a region moved), you will enter a retry
loop, which includes a sleep. Do you see anything in the logs?

Could you also share the code you're using that gives the ~100 ms time?
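In case it helps, here is a minimal sketch of the kind of timing harness I have in mind (pure JDK; `runBatch` is a placeholder standing in for the real `table.batch(gets, results)` call). The point is to time several consecutive runs, since only the first one pays for the region-location lookups:

```java
import java.util.concurrent.TimeUnit;

public class BatchTimer {
    // Placeholder workload; in the real test this would be table.batch(gets, results).
    static void runBatch() {
        // ... issue the batched Gets here ...
    }

    // Times a single run of a workload, in milliseconds.
    static long timeOnce(Runnable workload) {
        long start = System.nanoTime();
        workload.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // The first run includes region-location lookups; later runs hit the
        // client's location cache, so report each run separately.
        for (int run = 1; run <= 3; run++) {
            long ms = timeOnce(new Runnable() {
                public void run() { runBatch(); }
            });
            System.out.println("run " + run + ": " + ms + " ms");
        }
    }
}
```

If the second and third runs are much faster than the first, the difference is the location lookups and any retry sleeps, not the reads themselves.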

Cheers,

N.

On Thu, Jun 28, 2012 at 6:56 AM, Anoop Sam John <anoopsj@huawei.com> wrote:
> Hi
>     How many Gets do you batch together in one call? Is this equal to the
> Scan#setCaching() value you are using?
> If both are the same, you can be sure the number of network calls is almost the same.
>
> Also, you are giving random keys in the Gets, while the scan is always sequential. It
> seems your Get scenario does very random reads, resulting in too many HFile block reads
> from HDFS. [Is block caching enabled?]
>
> Also, have you tried using Bloom filters? ROW blooms might improve your Get performance.
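For reference, the bloom-filter and block-cache settings Anoop mentions can be inspected and changed from the HBase shell (table and family names below are placeholders; the `alter` syntax is for the 0.92/0.94-era shell, where the table usually has to be disabled first):

```
hbase> describe 'mytable'
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', BLOOMFILTER => 'ROW', BLOCKCACHE => 'true'}
hbase> enable 'mytable'
```

Note that existing HFiles only gain bloom filters once a major compaction rewrites them, so the effect is not immediate.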
>
> -Anoop-
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Thursday, June 28, 2012 5:04 AM
> To: user
> Subject: Scan vs Put vs Get
>
> Hi,
>
> I have a small piece of code, for testing, which puts 1M lines
> into an existing table, gets 3000 lines, and scans 10000.
>
> The table has one family and one column.
>
> Everything is done randomly. Puts use a random key (24 bytes), fixed
> family and column names, and random content (24 bytes).
>
> Gets (batched) use random keys, and the scan uses a RandomRowFilter.
>
> And here are the results.
> Time to insert 1000000 lines: 43 seconds (23255 lines/seconds)
> That's adequate for my needs, given the modest performance of the
> servers in the cluster. I'm fine with these results.
>
> Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds)
> This is way too low, and I don't understand why. So I tried the random
> scan, because I couldn't figure out the issue.
>
> Time to read 10000 lines: 108.0 mseconds (92593 lines/seconds)
> This is impressive! I added that test after the gets failed to perform.
> I went from 262 lines per second to almost 100K lines/second! It's
> awesome!
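As a sanity check on the figures above, the throughput formula from the quoted code reproduces the reported numbers (a small stand-alone sketch; `linesPerSecond` is a hypothetical helper mirroring the email's arithmetic):

```java
public class Throughput {
    // Mirrors the formula in the quoted code:
    // lines / (duration_ms / 1000), rounded to whole lines per second.
    static int linesPerSecond(long lines, float durationMs) {
        return Math.round((float) lines / (durationMs / 1000f));
    }

    public static void main(String[] args) {
        System.out.println(linesPerSecond(3000, 11444.0f));  // batched Gets: prints 262
        System.out.println(linesPerSecond(10000, 108.0f));   // random scan: prints 92593
    }
}
```

So the ~350x gap between the two numbers is real, not an artifact of the measurement arithmetic.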
>
> However, I'm still wondering what's wrong with my gets.
>
> The code is very simple. I'm using Get objects executed in a batch. I
> tried adding a filter, but it didn't help. Here is an extract of the
> code.
>
>     List<Get> gets = new ArrayList<Get>();
>     for (long l = 0; l < linesToRead; l++) {
>         byte[] array1 = new byte[24];
>         for (int i = 0; i < array1.length; i++)
>             array1[i] = (byte) Math.floor(Math.random() * 256);
>         Get g = new Get(array1);
>         gets.add(g);
>     }
>     Object[] results = new Object[gets.size()];
>     System.out.println(new java.util.Date() + " \"gets\" created.");
>     long timeBefore = System.currentTimeMillis();
>     table.batch(gets, results);
>     long timeAfter = System.currentTimeMillis();
>
>     float duration = timeAfter - timeBefore;
>     System.out.println("Time to read " + gets.size() + " lines : "
>         + duration + " mseconds (" + Math.round((float) linesToRead
>         / (duration / 1000)) + " lines/seconds)");
>
> What's wrong with it? I can't use setBatch, nor setCaching, because
> it's not a scan. I tried different numbers of gets, but it's almost
> always the same speed. Am I using it the wrong way? Does anyone have
> any advice to improve this?
>
> Thanks,
>
> JM
