hbase-user mailing list archives

From shourabh rawat <mirage1...@gmail.com>
Subject Re: Improving hbase read performance
Date Wed, 18 Feb 2009 15:52:57 GMT
Here's what I'm doing.

This is my get function. It should retrieve the entities in parallel by
running each get on its own thread.

public String[] get(String tableName, String[] entityIDS) {
    ExecutorService threadExecutor = Executors.newFixedThreadPool(50);
    String[] contents = new String[entityIDS.length];
    long initime = System.currentTimeMillis();
    int i = 0;
    while (i < entityIDS.length) {
        // one task per key; each task writes its result into contents[i]
        threadExecutor.execute(new ReadThread(conf, tableName, contents, entityIDS[i], i));
        i++;
    }
    return contents;
}

And here's the thread:

    public void run() {
        long ab = System.currentTimeMillis();
        try {
            Cell c = table.get(entityID, "content:");
            // guard against a missing cell; a null check on a freshly
            // constructed String could never be true
            if (c == null) {
                j[index] = "NULL";
            } else {
                j[index] = new String(c.getValue());
            }
        } catch (IOException ex) {
            Logger.getLogger(ReadThread.class.getName()).log(Level.SEVERE, null, ex);
        }
        System.out.println((System.currentTimeMillis() - ab)
                + " time taken to complete for process " + index);
    }

I am creating a new HTable instance for each such thread.
Is this the correct way to do it? Would I get better performance from this?
Will my get queries be executed in parallel by hbase?
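
One way to structure the client side, as a minimal sketch: submit one task per key to a fixed pool and use invokeAll, so the method returns only once every read has finished. The `fetch` function here is a hypothetical stand-in for the per-key HBase read (for example a `table.get` call), so the sketch runs without a cluster; the class and method names are illustrative as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Sketch only: 'fetch' is a hypothetical stand-in for a per-key HBase read.
class ParallelGetSketch {

    static String[] getAll(List<String> keys, Function<String, String> fetch, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String key : keys) {
                tasks.add(() -> fetch.apply(key));
            }
            // invokeAll blocks until every task has completed and returns
            // the futures in the same order as the submitted tasks.
            List<Future<String>> futures = pool.invokeAll(tasks);
            String[] contents = new String[keys.size()];
            for (int i = 0; i < contents.length; i++) {
                String value = futures.get(i).get();
                contents[i] = (value == null) ? "NULL" : value;
            }
            return contents;
        } finally {
            // stop accepting work and let the worker threads exit
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] out = getAll(Arrays.asList("row1", "row2", "row3"),
                key -> "content-of-" + key, 3);
        for (String s : out) {
            System.out.println(s);
        }
    }
}
```

This only shows how to issue the requests concurrently and collect the results in key order. Whether the reads are actually served in parallel depends on the cluster, which the sketch does not model.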

On Wed, Feb 18, 2009 at 11:27 AM, shourabh rawat <mirage1987@gmail.com> wrote:
> Does the number of region servers affect this performance?
> On Wed, Feb 18, 2009 at 11:23 AM, shourabh rawat <mirage1987@gmail.com> wrote:
>> Hey,
>>> What do you mean by the above when you say read sequentially? Are you
>>> scanning? (Getting a scanner and then nexting through your hbase table?)
>> Well, let's say I have 10 keys stored in hbase and I want to retrieve them.
>> If I do the reads one by one, the total time is the sum of the 'get'
>> times of the individual keys.
>> Could I do the same thing in parallel, so that all the gets occur
>> concurrently and the total time is the max of the times taken by
>> any of these keys rather than the sum of the individual times?
>>> You will have to wait for hbase 0.20.0 or do as Erik suggests and put a
>>> cache in front of hbase.  What are you trying to do with hbase?  Serve a
>>> website?
>> Yes, sort of, but I want to check performance without the use of a cache
>> (random reads). Can I get such performance in the range of 10 ms
>> with hbase?
>>> Yeah, the RPC keeps a single connection per remote server but channel is
>>> shared by request and receive.  Testing in past, the more remote servers,
>>> the better, but even if a few only, concurrent HTables got better throughput
>>> than one running requests in series (the single connection is not fully
>>> occupied by requests and responses).
>> So by a single connection, do you mean that all the gets would be handled
>> sequentially (one by one) by hbase even when the requests come in
>> in parallel (even when different HTable instances for the same table are
>> used)? Is there any way I can make them parallel?
>> The hbase master has one port that it specifies, and the other is the port
>> for HDFS (Hadoop). What can be done to increase the number of
>> connections, as you said?
>> Thanks for your help.
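
The sum-versus-max reasoning in the quoted question can be written down as a small, idealized model. It is a sketch under stated assumptions: the latencies are made-up numbers, the class and method names are hypothetical, and the model ignores server-side contention and the shared RPC connection mentioned in the thread, so real timings will differ.

```java
import java.util.PriorityQueue;

// Idealized model: ignores server-side contention and the shared RPC connection.
class GetTimingModel {

    // Gets issued one after another: total time is the sum of the latencies.
    static long sequentialMillis(long[] latencies) {
        long total = 0;
        for (long t : latencies) {
            total += t;
        }
        return total;
    }

    // Gets handed, in order, to whichever of 'threads' workers frees up first
    // (assumes threads >= 1). With at least one worker per get, the result
    // reduces to the maximum latency.
    static long parallelMillis(long[] latencies, int threads) {
        PriorityQueue<Long> busyUntil = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) {
            busyUntil.add(0L);
        }
        long finish = 0;
        for (long t : latencies) {
            long end = busyUntil.poll() + t; // earliest free worker takes the get
            busyUntil.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    public static void main(String[] args) {
        long[] latencies = {12, 8, 30, 15, 9}; // made-up per-get times in ms
        System.out.println("sequential: " + sequentialMillis(latencies));
        System.out.println("parallel:   " + parallelMillis(latencies, latencies.length));
    }
}
```

For the illustrative latencies of 12, 8, 30, 15 and 9 ms, the model gives 74 ms when the gets run one by one and 30 ms with five workers. With fewer workers than keys, the result falls somewhere between the max and the sum.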
