Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Message-ID: <4BEAE878.3000807@ugame.net.pl>
Date: Wed, 12 May 2010 19:42:16 +0200
From: Sebastian Bauer <admin@ugame.net.pl>
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US;
 rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4
MIME-Version: 1.0
To: hbase-user@hadoop.apache.org
Subject: Re: Problem with performance with many columns in column familie
References: <4BE97D12.1010702@ugame.net.pl>
	 <AANLkTimswWDrNq9tBiGYjq1uHyRyiL77pwrrshXp_gVp@mail.gmail.com>
	 <4BE98AD3.1000404@ugame.net.pl>
 <AANLkTimhaONgaNWQ0HgewY7f7RbKW9mlgU8ppLfgs_sA@mail.gmail.com>
 <4BEAC667.6080203@ugame.net.pl>
In-Reply-To: <4BEAC667.6080203@ugame.net.pl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

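The patch I sent earlier (quoted below) has a silly bug: it calls readLockLock() twice instead of locking and then unlocking. Corrected version: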

Index: core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
===================================================================
--- core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(revision 942215)
+++ core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(working copy)
@@ -1449,6 +1449,17 @@

        // Run a GET scan and put results into the specified list
        scanner.get(result);
+
+      this.memstore.readLockLock();
+      try {
+        if (!result.isEmpty()) {
+          KeyValue kv = result.get(0);
+          this.memstore.add(kv);
+        }
+      } finally {
+        this.memstore.readLockUnlock();
+      }
+
      } finally {
        this.lock.readLock().unlock();
      }
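The double-lock bug is an unbalanced lock: with readLockLock() called twice, the MemStore read lock is never released, and any thread that later needs the write lock on it waits forever (presumably the MemStore snapshot taken before a flush, though that is an assumption not checked against this revision). A standalone sketch of the effect using java.util.concurrent.locks.ReentrantReadWriteLock, assuming readLockLock()/readLockUnlock() wrap such a lock; LockPairingDemo and its methods are hypothetical names, not HBase code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockPairingDemo {
    // Stand-in for the MemStore's internal read/write lock (an assumption).
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Shape of the first patch: "lock" is called where "unlock" was meant.
    void buggyCache() {
        lock.readLock().lock();
        // ... memstore.add(kv) would go here ...
        lock.readLock().lock();   // bug: a second acquire instead of a release
    }

    // Shape of the corrected patch, with try/finally so the lock is
    // released even if the work in between throws.
    void fixedCache() {
        lock.readLock().lock();
        try {
            // ... memstore.add(kv) would go here ...
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock cannot be taken while any read hold is outstanding.
    boolean writerCanProceed() {
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        LockPairingDemo fixed = new LockPairingDemo();
        fixed.fixedCache();
        System.out.println("fixed: read holds=" + fixed.lock.getReadHoldCount()
                + ", writer can proceed=" + fixed.writerCanProceed());

        LockPairingDemo buggy = new LockPairingDemo();
        buggy.buggyCache();
        System.out.println("buggy: read holds=" + buggy.lock.getReadHoldCount()
                + ", writer can proceed=" + buggy.writerCanProceed());
    }
}
```

Running it prints `fixed: read holds=0, writer can proceed=true` and `buggy: read holds=2, writer can proceed=false`. The try/finally form is the usual Java idiom because it also releases the lock when the code between lock and unlock throws.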

On 12.05.2010 17:16, Sebastian Bauer wrote:
> I figured out what is taking so long. The test data was one row with
> 100000 columns and one row with 100.
>
> When I increment a column in the huge row, its data does not land in the
> MemStore. Timings (tested in Python, after warmup):
>
> before patch:
> #get one column from big row
> 1 0:00:00.919464
> #get one column from small row
> 2 0:00:00.009650
> #atomicIncrement one column from big row
> 3 0:00:00.081196
> #atomicIncrement one column from small row
> 4 0:00:00.006530
>
> after patch:
> #get one column from big row
> 1 0:00:00.009909
> #get one column from small row
> 2 0:00:00.003489
> #atomicIncrement one column from big row
> 3 0:00:00.004890
> #atomicIncrement one column from small row
> 4 0:00:00.004820
>
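The ratio between the two runs shows where the patch matters. A trivial sketch that divides the before/after timings copied from the figures above (the class name and case labels are only for illustration):

```java
class SpeedupFromTimings {
    // Elapsed seconds copied from the "before patch" / "after patch" runs above.
    static final String[] CASES = {
        "get, big row", "get, small row", "atomicIncrement, big row", "atomicIncrement, small row"
    };
    static final double[] BEFORE = {0.919464, 0.009650, 0.081196, 0.006530};
    static final double[] AFTER  = {0.009909, 0.003489, 0.004890, 0.004820};

    // How many times faster the patched run was for case i.
    static double speedup(int i) {
        return BEFORE[i] / AFTER[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < CASES.length; i++) {
            System.out.printf("%-28s %6.1fx%n", CASES[i], speedup(i));
        }
    }
}
```

This gives roughly 93x for the single-column get on the 100000-column row and about 17x for atomicIncrement on it, against about 2.8x and 1.4x for the 100-column row.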
>
> patch:
>
> Index: core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
> ===================================================================
> --- core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(revision 942215)
> +++ core/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java	(working copy)
> @@ -1449,6 +1449,14 @@
>
>        // Run a GET scan and put results into the specified list
>        scanner.get(result);
> +
> +      this.memstore.readLockLock();
> +      if (!result.isEmpty()) {
> +          KeyValue kv = result.get(0);
> +          this.memstore.add(kv);
> +        }
> +      this.memstore.readLockLock();
> +
>      } finally {
>        this.lock.readLock().unlock();
>      }
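The mechanism described above, sketched as a toy: a GET first asks the MemStore and falls back to scanning the store files only when the MemStore cannot answer (an assumption based on the patch context, "Run a GET scan"), so copying what the scan found into the MemStore lets later reads of the same column stop early. The names are hypothetical, a TreeMap stands in for the MemStore and a counted linear scan stands in for the store-file read; the real read path uses sorted files with block indexes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ReadThroughSketch {
    // Crude stand-in for one row's cells in a store file, read sequentially.
    final List<String> fileColumns = new ArrayList<>();
    final List<Long> fileValues = new ArrayList<>();
    // Crude stand-in for the MemStore, consulted first.
    final TreeMap<String, Long> memstore = new TreeMap<>();
    // Store-file cells examined so far: the "cost" of the slow path.
    long scannedCells = 0;
    // true = copy what the slow path found into the memstore, like the patch.
    final boolean cacheReads;

    ReadThroughSketch(boolean cacheReads) {
        this.cacheReads = cacheReads;
    }

    Long get(String column) {
        Long inMemory = memstore.get(column);
        if (inMemory != null) {
            return inMemory;                       // early out: answered from memory
        }
        for (int i = 0; i < fileColumns.size(); i++) {
            scannedCells++;
            if (fileColumns.get(i).equals(column)) {
                Long value = fileValues.get(i);
                if (cacheReads) {
                    memstore.put(column, value);   // keep what was just read
                }
                return value;
            }
        }
        return null;                               // column not present
    }

    public static void main(String[] args) {
        for (boolean patched : new boolean[] {false, true}) {
            ReadThroughSketch store = new ReadThroughSketch(patched);
            for (int i = 0; i < 100000; i++) {     // one row with 100000 columns
                store.fileColumns.add("c" + i);
                store.fileValues.add((long) i);
            }
            for (int n = 0; n < 10; n++) {
                store.get("c99999");               // last column: worst case for a scan
            }
            System.out.println((patched ? "patched" : "unpatched")
                    + ": scanned " + store.scannedCells + " cells for 10 gets");
        }
    }
}
```

Ten reads of the last column examine 1000000 cells without caching and 100000 with it (only the first read scans). The real gain depends on the file index and where the column sits in the row.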
>
> What do you think about this change?
> All suggestions are welcome, because I don't even know Java ;)
>
>
>
> Sebastian B.
>
> On 11.05.2010 18:58, Ted Yu wrote:
>> jstack is a handy tool:
>> http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html
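In practice: find the regionserver's process ID (`jps` lists running Java processes), run `jstack <pid>` a few times while the CPU is high, and look for threads that keep showing the same frames. The same information is exposed inside a JVM by the standard java.lang.management API; a minimal sketch (ThreadDumpSketch is a hypothetical name) that dumps the threads of the JVM it runs in, so it shows what a dump contains rather than attaching to a remote regionserver:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {
    // Builds a text dump of every live thread in the current JVM:
    // name, state, the lock it is waiting on (if any), and its full stack.
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true) requests held-monitor and held-synchronizer details too;
        // this sketch prints only the waited-on lock from that information.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            if (info.getLockName() != null) {
                sb.append("    waiting on ").append(info.getLockName()).append('\n');
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```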
>>
>> On Tue, May 11, 2010 at 9:50 AM, Sebastian Bauer <admin@ugame.net.pl> wrote:
>>
>>> RAM is not a problem: the second region server is using about 550 MB and
>>> the first about 300 MB. The problem is CPU. When I send queries to both
>>> column families, the second region server uses about 40-80% CPU and the
>>> first about 10%. After turning off queries to AdvToUsers (the big one),
>>> CPU on both servers drops to 2-7%.
>>>
>>> Sorry, but I don't know how to take a thread dump, and I don't know Java.
>>>
>>> On 11.05.2010 18:40, Stack wrote:
>>>
>>>> You could try thread-dumping the regionserver to figure out where
>>>> it's hung up.  Counters are usually fast, so maybe it's something to do
>>>> w/ 8k of them in the one row.  What kind of numbers are you seeing?  How
>>>> much RAM are you throwing at the problem?
>>>>
>>>> Yours,
>>>> St.Ack
>>>>
>>>>
>>>>
>>>> On Tue, May 11, 2010 at 8:51 AM, Sebastian Bauer <admin@ugame.net.pl> wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Maybe I'll get help here :)
>>>>>
>>>>> I have 2 tables, UsersToAdv and AdvToUsers.
>>>>>
>>>>> UsersToAdv is simple:
>>>>> { "row_id" =>   [ {"adv:<id>":<counter>   },
>>>>>                             {"adv:<id>":<counter>   },
>>>>>                             ..... about 100 columns
>>>>>                         ]
>>>>> }
>>>>> Only one kind of operation is performed on it, incrementing a counter:
>>>>> client.atomicIncrement("UsersToAdv", ID, column, 1)
>>>>>
>>>>>
>>>>> AdvToUsers has one column family, "user:", containing about 8000
>>>>> columns of the form "user:<cookie>".
>>>>> What I do on this table is increment the counter in "user:<cookie>":
>>>>>
>>>>> client.atomicIncrement("AdvToUsers", ID, column, 1)
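To make the two layouts concrete: HBase never splits a row across regions, so all ~8000 counters of one AdvToUsers row live in one region on one server, while a UsersToAdv row holds only about 100. A toy in-memory model of the layout (not HBase code; CounterTableModel, the row keys and the column names are hypothetical) with a per-row lock making each increment atomic:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class CounterTableModel {
    // row key -> (column "family:qualifier" -> counter); columns kept sorted, as in an HBase row
    private final Map<String, NavigableMap<String, Long>> rows = new ConcurrentHashMap<>();

    // Adds 'amount' to one column of one row and returns the new value.
    // Locking the row's column map makes the read-modify-write atomic for that row.
    long atomicIncrement(String row, String column, long amount) {
        NavigableMap<String, Long> columns = rows.computeIfAbsent(row, k -> new TreeMap<>());
        synchronized (columns) {
            long updated = columns.getOrDefault(column, 0L) + amount;
            columns.put(column, updated);
            return updated;
        }
    }

    // Number of distinct columns stored in one row (0 if the row does not exist).
    int columnCount(String row) {
        NavigableMap<String, Long> columns = rows.get(row);
        if (columns == null) {
            return 0;
        }
        synchronized (columns) {
            return columns.size();
        }
    }

    public static void main(String[] args) {
        CounterTableModel usersToAdv = new CounterTableModel();
        CounterTableModel advToUsers = new CounterTableModel();
        // One user row with ~100 "adv:<id>" counters (IDs made up) ...
        for (int adv = 0; adv < 100; adv++) {
            usersToAdv.atomicIncrement("user-row", "adv:" + adv, 1);
        }
        // ... and one ad row with ~8000 "user:<cookie>" counters (cookies made up).
        for (int cookie = 0; cookie < 8000; cookie++) {
            advToUsers.atomicIncrement("adv-row", "user:" + cookie, 1);
        }
        System.out.println(usersToAdv.columnCount("user-row") + " vs "
                + advToUsers.columnCount("adv-row"));
    }
}
```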
>>>>>
>>>>> I have 2 region servers:
>>>>>
>>>>>
>>>>> first one:
>>>>>         UsersToAdv,6FEC716B3960D1E8208DE6B06993A68D,1273580007602
>>>>>             stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=9, storefileIndexSizeMB=0
>>>>>         UsersToAdv,0FDD84B9124B98B05A5E40F47C12DC45,1273580531847
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         AdvToUsers,5735,1273580575873
>>>>>             stores=1, storefiles=1, storefileSizeMB=15, memstoreSizeMB=10, storefileIndexSizeMB=0
>>>>>         UsersToAdv,67CB411B48A7B83F0B863AC615285060,1273580533380
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,4012667F3E78C6431E3DD84641002FCE,1273580532995
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,5FE4A7506737CE0F38E254E62E23FE45,1273580533380
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,47E95EE30A11EBE45F055AC57EB2676E,1273580532995
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,37F9573415D9069B7E5810012AAD9CB7,1273580532258
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,1FFFDF082566D93153B34BFE0C44A9BF,1273580532173
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,17C93FB0047BC4D660C6570B734CBE17,1273580531847
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,27DFD8F02CD98FF57E8334837C73C57A,1273580532173
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>
>>>>> second one:
>>>>>         UsersToAdv,57C568066D35D09B4AF6CD7D68681144,1273580533427
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,4FA6A1A2681E2D252CCF765B140369EF,1273580533427
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         AdvToUsers,,1273580575966
>>>>>             stores=1, storefiles=1, storefileSizeMB=1, memstoreSizeMB=1, storefileIndexSizeMB=0
>>>>>         UsersToAdv,07B296AC590061025B382B163E3C149E,1273580533023
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         UsersToAdv,3015D5DB07E2F4D30A19DEB354A85B52,1273580532258
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         AdvToUsers,5859,1273580580940
>>>>>             stores=1, storefiles=1, storefileSizeMB=9, memstoreSizeMB=9, storefileIndexSizeMB=0
>>>>>         AdvToUsers,5315,1273580575966
>>>>>             stores=1, storefiles=1, storefileSizeMB=14, memstoreSizeMB=12, storefileIndexSizeMB=0
>>>>>         AdvToUsers,5825,1273580580940
>>>>>             stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=8, storefileIndexSizeMB=0
>>>>>         AdvToUsers,5671,1273580578114
>>>>>             stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=7, storefileIndexSizeMB=0
>>>>>         UsersToAdv,,1273580533023
>>>>>             stores=1, storefiles=1, storefileSizeMB=4, memstoreSizeMB=4, storefileIndexSizeMB=0
>>>>>         AdvToUsers,5457,1273580578114
>>>>>             stores=1, storefiles=1, storefileSizeMB=8, memstoreSizeMB=8, storefileIndexSizeMB=0
>>>>>
>>>>> The number of queries to both tables is equal, but the load is greater
>>>>> on the second region server because of AdvToUsers.
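A tally of the AdvToUsers entries in the listings above is consistent with this. A small sketch that sums the listed sizes (RegionSizeTally is a hypothetical name; sizes measure data hosted, not requests served, so they are only a rough proxy for load):

```java
class RegionSizeTally {
    // {storefileSizeMB, memstoreSizeMB} for each AdvToUsers region, copied from the listing above.
    // First server: region starting at 5735.
    static final int[][] FIRST_SERVER = {{15, 10}};
    // Second server: regions starting at "" (empty), 5859, 5315, 5825, 5671, 5457.
    static final int[][] SECOND_SERVER = {{1, 1}, {9, 9}, {14, 12}, {8, 8}, {8, 7}, {8, 8}};

    // Sums one field (0 = storefile MB, 1 = memstore MB) over a server's regions.
    static int sum(int[][] regions, int field) {
        int total = 0;
        for (int[] region : regions) {
            total += region[field];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("first:  %d AdvToUsers region(s), %d MB storefiles, %d MB memstore%n",
                FIRST_SERVER.length, sum(FIRST_SERVER, 0), sum(FIRST_SERVER, 1));
        System.out.printf("second: %d AdvToUsers region(s), %d MB storefiles, %d MB memstore%n",
                SECOND_SERVER.length, sum(SECOND_SERVER, 0), sum(SECOND_SERVER, 1));
    }
}
```

It prints 1 AdvToUsers region (15 MB of store files, 10 MB of memstore) for the first server and 6 regions (48 MB and 45 MB) for the second.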
>>>>>
>>>>> Is there any way to improve the performance of the atomicIncrement
>>>>> operation on column families with so many (8000) columns?
>>>>>
>>>>> Thank You,
>>>>>
>>>>> Sebastian Bauer