hbase-user mailing list archives

From llpind <sonny_h...@hotmail.com>
Subject Re: Adding/Removing regionservers
Date Wed, 08 Jul 2009 16:54:22 GMT

Okay, will do.  I'm still new to scanners & regions.

HTable has a getRegionsInfo() method which returns a Map<HRegionInfo,
HServerAddress>.  I can iterate over this and spawn a scanner per region
using each region's start/stop keys.  What confuses me is where my own
start/stop rows go, since my loops already have a start/stop row as well.
Basically, how do I combine the results from all the threads, given both my
row filters and the region start/stop row keys?

Could you please explain how to go about this?  

Thanks.
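
One way to reconcile the two sets of start/stop rows is to intersect the requested range with each region's range and scan only the overlap. Below is a minimal sketch in plain Java, assuming row keys compare as unsigned bytes and that an empty key means "unbounded" (as it does for the first region's start key and the last region's end key). The class and method names are hypothetical, not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;

class RegionRangeClip {
    // Unsigned lexicographic comparison, the ordering assumed for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Intersects [scanStart, scanStop) with [regionStart, regionEnd).
    // An empty array means "unbounded". Returns {start, stop}, or null
    // when the region holds no rows from the requested range.
    static byte[][] clip(byte[] scanStart, byte[] scanStop,
                         byte[] regionStart, byte[] regionEnd) {
        // The later start wins; an empty start sorts before every key.
        byte[] start = compare(scanStart, regionStart) >= 0 ? scanStart : regionStart;
        // The earlier stop wins; an empty stop must be handled explicitly.
        byte[] stop;
        if (scanStop.length == 0) {
            stop = regionEnd;
        } else if (regionEnd.length == 0) {
            stop = scanStop;
        } else {
            stop = compare(scanStop, regionEnd) <= 0 ? scanStop : regionEnd;
        }
        if (stop.length != 0 && compare(start, stop) >= 0) {
            return null; // no overlap
        }
        return new byte[][] { start, stop };
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Requested rows [b, m) against a region covering [a, f).
        byte[][] r = clip(utf8("b"), utf8("m"), utf8("a"), utf8("f"));
        System.out.println(new String(r[0], StandardCharsets.UTF_8) + " .. "
                + new String(r[1], StandardCharsets.UTF_8));
    }
}
```

Each non-null pair would then become the start/stop rows of the scanner for that region, and regions that return null can be skipped.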


Jonathan Gray-2 wrote:
> 
> Yes, you could multi-thread your scanners.
> 
> You could query for region information to get the start/stop rows for 
> the regions in the table, and then spin up a scanner in each thread for 
> each region.
> 
> If you plan on doing anything like that, keep me / the list in the loop;
> I would be willing to help out.
> 
> JG
> 
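
The scanner-per-region idea above could be sketched roughly as follows, in plain Java with no HBase dependency. The `RangeScanner` interface is a hypothetical stand-in for "open a scanner on one row range and collect its results", and the in-memory `TreeMap` in `main` only imitates a sorted table. Collecting the futures in range order keeps the merged output in row-key order, provided the ranges are sorted and do not overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelRegionScan {
    // Hypothetical stand-in for scanning one [startRow, stopRow) range.
    interface RangeScanner<T> {
        List<T> scan(String startRow, String stopRow) throws Exception;
    }

    // Runs one scan task per range on a fixed-size pool and merges the
    // per-range results in the order the ranges were given.
    static <T> List<T> scanAll(List<String[]> ranges, RangeScanner<T> scanner,
                               int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (String[] range : ranges) {
                futures.add(pool.submit(() -> scanner.scan(range[0], range[1])));
            }
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> f : futures) {
                merged.addAll(f.get()); // blocks until that range is done
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Read-only in-memory "table", sorted by row key.
        TreeMap<String, String> table = new TreeMap<>();
        for (String k : Arrays.asList("b1", "c2", "g3", "k4", "z9")) {
            table.put(k, "value-" + k);
        }
        // Two made-up region ranges: [a, f) and [f, m).
        List<String[]> ranges = Arrays.asList(
                new String[] { "a", "f" }, new String[] { "f", "m" });
        List<String> rows = scanAll(ranges,
                (start, stop) -> new ArrayList<>(table.subMap(start, stop).keySet()), 2);
        System.out.println(rows);
    }
}
```

A fixed pool bounds the number of concurrent scanners. Whether this speeds things up in practice would depend on how the regions are spread across region servers.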
> llpind wrote:
>> Thanks for the link, that sounds good.
>> 
>> If I multi-thread scanners, will HBase performance speed up as more boxes are
>> added?
>> 
>> For example, in the example above I had:
>> 
>> for (String typeVal : list){ 
>> 
>>   // give me all IDs for matching TYPE|VAL
>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"), Bytes.toBytes(typeVal + "|A"));
>>   ResultScanner s1 = tblA.getScanner(tblAScan); 
>> 
>>   for (Result tblBRowResult = s1.next(); tblBRowResult != null; tblBRowResult = s1.next()){ 
>> 
>>           // IDs are all numeric
>>           Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()), Bytes.toBytes(typeVal + " "));
>>           ResultScanner s2 = tblB.getScanner(tblBScan);  // scan TableB, not TableA
>>           // only care about column data here, since ID is row key
>>           List results = s2.next().list();
>> 
>>           for (KeyValue kv : results){ 
>>                         //do stuff 
>>                         kv.getValue(); 
>>           } 
>> 
>>   } 
>> 
>> } 
>> 
>> 
>> ======================================
>> I've modified it with a Get (not updated above).  I'm thinking each
>> iteration of the outer loop (which gets a new scan) could run in a different
>> thread, with the results combined at the end?
>> 
>> I'm looking for ways to increase performance by adding boxes.  How can I
>> spread the scanner load, so it's not waiting for the next iteration?
>> 
>> 
>> 
>> Jonathan Gray-2 wrote:
>>> Sounds about right.  You seem to have a good grip on things.
>>>
>>> 0.20 will work with millions of columns in a row, but currently there is 
>>> no way to return the massive row in segments.  If the data is big 
>>> enough, you'll have memory allocation issues.  Scanners are still a 
>>> safer way to go until we have intra-row scanning: 
>>> https://issues.apache.org/jira/browse/HBASE-1537
>>>
>>> JG
>>>
>>> llpind wrote:
>>>> Thanks for the tips.
>>>>
>>>> Yeah, that is the model we had before; the problem is we can potentially
>>>> have millions of IDs for a given TYPE|VAL.
>>>>
>>>> We are considering something like:
>>>> Row Key: TYPE|VALUE|ID
>>>> column: link:TYPE|VALUE
>>>>
>>>> This is only because ID may never have more than a few TYPE|VAL results in
>>>> this current dataset, which would also eliminate the need to go to a second
>>>> table.
>>>>
>>>> Thanks for the help.  
>>>>
>>>>
>>>> Jonathan Gray-2 wrote:
>>>>> Well, you're trying to do a join.  How much data is actually in TableB?
>>>>> You might consider denormalizing so that you don't have to query TableB
>>>>> because the data you need is already in TableA.
>>>>>
>>>>> You could use a Get (single trip) for the inner loop rather than a 
>>>>> Scanner (which requires multiple round-trips).  You could even use a Get
>>>>> for the outer loop by making your table wide instead of tall.
>>>>>
>>>>> Row Key:  TYPE|VALUE
>>>>> Column: link:ID
>>>>>
>>>>> And you have a column for each ID within that TYPE|VALUE row.
>>>>>
>>>>> Also, don't forget to close your scanners if you do use scanners.
>>>>>
>>>>> JG
>>>>>
>>>>>
>>>>> llpind wrote:
>>>>>> Assume a schema like so:  
>>>>>>
>>>>>> TableA======================
>>>>>> Row Key:  TYPE|VALUE|ID
>>>>>> Column:  link:ID  (irrelevant)
>>>>>> TableB======================
>>>>>> Row Key: ID
>>>>>> Column: typeval:TYPE|VALUE
>>>>>> ===========================
>>>>>>
>>>>>>
>>>>>>
>>>>>> I need to iterate over TableA using a Scanner to get all IDs based on
>>>>>> TYPE|VALUE, then for each ID I need to get from TableB which TYPE|VALUEs
>>>>>> it's tied to (a many-to-many).
>>>>>> Assume I have a list of TYPE|VALUEs in a List, and need to process
>>>>>> through this data.  I've done something like this:
>>>>>>
>>>>>>
>>>>>>
>>>>>> for (String typeVal : list){
>>>>>>
>>>>>>   // give me all IDs for matching TYPE|VAL
>>>>>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"), Bytes.toBytes(typeVal + "|A"));
>>>>>>   ResultScanner s1 = tblA.getScanner(tblAScan);
>>>>>>
>>>>>>   for (Result tblBRowResult = s1.next(); tblBRowResult != null; tblBRowResult = s1.next()){
>>>>>>
>>>>>>           // IDs are all numeric
>>>>>>           Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()), Bytes.toBytes(typeVal + " "));
>>>>>>           ResultScanner s2 = tblB.getScanner(tblBScan);  // scan TableB, not TableA
>>>>>>           // only care about column data here, since ID is row key
>>>>>>           List results = s2.next().list();
>>>>>>
>>>>>>           for (KeyValue kv : results){
>>>>>>                         //do stuff
>>>>>>                         kv.getValue();
>>>>>>           }
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> }
>>>>>>
>>>
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Adding-Removing-regionservers-tp24309642p24395309.html
Sent from the HBase User mailing list archive at Nabble.com.

