hbase-user mailing list archives

From Jonathan Gray <jl...@streamy.com>
Subject Re: Adding/Removing regionservers
Date Tue, 07 Jul 2009 23:30:24 GMT
Yes, you could multi-thread your scanners.

You could query for region information to get the start/stop rows for 
the regions in the table, and then spin up a scanner in each thread for 
each region.
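
To make the idea concrete, here is a rough, self-contained sketch of the pattern: split the key space into [start, stop) ranges, scan each range on its own thread, then combine the results. It is only an illustration under stated assumptions. A TreeMap stands in for the table so it runs without a cluster, and every name in it (RegionParallelScan, Range, rangesFromSplits, scanRange, scanAll) is hypothetical, not part of the HBase client API. In real code the ranges would come from the table's region start/stop rows, and scanRange would open (and close) a scanner bounded by that range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in sketch: a TreeMap plays the role of the table. All names are hypothetical.
class RegionParallelScan {

    /** One [start, stop) row range, mirroring a region's start/stop rows. */
    static final class Range {
        final String start; // inclusive; "" means start of table
        final String stop;  // exclusive; "" means end of table
        Range(String start, String stop) { this.start = start; this.stop = stop; }
    }

    /** Builds contiguous ranges from sorted split points, the way region boundaries tile a table. */
    static List<Range> rangesFromSplits(List<String> splits) {
        List<Range> ranges = new ArrayList<>();
        String prev = "";
        for (String s : splits) {
            ranges.add(new Range(prev, s));
            prev = s;
        }
        ranges.add(new Range(prev, ""));
        return ranges;
    }

    /** Reads the row keys in one range; a real version would use a scanner here and close it. */
    static List<String> scanRange(NavigableMap<String, String> table, Range r) {
        NavigableMap<String, String> view = table;
        if (!r.start.isEmpty()) view = view.tailMap(r.start, true);
        if (!r.stop.isEmpty()) view = view.headMap(r.stop, false);
        return new ArrayList<>(view.keySet());
    }

    /** Submits one scan task per range and concatenates the results in range order. */
    static List<String> scanAll(NavigableMap<String, String> table, List<Range> ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Range r : ranges) {
                futures.add(pool.submit(() -> scanRange(table, r)));
            }
            List<String> rows = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                try {
                    rows.addAll(f.get()); // blocks until that range's task has finished
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a|1", "a|2", "m|7", "t|3", "z|9"}) table.put(k, "v");
        // Two split points give three ranges, so three scan tasks.
        System.out.println(scanAll(table, rangesFromSplits(Arrays.asList("m", "t")), 3));
    }
}
```

Because each task only touches its own range, the work should spread across whichever servers host those regions, which is where adding boxes would be expected to help.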

If you plan on doing anything like that, keep me / the list in the loop; 
I would be willing to help out.

JG

llpind wrote:
> Thanks for the link, that sounds good.
> 
> If I multi-thread the scanners, will HBase performance improve as more boxes
> are added?
> 
> For example, in the earlier code I had:
> 
> for (String typeVal : list){ 
> 
>   // give me all IDs for matching TYPE|VAL
>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
>       Bytes.toBytes(typeVal + "|A"));
>   ResultScanner s1 = tblA.getScanner(tblAScan); 
> 
>   for (Result tblBRowResult = s1.next(); tblBRowResult != null;
> tblBRowResult = s1.next()){ 
> 
>           // IDs are all numeric
>           Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()),
>               Bytes.toBytes(typeVal + " "));
>           ResultScanner s2 = tblB.getScanner(tblBScan);
>           // only care about column data here, since ID is row key
>           List results = s2.next().list();
> 
>           for (KeyValue kv : results){ 
>                         //do stuff 
>                         kv.getValue(); 
>           } 
> 
>   } 
> 
> } 
> 
> 
> ======================================
> I modified it to use a Get (not updated above).  I'm thinking each iteration
> of the outer loop (get a new scan) could run in a different thread, with the
> results combined at the end?
> 
> I'm looking for ways to increase performance by adding boxes.  How can I
> spread the scanner load, so it's not waiting for the next iteration?
> 
> 
> 
> Jonathan Gray-2 wrote:
>> Sounds about right.  You seem to have a good grip on things.
>>
>> 0.20 will work with millions of columns in a row, but currently there is 
>> no way to return the massive row in segments.  If the data is big 
>> enough, you'll have memory allocation issues.  Scanners are still a 
>> safer way to go until we have intra-row scanning: 
>> https://issues.apache.org/jira/browse/HBASE-1537
>>
>> JG
>>
>> llpind wrote:
>>> Thanks for the tips.
>>>
>>> Yeah, that is the model we had before; the problem is that we can
>>> potentially have millions of IDs for a given TYPE|VAL.
>>>
>>> We are considering something like:
>>> Row Key: TYPE|VALUE|ID
>>> column: link:TYPE|VALUE
>>>
>>> This is only because an ID may never have more than a few TYPE|VAL results
>>> in this current dataset, which would also eliminate the need to go to a
>>> second table.
>>>
>>> Thanks for the help.  
>>>
>>>
>>> Jonathan Gray-2 wrote:
>>>> Well, you're trying to do a join.  How much data is actually in TableB? 
>>>> You might consider denormalizing so that you don't have to query TableB
>>>> because the data you need is already in TableA.
>>>>
>>>> You could use a Get (single trip) for the inner loop rather than a 
>>>> Scanner (which requires multiple round-trips).  You could even use a Get
>>>> for the outer loop by making your table wide instead of tall.
>>>>
>>>> Row Key:  TYPE|VALUE
>>>> Column: link:ID
>>>>
>>>> And you have a column for each ID within that TYPE|VALUE row.
>>>>
>>>> Also, don't forget to close your scanners if you do use scanners.
>>>>
>>>> JG
>>>>
>>>>
>>>> llpind wrote:
>>>>> Assume a schema like so:  
>>>>>
>>>>> TableA======================
>>>>> Row Key:  TYPE|VALUE|ID
>>>>> Column:  link:ID  (irrelevant)
>>>>> TableB======================
>>>>> Row Key: ID
>>>>> Column: typeval:TYPE|VALUE
>>>>> ===========================
>>>>>
>>>>>
>>>>>
>>>>> I need to iterate over TableA using a Scanner to get all IDs based on
>>>>> TYPE|VALUE, then for each ID I need to get from TableB what TYPE|VALUE’s
>>>>> it’s tied to (a many-to-many relationship).
>>>>> Assume I have a list of TYPE|VALUEs in a List, and need to process
>>>>> this data.  I have done something like this:
>>>>>
>>>>>
>>>>>
>>>>> for (String typeVal : list){
>>>>>
>>>>>   // give me all IDs for matching TYPE|VAL
>>>>>   Scan tblAScan = new Scan(Bytes.toBytes(typeVal + "|"),
>>>>>       Bytes.toBytes(typeVal + "|A"));
>>>>>   ResultScanner s1 = tblA.getScanner(tblAScan);
>>>>>
>>>>>   for (Result tblBRowResult = s1.next(); tblBRowResult != null;
>>>>> tblBRowResult = s1.next()){
>>>>>
>>>>> 	  // IDs are all numeric
>>>>> 	  Scan tblBScan = new Scan(Bytes.toBytes(tblBRowResult.getValue()),
>>>>> 	      Bytes.toBytes(typeVal + " "));
>>>>> 	  ResultScanner s2 = tblB.getScanner(tblBScan);
>>>>> 	  // only care about column data here, since ID is row key
>>>>> 	  List results = s2.next().list();
>>>>>
>>>>> 	  for (KeyValue kv : results){
>>>>> 			//do stuff
>>>>> 			kv.getValue();
>>>>> 	  }
>>>>>
>>>>>   }
>>>>>
>>>>> }
>>>>>
>>
> 
