hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mingjie Lai" <mjla...@gmail.com>
Subject Re: Review Request: HBASE-2468: Improvements to prewarm META cache on clients.
Date Wed, 02 Jun 2010 08:11:19 GMT


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HTable.java, line 1075
> > <http://review.hbase.org/r/98/diff/4/?file=872#file872line1075>
> >
> >     I feel like it would be cleaner to have the following methods be non-static,
so they don't need any arguments. that would reduce the number of functions by factor of two

That was what I wanted to do in the very beginning. I don't like so many functions either.
 

But there is one existing method: 
  public static boolean isTableEnabled(String tableName) ...
  public static boolean isTableEnabled(Configuration conf, String tableName) ...

It's one of the reason that I just want to keep the original coding style to be consistent.


In addition, I agree we can make isRegionCachePrefetchEnabled() etc. to be non-static. However,
enableRegionCachePrefetch(), and disableRegionCachePrefetch() must to be static since we want
to enable/disable a table without instantiate HTable. In another word, we cannot really dis/enable
cache prefetch for each HTable instance who have the same table name. While we can only enable/disable
based on table name. It is pretty similar as table enable/disable. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 1608
> > <http://review.hbase.org/r/98/diff/4/?file=871#file871line1608>
> >
> >     kill this function

I don't like it either. I will kill all ``is...Disabled()'' methods. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java, line 3639
> > <http://review.hbase.org/r/98/diff/4/?file=874#file874line3639>
> >
> >     I really like these unit tests, but I think you should use a row key for the
Get that isn't also a region start key, as it may expose different behavior. eg I think you
will end up with 11 cached regions instead of 10

As mentioned above, I reimplemented MetaScanner so it will start scanning Meta from the region
row where the given table row resides, instead of scanning from the next region row in Meta.
HTable.getRowOrBefore() is called in MetaScanner to achieve it, (however I'm not very sure
whether it is the most efficient way to do it or not). 

So for this unit test, no matter the passed row is a region start key or not, we will always
get 10 here. 


> On 2010-05-31 21:52:12, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java, lines 776-781
> > <http://review.hbase.org/r/98/diff/4/?file=871#file871line776>
> >
> >     this block should go after the cache check below, right?

I reimplemented MetaScanner so it will start scanning Meta from the region where the given
user table row resides, instead of scanning from the next region row in Meta.

In this case, prefetchRegionCache() also fetches the queried table+row to the cache. Here
I put it before the cache check block, so it can load the result from cache directly. Otherwise
it may do an extra meta scan if cache prefetch is enabled. 

However if multiple threads accessing this block concurrently, this way will cause cache-prefetch
executed twice. But I think this case is pretty rare, so the penalty should be relatively
smaller. 

What do you think? 


- Mingjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/98/#review107
-----------------------------------------------------------


On 2010-05-31 19:49:20, Mingjie Lai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/98/
> -----------------------------------------------------------
> 
> (Updated 2010-05-31 19:49:20)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> HBASE-2468: Improvements to prewarm META cache on clients.
> 
> Changes:
> 1. Add new HTable methods which support region info de/serialation, and region cache
prewarm: 
> - void serializeRegionInfo(): clients could perform a large scan for all the meta for
the table, serialize the meta to a file. MR job can ship a copy of the meta for the table
in the DistributedCache
> - Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize
the region info from the DistributedCache 
> - prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm
local region cache by the deserialized region info.
> 
> 2. For each client, each region cache read-miss could trigger read-ahead some number
of rows in META. This option could be turned on and off for one particular table. 
> 
> 
> This addresses bug HBASE-2468.
>     http://issues.apache.org/jira/browse/HBASE-2468
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d 
>   src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 09de2ac 
>   src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb 
>   src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java 3de661e 
>   src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a 
> 
> Diff: http://review.hbase.org/r/98/diff
> 
> 
> Testing
> -------
> 
> Unit tests passed locally for me. 
> 
> 
> Thanks,
> 
> Mingjie
> 
>


Mime
View raw message