hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: hbase architecture diagrams
Date Tue, 10 Jun 2008 17:10:19 GMT
That is some of the finest art seen by me in a long time.  We're located 
close to MoMA.  I'm going to see if we can get you an installation.

Answers inline.

Krzysztof Szlapinski wrote:
> hi all,
> to better understand how hbase works i started reading this document 
> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
> and created some diagrams
> here they are (png, and svg for editing):
> 1) habase hierarchy of objects:
> http://www.starline.com.pl/hbase/habase_hierarchy.png
> http://www.starline.com.pl/hbase/habase_hierarchy.svg

I'd suggest that Master, Client and RegionServer be peers rather than 
arranged hierarchically.  The client talks to the master only rarely, 
and only to ask it where the catalog tables are located.  Thereafter it 
talks exclusively with the regionservers.  Have arrows going from the 
client to both the master and the regionserver.

> 2) hbase architecture (relations between objects)
> http://www.starline.com.pl/hbase/habase_architecture.png
> http://www.starline.com.pl/hbase/habase_architecture.svg
Same as comment above.

> 3) visual representation flush cache operation
> http://www.starline.com.pl/hbase/hbase_flush_cache.png
> http://www.starline.com.pl/hbase/hbase_flush_cache.svg

Here, flushes are done from the memcache.  The diagram doesn't give this 

> since the documentation says that its information may be out of date 
> please feel free to comment on these diagrams, update them, put them 
> on your sites etc
> i got a question too
> lets say we have cluster of 3 machines:
> - 1 master + region server,and
> - 2 region servers
> on each machine I got web server that connects to hbase client to get 
> and get information out from hbase
> it is not clear to me where should these clients connect to
> should all clients connect directly and only to the master, which will 
> tell them on which region server is the information they are looking for?
> or can they connect to the region servers and if the information they 
> are looking for in not in them region servers will contact master and 
> fetch there information for the client?
You almost have it.

A client that wants to insert row X into table A needs to figure out 
which region of table A the row X belongs to.  This information is kept 
in the .META. table.  It is a listing of all regions for all tables, 
keyed by table name and the first row of each region, sorted 
lexicographically.  The regions that make up the .META. table are 
themselves listed in a special catalog table, the -ROOT- table.
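
To make the "keyed by first row, sorted lexicographically" point concrete, 
here is a minimal sketch in Python.  It is not HBase code: the 
META_TABLE_A list, the region names and find_region are made up for 
illustration.  It only shows why sorted start rows are enough to find the 
region that holds a given row.

```python
import bisect

# Hypothetical stand-in for the .META. rows of table A:
# (first row of the region, region name), sorted by first row.
# The empty string marks the first region of the table.
META_TABLE_A = [
    ("", "A,,1"),
    ("g", "A,g,2"),
    ("p", "A,p,3"),
]

def find_region(meta_rows, row_key):
    """Return the region whose first row is the greatest one <= row_key."""
    start_keys = [start for start, _ in meta_rows]
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return meta_rows[idx][1]
```

With this layout, find_region(META_TABLE_A, "hbase") picks the region 
that starts at "g", because "g" is the largest start row that sorts at or 
before "hbase".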

A fresh client -- one that has just started and so has an empty cache -- 
goes first to the master to ask it where the root region is hosted.  
Once it has the address of the regionserver hosting the root region, it 
caches it, and then it goes to the hosting regionserver to read the 
location of the .META. table region that has the row that contains the 
region of table A into which X should be inserted.  The client caches 
that location too, then goes to the server hosting the .META. region and 
reads the location of the region of table A where it should insert X.

Finally, it goes to the server hosting table A's region and inserts X.
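
The hops described above can be sketched as a toy model in Python.  This 
is an illustration under stated assumptions, not the real client: the 
RegionServer and Master classes, find_root_region, lookup, put and the 
three-server layout are all hypothetical, and catalog keys are simplified 
to (table, first row) tuples.

```python
import bisect

class RegionServer:
    """Hypothetical regionserver: hosts catalog regions and/or user rows."""
    def __init__(self, name):
        self.name = name
        self.catalog = {}  # catalog table name -> sorted [(start_key, server)]
        self.rows = {}     # (table, row) -> value

    def lookup(self, catalog_table, key):
        # Pick the entry with the greatest start key <= key.
        entries = self.catalog[catalog_table]
        starts = [start for start, _ in entries]
        return entries[bisect.bisect_right(starts, key) - 1][1]

    def put(self, table, row, value):
        self.rows[(table, row)] = value

class Master:
    """Hypothetical master: only tells clients where the root region lives."""
    def __init__(self, root_server):
        self.root_server = root_server

    def find_root_region(self):
        return self.root_server

def insert(master, table, row, value):
    key = (table, row)
    root_server = master.find_root_region()          # hop 1: ask the master
    meta_server = root_server.lookup("-ROOT-", key)  # hop 2: read -ROOT-
    data_server = meta_server.lookup(".META.", key)  # hop 3: read .META.
    data_server.put(table, row, value)               # finally: the insert
    return data_server.name

# Assumed layout: rs1 hosts -ROOT-, rs2 hosts the single .META. region,
# and table A is split at row "m" between rs3 and rs1.
rs1, rs2, rs3 = RegionServer("rs1"), RegionServer("rs2"), RegionServer("rs3")
rs1.catalog["-ROOT-"] = [(("", ""), rs2)]
rs2.catalog[".META."] = [(("A", ""), rs3), (("A", "m"), rs1)]
master = Master(rs1)
```

Note that the master is touched exactly once, in hop 1.  Everything after 
that is between the client and the regionservers, which is why the 
diagram should show them as peers.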

Over time, the client builds up a cache of where regions are located and 
will rely on this information rather than travel the net to read 
locations every time it needs to find a region -- until there is a 
fault.  At that time, it will walk back up the hierarchy of region 
locations to fix its list of locations and then away it goes again.

Check out the Bigtable paper.  It explains how this all works better 
than I do.

> krzysiek
