hadoop-common-dev mailing list archives

From Stefan Groschupf <...@media-style.com>
Subject Re: HBase Design Ideas, Part I
Date Tue, 30 May 2006 21:44:31 GMT
sorry for the delay in responding...

>> I already posted a mail about this issue.
>> What we may need is a Writer that can seek first by row key and
>> then by column key.
>> In general I agree with the sparse structure.
> What about this: don't store explicit "column" fields anywhere.
> Rather, each row is stored as a series of key-value pairs, where
> the key is the column name.
I didn't quite get this.
How would you then associate a key-value pair (let's call it a
cell) with a row key?
As mentioned, I picture it as either an object "rowKey - columnName -
value" or one rowKey - columnKey-Value[].
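To make sure we mean the same thing: as I understand the proposal, a cell is tied to its row only by sorting on a composite (rowKey, columnName) key, so a row is just the run of adjacent entries sharing the same row-key prefix. A minimal sketch, with the key encoding and all class names invented by me, not anything in the current code:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RowAsPairs {
    // One flat sorted map; a "row" is the slice of keys sharing the
    // same rowKey prefix. "\u0000" separates the two key parts
    // (assumes neither part contains it).
    public static final TreeMap<String, byte[]> store = new TreeMap<>();

    public static void put(String rowKey, String column, byte[] value) {
        store.put(rowKey + "\u0000" + column, value);
    }

    // "Seek to a row and unpack all its pairs": scan the contiguous
    // key range [rowKey\u0000, rowKey\u0001).
    public static SortedMap<String, byte[]> readRow(String rowKey) {
        return store.subMap(rowKey + "\u0000", rowKey + "\u0001");
    }

    public static void main(String[] args) {
        put("www.example.com", "anchor", "home".getBytes());
        put("www.example.com", "language", "en".getBytes());
        put("www.example.org", "language", "de".getBytes());
        System.out.println(readRow("www.example.com").size()); // prints 2
    }
}
```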
> True, if there are a huge number of columns and you are interested in
> just one, there will be unnecessary processing.  This is especially
> bad if one column is a 2-char string and another column is a video
> file.  So we should actually keep a family of files, segmented by
> object size.  But in the general case, it shouldn't be possible to
> "seek to a column".  Instead, you seek to a row and unpack all its
> key/val (col/cell) pairs.
Hmm, I'm not sure I like the idea of having size-separated column
files.
I don't think there are many use cases where people will store, let's
say, locales and video files under the same URL row key.
In such a case it makes more sense to have separate tables.
From my point of view the best approach would be a kind of column
seek mechanism, which would require a different kind of sequence
writer. As far as I remember, the Google system keeps all columns of
a row in one tablet.
What do you think about being able to have one row in different
tablets, where each tablet holds different columns?
So we would distribute not just the rows but also the columns.
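To sketch what I mean by distributing columns, too: tablet boundaries could partition the column space the same way they already partition the row space. Everything below (the tablet names, the "floor key" assignment scheme) is invented for illustration:

```java
import java.util.TreeMap;

public class ColumnPartition {
    // Maps the first column name served by a tablet to that tablet's
    // name; a column belongs to the tablet with the greatest start
    // key <= the column name (an invented scheme, just to illustrate).
    public static final TreeMap<String, String> tabletByStartColumn =
            new TreeMap<>();

    public static String tabletFor(String column) {
        return tabletByStartColumn.floorEntry(column).getValue();
    }

    public static void main(String[] args) {
        // One row's columns spread over two tablets.
        tabletByStartColumn.put("", "tablet-A");  // columns before "m"
        tabletByStartColumn.put("m", "tablet-B"); // columns from "m" on
        System.out.println(tabletFor("anchor"));  // prints tablet-A
        System.out.println(tabletFor("video"));   // prints tablet-B
    }
}
```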

>> My idea was to have the lock at the HRegionServer level; my idea
>> was that the client itself takes care of replication, meaning it
>> writes the value to n servers that hold the same replicas of the
>> HRegions.
> Do you mean that a lock applies to an entire server at once?  Or
> that an HRegionServer is responsible for all locks?  (I'd like to do
> the latter, at least in the short-term.)
Yes, the latter is better from my point of view.
> I'd like to avoid having an HRegion that's hosted by multiple servers,
> because then it's unclear which HRegionServer should own the lock.
> I suppose the HRegionServers for a given HRegion could hold an
> election, but this seems like a lot of work.
> If there's a row that's really "hot" and wanted by a lot of clients,
> I could imagine starting a series of "read-only" HRegionServers that
> field read requests.  That way you avoid having an election for the
> lock but can still scale capacity if necessary.
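Just to restate the routing rule I understand from this: writes always go to the single lock-owning server (so no election), while reads fan out over the read-only replicas. A toy sketch, with all server names invented:

```java
import java.util.List;

public class ReadRouting {
    // One writable owner holds the lock; read-only replicas just
    // serve reads round-robin. Names here are placeholders.
    public final String lockOwner;
    public final List<String> readOnlyReplicas;
    private int next = 0;

    public ReadRouting(String owner, List<String> replicas) {
        this.lockOwner = owner;
        this.readOnlyReplicas = replicas;
    }

    // Writes never need an election: the owner is fixed.
    public String serverForWrite() { return lockOwner; }

    // Reads scale out by cycling through the replicas.
    public String serverForRead() {
        String s = readOnlyReplicas.get(next);
        next = (next + 1) % readOnlyReplicas.size();
        return s;
    }
}
```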
That is a good idea.
>> > The HBase system can repartition an HTable at any time.  For
>> > example, many repeated inserts at a single location may cause a
>> > single HRegion to grow very large.  The HBase would then try to
>> > split that into multiple HRegions.  Those HRegions may be served
>> > by the same HRegionServer as the original or may be served by a
>> > different one.
>> Would the node send out a message to request a split or does the
>> master decide based on heart beat messages?
> There are two ways that an HRegionServer might offer brand-new
> service for an HRegion:
> 1) The HRegion's old HRegionServer died.  A new HRegionServer
> offers the exact same HRegion, loaded from a DFS file.  This will
> have to be initiated by the HBaseMaster, because it is the only
> node that knows about heartbeats.
Makes sense.
> 2) An HRegion is getting too big, and must be split into two.  I
> imagine that this can be initiated by the local HRegionServer,
> which then asks the master for various hints (like where there
> is another lightly-loaded HRegionServer that could take a new
> Region).
Maybe the local RegionServer should just request to be split and the
master should handle the split itself.
My concern is that using heartbeats alone to announce regions to the
master is not fast enough.
That means that while a region is being split, all its rows need to be
read-only during the process; the master needs to know about the two
new regions before we remove the write lock.
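The ordering I'm worried about could be sketched like this (purely a simulation, all names invented); the point is only that the master acknowledges both daughter regions before the write lock is released:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitProtocol {
    public static final List<String> log = new ArrayList<>();

    // Region server asks for a split; the steps below record the
    // ordering constraint, not any real HBase API.
    public static void splitRegion(String region) {
        log.add("write-lock " + region);      // rows become read-only
        String a = region + "-a", b = region + "-b";
        log.add("master registers " + a);     // master must know both
        log.add("master registers " + b);     // daughters first...
        log.add("release write-lock");        // ...then writes resume
    }

    public static void main(String[] args) {
        splitRegion("region1");
        for (String step : log) System.out.println(step);
    }
}
```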

>> My idea was to simply download the data to the node and read it
>> locally at any time, but write into the DFS, since in my case write
>> access can be slower but I need very fast read access.
> You mean just keep a local cache of the DFS file?  That might be
> a good idea for a feature we add into DFS as a performance
> enhancement.

Yes, reading files from DFS is too slow;
we ran into the same performance problem too often in several projects.

For example, reading a Lucene index file from DFS, as Nutch does, is
just useless.  But loading a copy to the local HDD during startup is
fast enough.
In general I don't think disk space is an issue these days, so I have
no problem with having the data replicated both in the DFS and on a
local HDD.
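The local-copy idea, sketched with plain java.nio file I/O standing in for the DFS side (the paths and the one-shot copy at startup are illustrative, not a DFS API):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalCache {
    // At startup, copy the (slow) remote file once to the local disk;
    // every later read hits the fast local copy. The "remote" path
    // here is just another local file standing in for DFS.
    public static Path fetchToLocal(Path remote, Path cacheDir)
            throws IOException {
        Files.createDirectories(cacheDir);
        Path local = cacheDir.resolve(remote.getFileName());
        if (!Files.exists(local)) {
            Files.copy(remote, local); // one slow transfer at startup
        }
        return local;                  // all subsequent reads go here
    }

    public static void main(String[] args) throws IOException {
        Path remote = Files.createTempFile("index", ".dat");
        Files.write(remote, "segment data".getBytes());
        Path local = fetchToLocal(remote,
                Files.createTempDirectory("cache"));
        System.out.println(new String(Files.readAllBytes(local)));
    }
}
```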
