hadoop-common-dev mailing list archives

From "Michael Cafarella" <michael.cafare...@gmail.com>
Subject Re: HBase Design Ideas, Part I
Date Tue, 16 May 2006 04:22:06 GMT
Hi Stefan,

Thanks for your mail.  Comments below.

On 5/15/06, Stefan Groschupf <sg@media-style.com> wrote:
> I was playing around with rows and ran into several problems using
> the hadoop io package (SequenceFile reader/writer).
> Optimal would be if a cell is a Writable, but having a row key, cell
> key, and value for each cell blows up disk usage.
> Alternatively we can have a row Writable, so we have only one row key,
> n column keys, and n values.
> In case a row has many columns, this scales very badly. For example, my
> row key is a URL, my column keys are user ids, and the values are
> numbers of clicks.
> If I want to get the number of clicks for a given URL and user, I
> need to load the values for all the other users as well. :(
> I already posted a mail about this issue.
> What we may need is a Writer that can seek first for the row key and
> then for the column keys.
> In general I agree with the sparse structure.

What about this: don't store explicit "column" fields anywhere.  Rather,
each row is stored as a series of key-value pairs, where the key is the
column name.

True, if there are a huge number of columns and you are interested in
just one, there will be unnecessary processing.  This is especially bad
if one column is a 2-char string and another column is a video file.

So we should actually keep a family of files, segmented by object size.
But in the general case, it shouldn't be possible to "seek to a column".
Instead, you seek to a row and unpack all its key/val (col/cell) pairs.
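Roughly what I have in mind, as a toy sketch (names invented here for illustration, not real HBase or Hadoop code):

```python
# A toy model of the layout described above: a row is just a sorted
# series of (column, value) pairs, with no explicit column schema.
# Reads seek to the row and unpack every pair; there is no per-column seek.

class Row:
    def __init__(self):
        self.cells = {}          # column name -> cell value

    def put(self, column, value):
        self.cells[column] = value

    def unpack_all(self):
        # Return the pairs as a flat series, in sorted column order.
        return sorted(self.cells.items())

row = Row()
row.put("clicks:user42", 7)
row.put("clicks:user99", 3)
# Fetching one column still means walking the whole row's pairs.
print(row.unpack_all())   # -> [('clicks:user42', 7), ('clicks:user99', 3)]
```

This is exactly why segmenting by object size matters: unpacking the whole row is cheap when all the cells are small.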

> My idea was to have the lock at the HRegionServer level; my idea was
> that the client itself takes care of replication,
> meaning it writes the value to the n servers that hold the same
> replicas of the HRegion.

Do you mean that a lock applies to an entire server at once?  Or
that an HRegionServer is responsible for all locks?  (I'd like to do
the latter, at least in the short-term.)

I'd like to avoid having an HRegion that's hosted by multiple servers,
because then it's unclear which HRegionServer should own the lock.
I suppose the HRegionServers for a given HRegion could hold an
election, but this seems like a lot of work.

If there's a row that's really "hot" and wanted by a lot of clients, I could
imagine starting a series of "read-only" HRegionServers that field read
requests.  That way you avoid having an election for the lock but can
still scale capacity if necessary.
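As a toy sketch of that split (all names invented, not a real design yet): one server owns the lock and takes the writes, while reads round-robin over the read-only servers.

```python
# Toy model: the lock-owning server applies writes; read requests are
# spread round-robin across a set of read-only replicas for a hot region.
import itertools

class HotRegion:
    def __init__(self, readers):
        self.store = {}                       # shared state, for illustration
        self._replicas = itertools.cycle(readers)

    def write(self, row, value):
        # Only the lock owner ever applies writes, so no election is needed.
        self.store[row] = value

    def read(self, row):
        replica = next(self._replicas)        # pick the next read-only server
        return replica, self.store.get(row)

region = HotRegion(readers=["ro1", "ro2"])
region.write("hot-row", 42)
print(region.read("hot-row"))   # -> ('ro1', 42)
```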

(I don't think we'll ever have a situation where a flood of writes comes in
the door.  If so, the whole design is a bad idea!)

> > The HBase system can repartition an HTable at any time.  For
> > example, many
> > repeated inserts at a single location may cause a single HRegion to
> > grow
> > very large.  The HBase would then try to split that into multiple
> > HRegions.
> > Those HRegions may be served by the same HRegionServer as the
> > original or may be served by a different one.
> Would the node send out a message to request a split or does the
> master decide based on heart beat messages?

There are two ways that an HRegionServer might offer brand-new
service for an HRegion:
1) The HRegion's old HRegionServer died.  A new HRegionServer
offers the exact same HRegion, loaded from a DFS file.  This will
have to be initiated by the HBaseMaster, because it is the only node that
knows about heartbeats.

2) An HRegion is getting too big, and must be split into two.  I
imagine that this can be initiated by the local HRegionServer,
which then asks the master for various hints (like where there
is another lightly-loaded HRegionServer that could take a new
HRegion).
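Path (2) might look roughly like this (a hedged sketch; thresholds, names, and the master interface are all invented for illustration):

```python
# Toy model of a server-initiated split: the local server notices a region
# has grown past a size threshold, splits it at the midpoint, and asks the
# master (which tracks load via heartbeats) where the new half should live.

SPLIT_THRESHOLD = 100   # max rows per region in this toy model

class ToyMaster:
    def __init__(self, loads):
        self.loads = loads                    # server -> regions served

    def least_loaded_server(self):
        return min(self.loads, key=self.loads.get)

def maybe_split(region, master):
    if len(region["rows"]) <= SPLIT_THRESHOLD:
        return [region]
    rows = sorted(region["rows"])
    mid = len(rows) // 2
    left  = {"name": region["name"] + "-a", "rows": rows[:mid],
             "server": region["server"]}
    right = {"name": region["name"] + "-b", "rows": rows[mid:],
             "server": master.least_loaded_server()}
    return [left, right]

master = ToyMaster({"rs1": 5, "rs2": 1})
region = {"name": "r", "server": "rs1", "rows": list(range(150))}
parts = maybe_split(region, master)
print([p["server"] for p in parts])   # -> ['rs1', 'rs2']
```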

> My idea was to simply download the data to the node and read any time
> locally, but write into the DFS, since in my case write access can be
> slower but I need very fast read access.

You mean just keep a local cache of the DFS file?  That might be
a good idea for a feature we add into DFS as a performance enhancement.
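Something like a read-through cache, I imagine (a sketch only; this is not the DFS API, and the invalidation policy is just one possibility):

```python
# Toy read-through cache: reads check a node-local cache before going to
# DFS; writes go straight to DFS and drop the stale local copy.

class CachingReader:
    def __init__(self, dfs):
        self.dfs = dfs           # stands in for the remote DFS
        self.cache = {}
        self.misses = 0

    def read(self, path):
        if path not in self.cache:
            self.misses += 1
            self.cache[path] = self.dfs[path]   # fetch from DFS once
        return self.cache[path]

    def write(self, path, data):
        self.dfs[path] = data                   # write through to DFS
        self.cache.pop(path, None)              # invalidate local copy

dfs = {"/hbase/region1": b"rows..."}
r = CachingReader(dfs)
r.read("/hbase/region1")
r.read("/hbase/region1")      # served locally the second time
print(r.misses)               # -> 1
```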

> My idea was that in such a case the HRegionServer may know the new
> location, at least until the master is informed.
> So getting a forward message could be faster than getting an error and
> asking for the target again.

The old HRegionServer may not know the new location, depending
on how the new one was created.  (The old one might be dead, too!)
But if we can speed things up substantially by forwarding the location,
I think that's OK.
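On the client side that could look something like this (a sketch under the assumption that the old server answers with a forward hint; every name here is invented):

```python
# Toy model of lookup-with-forwarding: the client tries the server it last
# knew about; if that server has moved the region but remembers where it
# went, it replies with a forward hint and the client retries there,
# instead of failing and re-asking the master.

class RegionMoved(Exception):
    def __init__(self, new_server):
        self.new_server = new_server

def read(client_cache, servers, row):
    server = client_cache["region_server"]
    for _ in range(3):                        # bounded retries
        try:
            return servers[server](row)
        except RegionMoved as hint:
            server = hint.new_server          # follow the forward hint
            client_cache["region_server"] = server
    raise RuntimeError("region location did not stabilize")

def old_server(row):
    raise RegionMoved("rs2")                  # "I moved it to rs2"

servers = {"rs1": old_server, "rs2": lambda row: "value-for-" + row}
cache = {"region_server": "rs1"}
print(read(cache, servers, "row1"))   # -> value-for-row1
```

If the old server is dead or ignorant of the move, the client just falls back to asking the master, as before.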
