hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: [DISCUSSION] Accumulo, another BigTable clone, has shown up on Apache Incubator as a proposal
Date Tue, 06 Sep 2011 21:51:20 GMT

There was a discussion over the weekend on the incubator dist-list about
Accumulo, describing what was borrowed from Hadoop-core and HBase.


5400 lines: slightly modified versions of Hadoop BCFile and related classes
            (our current file format extends BCFile)
4300 lines: heavily modified versions of MapFile and SequenceFile
            (no longer our default file format, but still included for
            backward compatibility)
2000 lines: heavily modified versions of HBase BlockCache and related files
            (Adam didn't count the tests when he said 1500 lines)
1300 lines: heavily modified versions of Hadoop BloomFilters
419 lines: modified Hadoop TeraSortIngest to sort data using Accumulo
325 lines: our Value is an immutable version of Hadoop BytesWritable
142 lines: modified ClassLoader based on commons-jci ReloadingClassLoader

On 9/5/11 5:35 PM, "Joey Echeverria" <joey@cloudera.com> wrote:

>The Accumulo implementation of the WAL is a separate set of daemons.
>When you write to the WAL, you send your transactions to three of the
>logging servers. When you do a recovery, I believe one of the three
>servers that has the WAL for the down server copies it to HDFS and
>then a MapReduce job splits the log and re-inserts the recovered data.
>You should have the same survivability that you get with HDFS.
>On Mon, Sep 5, 2011 at 5:06 PM, Bill <bill@dehora.net> wrote:
>> On 04/09/11 07:43, Mathias Herberts wrote:
>>> On Sep 4, 2011 1:39 AM, "Bill de hÓra"<lists@dehora.net>  wrote:
>>>> On 02/09/11 19:06, Stack wrote:
>>>>> What do folks think?
>>>> Not putting the log into hdfs seems like a good idea.
>>> I was somehow thinking the opposite as it makes irrecoverable machine
>>> failures much more problematic. What makes you say it's a good idea?
>> Allows more control over the write path, specifically sequential I/O and
>> crash recovery. Granted the commit needs to be replicated, but you need
>> that regardless. Thinking a bit more it might not square with the
>> model anyway, plus the Accumulo proposal mentions a service rather than
>> local disk. The WAL seems to be hardened up these days anyway making
>> issues like https://issues.apache.org/jira/browse/HBASE-4107 more of an edge
>> case.
>> Bill
>Joseph Echeverria
>Cloudera, Inc.
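
To make Joey's description of the write path and recovery concrete, here is a minimal in-memory sketch. It is not Accumulo code: the Logger class, write() and recover() are hypothetical names, the (tablet, key, value) tuple format is an assumption, and the copy-to-HDFS and MapReduce steps are collapsed into a single in-process split.

```python
from collections import defaultdict

REPLICAS = 3  # per the description above: each write goes to three logging servers


class Logger:
    """Hypothetical stand-in for one logging daemon (not an Accumulo class)."""

    def __init__(self, name):
        self.name = name
        # tablet server id -> ordered list of (tablet, key, value) mutations
        self.logs = defaultdict(list)

    def append(self, server_id, mutation):
        self.logs[server_id].append(mutation)


def write(loggers, server_id, mutation):
    """Record the mutation on REPLICAS loggers before acknowledging it."""
    chosen = loggers[:REPLICAS]
    if len(chosen) < REPLICAS:
        raise RuntimeError("not enough logging servers available")
    for lg in chosen:
        lg.append(server_id, mutation)
    return [lg.name for lg in chosen]


def recover(loggers, dead_server_id):
    """Take the log from any logger that still holds it and split it by tablet.

    In the description above the log is first copied to HDFS and split by a
    MapReduce job; here the split is done in-process for illustration only.
    """
    for lg in loggers:
        if dead_server_id in lg.logs:
            per_tablet = defaultdict(list)
            for tablet, key, value in lg.logs[dead_server_id]:
                per_tablet[tablet].append((key, value))
            return dict(per_tablet)
    return {}
```

As long as one of the three loggers still holds the log for the failed server, recover() returns every acknowledged mutation grouped by tablet, which is the survivability argument made above.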
