hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Avro for WAL serialization format?
Date Wed, 16 Dec 2009 06:44:54 GMT
I call that wild idea and raise you.

I had the random idea once to log an audit trail to the WAL in addition
to edits (all of the query side stuff, plus exceptional conditions and
important metrics) and then hand off the rolled WALs to some periodic
MapReduce process for reduction into long term storage, perhaps with
correlation. Like Chukwa, sort of. Half an audit trail -- the write side,
the mutations -- is already there in the WAL in chronological order, and
this may not be an unreasonable way to handle audit trails from 100+ or
dare I say it 1000+ region servers while trying to stick within the
Hadoop stack (not pick up the complexities of some other external
component such as Scribe or some syslog collector, etc.). Just stack up
the WALs in HDFS and process them at the end of the day or something like
that. 

Anyway the bloom soon comes off the rose... I mean idea...

Trouble of course is doubling or tripling (or more) the size of the WAL
with follow-on negative impacts on write path performance: more frequent
rolling, more data to sync, the need to append data to files in HDFS even
if only serving queries, etc.

However if Avro is both fast and has good support for nested structures
with optional fields, and we could come up with some scheme where some
marker indicates a field should get the last previous value seen (as
opposed to just being null), then it might not be so crazy. 
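
To make the marker idea concrete, here is a minimal sketch in plain Python, not Avro. Everything in it is hypothetical: the SAME sentinel, the encode/decode helpers, and the field names are made up for illustration. In Avro the marker might map to an extra branch in a union next to null, but that mapping is an assumption and is not tested here.

```python
# Hypothetical sentinel meaning "reuse the last value seen for this field".
# It is deliberately distinct from None, which still means "field is null".
SAME = object()


def encode(records, fields):
    """Replace any field value equal to the previous record's value with SAME."""
    prev = {}
    encoded = []
    for rec in records:
        out = {}
        for f in fields:
            value = rec.get(f)
            if f in prev and prev[f] == value:
                out[f] = SAME      # unchanged since the last record
            else:
                out[f] = value     # new value (or an explicit null)
            prev[f] = value
        encoded.append(out)
    return encoded


def decode(encoded, fields):
    """Expand SAME markers back into the last value seen for each field."""
    prev = {}
    records = []
    for enc in encoded:
        rec = {}
        for f in fields:
            value = enc[f]
            if value is SAME:
                value = prev[f]    # carry the previous value forward
            rec[f] = value
            prev[f] = value
        records.append(rec)
    return records
```

Consecutive audit entries from the same region and client would then carry only the fields that changed, which is where any size savings would come from. Whether that is enough to offset the extra WAL volume would have to be measured.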

   - Andy

________________________________
From: stack <stack@duboce.net>
To: hbase-dev@hadoop.apache.org
Sent: Tue, December 15, 2009 8:32:56 PM
Subject: Re: Avro for WAL serialization format?

What do you see as the advantage, Jeff?  I suppose it'd be more compact than
the current Writable-based serialization.  The current HBase WAL is a
SequenceFile.  We'd have to move away from that?
Thanks,
St.Ack


On Tue, Dec 15, 2009 at 7:46 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:

> Hey,
>
> Inspired by Drizzle's use of Protobufs for their transaction log format
> (e.g.
>
> http://jpipes.com/index.php?/archives/299-Drizzle-Replication-The-Transaction-Log.html
> ):
> how crazy would it be to try out Avro's binary format for the HBase WAL?
>
> Thanks,
> Jeff
>



      