hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Hackathon notes 3/21/2011
Date Sat, 26 Mar 2011 15:51:16 GMT
Hi Ted,

I actually ended up scrapping the YCSB approach and built a
system/durability test instead. It's an MR job that writes a particular
pattern of edits, and a second one that verifies them. I'm in the process of
hooking this into our continuous integration system, and will attempt to
open source it somehow or other in the next couple weeks.
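
For illustration only, here is a minimal Python sketch of the
write-then-verify idea described above. It is not the actual MR job: a plain
dict stands in for the HBase table, and every name in it (expected_value,
row_key, write_pattern, verify_pattern) is hypothetical. The point is that
each value is derived deterministically from its key, so the verifier can
recompute what should be there and report lost or corrupted edits.

```python
import hashlib


def expected_value(writer_id: int, seq: int) -> str:
    """Derive the cell value from (writer, sequence) so it can be recomputed."""
    return hashlib.md5(f"{writer_id}:{seq}".encode()).hexdigest()


def row_key(writer_id: int, seq: int) -> str:
    """Hypothetical key layout: one row per (writer, sequence) pair."""
    return f"w{writer_id:04d}-{seq:08d}"


def write_pattern(table: dict, writer_id: int, count: int) -> None:
    """Writer phase: store `count` edits. `table` is a dict standing in for HBase."""
    for seq in range(count):
        table[row_key(writer_id, seq)] = expected_value(writer_id, seq)


def verify_pattern(table: dict, writer_id: int, count: int):
    """Verifier phase: return (missing, corrupt) lists of row keys."""
    missing, corrupt = [], []
    for seq in range(count):
        key = row_key(writer_id, seq)
        if key not in table:
            missing.append(key)
        elif table[key] != expected_value(writer_id, seq):
            corrupt.append(key)
    return missing, corrupt
```

In a real durability test the interesting part happens between the two
phases (killing servers, forcing log splits); the sketch only shows how the
verifier can tell a lost edit from a damaged one.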

-Todd

On Sat, Mar 26, 2011 at 12:58 AM, Ted Dunning <tdunning@maprtech.com> wrote:

>
> Todd,
>
> I see ycsb on your list.
>
> Where did that go?  We have been beating on it as well and have pretty much
> decided that it is worthless as it stands.
>
> My thought is that we need a multi-node version that takes directions about
> what load to generate via ZK.  That is better than a map-reduce based load
> generator because you can ramp load up and down at any time.
>
> Where are you headed with this?
>
>
> On Fri, Mar 25, 2011 at 10:49 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Dear HBase developers,
>>
>> Last Monday, several HBase contributors met up at the StumbleUpon offices
>> for a bit of a hackathon. We spent the beginning of the day discussing a
>> few general topics, and then from about 11am through 7pm or so most of us
>> hunkered down to hacking on various projects. I was the secretary for the
>> morning, so here are the notes. Please excuse any typos or if I got your
>> name wrong - I was never cut out for stenography.
>>
>> Thanks to those who came, and special thanks to the folks at StumbleUpon
>> for space, food, and beer!
>>
>>
>> Agenda:
>>  - Upcoming releases:
>>   - 0.90.2 - when to release? a few bugs
>>   - 0.91.x - should we do one?
>>   - 0.92.0 - when and what?
>>  - Next user group meetup?
>>  - Upcoming features:
>>   - Rolling restart improvements?
>>   - Online config change
>>   - Security and build issues
>>   - Distributed splitting
>>  - Maybe produce some code today! (power through above, then work on
>> respective priorities)
>>
>> ---
>>
>> People:
>>
>>  - Stack @ StumbleUpon
>>  - Todd @ Cloudera
>>  - Elliot @ NGMoco - using 0.89 in prod, 0.90.1 about to be rolled out
>>  - Ted Yu from CarrierIQ
>>  - Liyin and Nicolas from Facebook, using 0.89 for messaging product
>>  - Benoit from SU - TSDB
>>  - Mingjie, Eugene, Gary from TrendMicro - using some internal build
>> which is like trunk (security + coprocessors frankenbuild)
>>  - JD from SU
>>  - Prakash Khemani from FB - his group is on 0.90 - increment heavy
>> workload
>>   - has a patch for distributed splitting
>>   - if a server goes down, takes 10-15 minutes to catch up, so wants
>> to reduce that time window
>>  - Marc, independent consultant with MetaMarkets right now - 0.90.1
>> "pseudo production" work
>>  - Ryan from StumbleUpon
>>
>>
>> -----
>>
>> 0.90.2:
>>  - next week? (week of 3/28?)
>>   - there are some bugs that need to be fixed still
>>   - candidate end of this week, then some time for testing
>>  - Stack has volunteered to be release manager
>>
>> 0.91.x:
>>  - should we do it?
>>   - people seem to think yes
>>   - but we shouldn't put much effort into testing these pre-release
>>   - there are a lot of interesting things in trunk that people might
>> want to play with
>>
>> 0.92.x:
>>  - JD would like to have something more than alpha quality in time
>> for Hadoop Summit (3rd or 4th week of June)
>>  - What are pending items?
>>   - Coprocessors
>>   - Online schema changes? Makes Coprocessors more useful
>>   - HBASE-1502 - removing heartbeats
>>   - HBASE-2856 - ACID fixes
>>   - Distributed splitting
>>  - Time based or feature based? we want to try doing really time based
>>  - May 1st for first release candidate
>>
>>
>> Next meetup:
>>  - some time in April? in south bay?
>>
>> Features:
>>  - Rolling restart: Stack working on it
>>  - Online schema edit? FB finds it a pain point but Nicolas not sure
>> where it ranks on their priority list
>>  - Online config changes?
>>  - Online schema change is probably more important than online config
>> change, since config change can be done with rolling restart
>>   - For co-processors, we need to attack some classloading issues
>> before online schema change can really reload coprocessor
>> implementations
>>
>> Security and build:
>>  - Security code has been isolated as much as possible:
>>   - two separate layers:
>>     - RPC layer does secure RPC - pluggable RPC implementation and
>> subclassing for HBaseServer and Client classes
>>     - Loadable coprocessors for auth
>>  - But building is difficult - need to build against a secure Hadoop
>> in order to do this
>>   - conditional build step? maven module?
>>  - Stack and Gary will look into how to build and release this:
>>   - maybe Maven profiles? modules?
>>   - separate jar to be added to classpath with stuff that depends on
>> security
>>
>> Distributed splitting:
>>  - HLogSplitter code is pretty different on FB's 0.90 branch
>>  - But most stuff plugs easily into trunk
>>  - Same interface:
>>   - call splitLog with server name
>>   - master uses SplitLogManager - puts log splitting tasks in ZK
>>   - each RS has SplitLogWorkers - watch for tasks, race to grab them in ZK
>>   - each RS splits logs one at a time
>>   - RS pings the master on the tasks as it splits them
>>   - master can preempt a task away from a worker
>>   - when master comes up it needs to grab orphaned tasks
>>  - some unit tests done, but hasn't been substantially tested on real
>> cluster
>>  - Current splitting does batching - multiple input logs go to one
>> output file per region
>>   - new splitting creates 3-4x as many files for recovered.edits
>>   - this is OK - we already handle this with seqids
>>  - If whole cluster goes down, something like MapReduce makes more sense
>>  - this feature is targeted towards single-RS failure
>>   - currently seeing downtime of 10 minutes when RS goes down
>>   - FB has various internal scripts/tools ("HyperShell") that let
>> them do the full-cluster-failure case efficiently, but they don't have
>> a clean way of open sourcing it
>>   - Maybe we can build something like this with hbase-regionservers.sh
>>
>>
>> What are we working on:
>>  - Todd - maybe making YCSB runnable as integration test
>>  - Stack - rolling restart? with Nicolas's help perhaps
>>  - Marc - add some new cases to hbck
>>  - Ryan - maybe porting RPC to Thrift?
>>   - wants to resolve the meta-in-ZK ticket as "wontfix"
>>  - Prakash - distributed splitting
>>  - JD - fix bugs he saw over the weekend
>>  - Gary - work on splitting out security build (maven pom file fun)
>>  - Eugene: ZK-938 - kerberos stuff for ZooKeeper (necessary for HBase
>> security)
>>   - or maybe just fix some open bugs in HBase
>>  - Mingjie: open bugs for secure HBase (Access Control related)
>>  - Benoit: busy working on StumbleUpon stuff - mostly just observing
>>  - Nicolas: multithreaded compactions - needs to be refactored and cleaned
>> up
>>   - they have very big storefiles (10GB+) so their compactions take 1hr+
>>   - or just talking to people about stuff - easier than IRC
>>  - Liyin - add ability to do ZK miniclusters with multiple ZKs
>>  - Ted - working on pending patches / testing
>>  - Elliot: HBASE-3541 - HBase rest multigets
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
