From: Ted Dunning
To: Todd Lipcon
Cc: dev@hbase.apache.org
Date: Sat, 26 Mar 2011 15:53:08 -0700
Subject: Re: Hackathon notes 3/21/2011

Hmm... Yeah. I hear that "scrapping YCSB" meme a lot.
Do you not worry about verifying intermediate results when over-writing data?

On Sat, Mar 26, 2011 at 8:51 AM, Todd Lipcon wrote:
> Hi Ted,
>
> I actually ended up scrapping the YCSB approach and built a
> system/durability test instead. It's an MR job that writes a particular
> pattern of edits, and a second one that verifies them. I'm in the process
> of hooking this into our continuous integration system, and will attempt
> to open source it somehow or other in the next couple weeks.
>
> -Todd
>
> On Sat, Mar 26, 2011 at 12:58 AM, Ted Dunning wrote:
>
>> Todd,
>>
>> I see ycsb on your list.
>>
>> Where did that go? We have been beating on it as well and have pretty
>> much decided that it is worthless as it stands.
>>
>> My thought is that we need a multi-node version that takes directions
>> about what load to generate via ZK. That is better than a map-reduce
>> based load generator because you can ramp load up and down at any time.
>>
>> Where are you headed with this?
>>
>> On Fri, Mar 25, 2011 at 10:49 PM, Todd Lipcon wrote:
>>
>>> Dear HBase developers,
>>>
>>> Last Monday, several HBase contributors met up at the StumbleUpon
>>> offices for a bit of a hackathon. We spent the beginning of the day
>>> discussing a few general topics, and then from about 11am through 7pm
>>> or so most of us hunkered down to hacking on various projects. I was
>>> the secretary for the morning, so here are the notes. Please excuse any
>>> typos or if I got your name wrong - I was never cut out for
>>> stenography.
>>>
>>> Thanks to those who came, and special thanks to the folks at
>>> StumbleUpon for space, food, and beer!
>>>
>>> Agenda:
>>> - Upcoming releases:
>>>   - 0.90.2 - when to release? a few bugs
>>>   - 0.91.x - should we do one?
>>>   - 0.92.0 - when and what?
>>> - Next user group meetup?
>>> - Upcoming features:
>>>   - Rolling restart improvements?
>>>   - Online config change
>>>   - Security and build issues
>>>   - Distributed splitting
>>> - Maybe produce some code today! (power through above, then work on
>>>   respective priorities)
>>>
>>> ---
>>>
>>> People:
>>>
>>> - Stack @ StumbleUpon
>>> - Todd @ Cloudera
>>> - Elliot @ NGMoco - using 0.89 in prod, 0.90.1 about to be rolled out
>>> - Ted Yu from CarrierIQ
>>> - Liyin and Nicolas from Facebook, using 0.89 for messaging product
>>> - Benoit from SU - TSDB
>>> - Mingjie, Eugene, Gary from TrendMicro - using some internal build
>>>   which is like trunk (security + coprocessors frankenbuild)
>>> - JD from SU
>>> - Prakash Khemani from FB - his group is on 0.90 - increment-heavy
>>>   workload
>>>   - has a patch for distributed splitting
>>>   - if a server goes down, it takes 10-15 minutes to catch up, so he
>>>     wants to reduce that time window
>>> - Marc, independent consultant with MetaMarkets right now - 0.90.1
>>>   "pseudo production" work
>>> - Ryan from StumbleUpon
>>>
>>> -----
>>>
>>> 0.90.2:
>>> - next week? (week of 3/28?)
>>> - there are some bugs that need to be fixed still
>>> - candidate end of this week, then some time for testing
>>> - Stack has volunteered to be release manager
>>>
>>> 0.91.x:
>>> - should we do it?
>>> - people seem to think yes
>>> - but we shouldn't put much effort into testing these pre-releases
>>> - there are a lot of interesting things in trunk that people might
>>>   want to play with
>>>
>>> 0.92.x:
>>> - JD would like to have something more than alpha quality in time
>>>   for Hadoop Summit (3rd or 4th week of June)
>>> - What are the pending items?
>>>   - Coprocessors
>>>   - Online schema changes? Makes coprocessors more useful
>>>   - HBASE-1502 - removing heartbeats
>>>   - HBASE-2856 - ACID fixes
>>>   - Distributed splitting
>>> - Time-based or feature-based? we want to try doing a really time-based
>>>   release
>>> - May 1st for first release candidate
>>>
>>> Next meetup:
>>> - some time in April? in south bay?
>>>
>>> Features:
>>> - Rolling restart: Stack working on it
>>> - Online schema edit? FB finds it a pain point but Nicolas not sure
>>>   where it ranks on their priority list
>>> - Online config changes?
>>>   - Online schema change is probably more important than online config
>>>     change, since a config change can be done with a rolling restart
>>>   - For coprocessors, we need to attack some classloading issues
>>>     before online schema change can really reload coprocessor
>>>     implementations
>>>
>>> Security and build:
>>> - Security code has been isolated as much as possible:
>>>   - two separate layers:
>>>     - RPC layer does secure RPC - pluggable RPC implementation and
>>>       subclassing for HBaseServer and Client classes
>>>     - Loadable coprocessors for auth
>>> - But building is difficult - need to build against a secure Hadoop
>>>   in order to do this
>>>   - conditional build step? maven module?
>>> - Stack and Gary will look into how to build and release this:
>>>   - maybe Maven profiles? modules?
>>>   - separate jar to be added to the classpath with stuff that depends
>>>     on security
>>>
>>> Distributed splitting:
>>> - HLogSplitter code is pretty different on FB's 0.90 branch
>>> - But most stuff plugs easily into trunk
>>> - Same interface:
>>>   - call splitLog with server name
>>> - master uses SplitLogManager - puts log-splitting tasks in ZK
>>> - each RS has SplitLogWorkers - they watch for tasks and race to grab
>>>   them in ZK
>>>   - each RS splits logs one at a time
>>> - RS pings the master on the tasks as it splits them
>>> - master can preempt a task away from a worker
>>> - when the master comes up it needs to grab orphaned tasks
>>> - some unit tests done, but it hasn't been substantially tested on a
>>>   real cluster
>>> - Current splitting does batching - multiple input logs go to one
>>>   output file per region
>>>   - new splitting creates 3-4x as many files for recovered.edits
>>>   - this is OK - we already handle this with seqids
>>> - If the whole cluster goes down, something like MapReduce makes more
>>>   sense
>>>   - this feature is targeted towards the single-RS failure case
>>>   - currently seeing downtime of 10 minutes when an RS goes down
>>> - FB has various internal scripts/tools ("HyperShell") that let
>>>   them do the full-cluster-failure case efficiently, but they don't
>>>   have a clean way of open sourcing it
>>>   - Maybe we can build something like this with hbase-regionservers.sh
>>>
>>> What are we working on:
>>> - Todd - maybe making YCSB runnable as an integration test
>>> - Stack - rolling restart? with Nicolas's help perhaps
>>> - Marc - add some new cases to hbck
>>> - Ryan - maybe porting RPC to Thrift?
>>>   - wants to resolve the meta-in-ZK ticket as "wontfix"
>>> - Prakash - distributed splitting
>>> - JD - fix bugs he saw over the weekend
>>> - Gary - work on splitting out the security build (Maven pom file fun)
>>> - Eugene - ZK-938 - Kerberos stuff for ZooKeeper (necessary for HBase
>>>   security)
>>>   - or maybe just fix some open bugs in HBase
>>> - Mingjie - open bugs for secure HBase (Access Control related)
>>> - Benoit - busy working on StumbleUpon stuff - mostly just observing
>>> - Nicolas - multithreaded compactions - needs to be refactored and
>>>   cleaned up
>>>   - they have very big storefiles (10GB+) so their compactions take
>>>     1hr+
>>>   - or just talking to people about stuff - easier than IRC
>>> - Liyin - add ability to do ZK miniclusters with multiple ZKs
>>> - Ted - working on pending patches / testing
>>> - Elliot - HBASE-3541 - HBase rest multigets
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
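P.S. Todd's durability test is described only at a high level above (one MR job writes a particular pattern of edits, a second one verifies them). The core trick is that the pattern is a pure function of its position, so the verifier can regenerate every expected edit independently. A minimal single-process sketch of that idea - every name here is invented for illustration, and a plain dict stands in for the HBase table:

```python
import hashlib

def edit_for(task_id: int, seq: int) -> tuple[bytes, bytes]:
    """Deterministically derive one edit (row key, value) from its position.

    Because the edit depends only on (task_id, seq), the verifier can
    recompute the full expected data set without any shared state with
    the writer.
    """
    row = b"row-%08d-%08d" % (task_id, seq)
    value = hashlib.sha1(row).digest()  # value is recomputable from the key
    return row, value

def write_pattern(table: dict, task_id: int, n_edits: int) -> None:
    # Stand-in for one writer task: a dict plays the role of the table.
    for seq in range(n_edits):
        row, value = edit_for(task_id, seq)
        table[row] = value

def verify_pattern(table: dict, task_id: int, n_edits: int) -> list[bytes]:
    # Stand-in for one verifier task: report rows that are missing or corrupt.
    bad = []
    for seq in range(n_edits):
        row, value = edit_for(task_id, seq)
        if table.get(row) != value:
            bad.append(row)
    return bad
```

In the real MR version each map task would presumably own a task_id range, and a lost or corrupted edit after a failure shows up as a non-empty verifier report.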
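P.P.S. The distributed-splitting design in the notes (SplitLogManager publishes tasks in ZK, SplitLogWorkers on each RS race to grab them) relies on ZooKeeper's create-if-absent semantics: only one worker's claim of a task node can succeed. A sketch of that claiming pattern - SplitLogManager/SplitLogWorker are the names from the notes, but TaskBoard and everything else here is a hypothetical in-memory stand-in for ZK, not HBase code:

```python
import threading

class TaskBoard:
    """In-memory stand-in for the znodes the SplitLogManager would create.

    ZK guarantees that creating an already-existing node fails; a lock
    gives us the same exactly-one-winner semantics per task.
    """
    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._owner = {t: None for t in tasks}

    def try_claim(self, task: str, worker: str) -> bool:
        # Atomic "create the ownership node" - succeeds for one worker only.
        with self._lock:
            if self._owner[task] is None:
                self._owner[task] = worker
                return True
            return False

    def owners(self) -> dict:
        with self._lock:
            return dict(self._owner)

def split_worker(board: TaskBoard, tasks, name: str, claimed: list) -> None:
    # Each region server's worker races to grab tasks; it only "splits"
    # (here: records) the ones it wins.
    for t in tasks:
        if board.try_claim(t, name):
            claimed.append(t)

tasks = [f"hlog-{i}" for i in range(20)]
board = TaskBoard(tasks)
results = {w: [] for w in ("rs1", "rs2", "rs3")}
threads = [threading.Thread(target=split_worker, args=(board, tasks, w, results[w]))
           for w in results]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

However the threads interleave, every task ends up claimed by exactly one worker - which is the property that lets the master hand the same task list to all region servers without coordination beyond ZK itself (preemption and orphan recovery, also mentioned in the notes, are not modeled here).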