From: Ted Dunning
To: Todd Lipcon
Cc: dev@hbase.apache.org
Date: Sat, 26 Mar 2011 15:53:08 -0700
Subject: Re: Hackathon notes 3/21/2011

Hmm... Yeah. I hear that "scrapping YCSB" meme a lot.
Do you not worry about verifying intermediate results when over-writing data?

On Sat, Mar 26, 2011 at 8:51 AM, Todd Lipcon wrote:
> Hi Ted,
>
> I actually ended up scrapping the YCSB approach and built a
> system/durability test instead. It's an MR job that writes a particular
> pattern of edits, and a second one that verifies them. I'm in the process
> of hooking this into our continuous integration system, and will attempt
> to open source it somehow or other in the next couple weeks.
>
> -Todd
>
> On Sat, Mar 26, 2011 at 12:58 AM, Ted Dunning wrote:
>
>> Todd,
>>
>> I see ycsb on your list.
>>
>> Where did that go? We have been beating on it as well and have pretty
>> much decided that it is worthless as it stands.
>>
>> My thought is that we need a multi-node version that takes directions
>> about what load to generate via ZK. That is better than a map-reduce
>> based load generator because you can ramp load up and down at any time.
>>
>> Where are you headed with this?
>>
>> On Fri, Mar 25, 2011 at 10:49 PM, Todd Lipcon wrote:
>>
>>> Dear HBase developers,
>>>
>>> Last Monday, several HBase contributors met up at the StumbleUpon
>>> offices for a bit of a hackathon. We spent the beginning of the day
>>> discussing a few general topics, and then from about 11am through 7pm
>>> or so most of us hunkered down to hacking on various projects. I was
>>> the secretary for the morning, so here are the notes. Please excuse any
>>> typos or if I got your name wrong - I was never cut out for
>>> stenography.
>>>
>>> Thanks to those who came, and special thanks to the folks at
>>> StumbleUpon for space, food, and beer!
>>>
>>> Agenda:
>>> - Upcoming releases:
>>>   - 0.90.2 - when to release? a few bugs
>>>   - 0.91.x - should we do one?
>>>   - 0.92.0 - when and what?
>>> - Next user group meetup?
>>> - Upcoming features:
>>>   - Rolling restart improvements?
>>>   - Online config change
>>>   - Security and build issues
>>>   - Distributed splitting
>>> - Maybe produce some code today! (power through above, then work on
>>>   respective priorities)
>>>
>>> ---
>>>
>>> People:
>>>
>>> - Stack @ StumbleUpon
>>> - Todd @ Cloudera
>>> - Elliot @ NGMoco - using 0.89 in prod, 0.90.1 about to be rolled out
>>> - Ted Yu from CarrierIQ
>>> - Liyin and Nicolas from Facebook, using 0.89 for messaging product
>>> - Benoit from SU - TSDB
>>> - Mingjie, Eugene, Gary from TrendMicro - using some internal build
>>>   which is like trunk (security + coprocessors frankenbuild)
>>> - JD from SU
>>> - Prakash Khemani from FB - his group is on 0.90 - increment-heavy
>>>   workload
>>>   - has a patch for distributed splitting
>>>   - if a server goes down, it takes 10-15 minutes to catch up, so he
>>>     wants to reduce that time window
>>> - Marc, independent consultant with MetaMarkets right now - 0.90.1
>>>   "pseudo production" work
>>> - Ryan from StumbleUpon
>>>
>>> -----
>>>
>>> 0.90.2:
>>> - next week? (week of 3/28?)
>>> - there are some bugs that need to be fixed still
>>> - candidate end of this week, then some time for testing
>>> - Stack has volunteered to be release manager
>>>
>>> 0.91.x:
>>> - should we do it?
>>> - people seem to think yes
>>> - but we shouldn't put much effort into testing these pre-releases
>>> - there are a lot of interesting things in trunk that people might
>>>   want to play with
>>>
>>> 0.92.x:
>>> - JD would like to have something more than alpha quality in time
>>>   for Hadoop Summit (3rd or 4th week of June)
>>> - What are the pending items?
>>>   - Coprocessors
>>>   - Online schema changes? Makes coprocessors more useful
>>>   - HBASE-1502 - removing heartbeats
>>>   - HBASE-2856 - ACID fixes
>>>   - Distributed splitting
>>> - Time-based or feature-based? we want to try doing a really time-based
>>>   release
>>> - May 1st for first release candidate
>>>
>>> Next meetup:
>>> - some time in April? in south bay?
>>>
>>> Features:
>>> - Rolling restart: Stack working on it
>>> - Online schema edit? FB finds it a pain point but Nicolas not sure
>>>   where it ranks on their priority list
>>> - Online config changes?
>>>   - Online schema change is probably more important than online config
>>>     change, since a config change can be done with a rolling restart
>>>   - For coprocessors, we need to attack some classloading issues
>>>     before online schema change can really reload coprocessor
>>>     implementations
>>>
>>> Security and build:
>>> - Security code has been isolated as much as possible:
>>>   - two separate layers:
>>>     - RPC layer does secure RPC - pluggable RPC implementation and
>>>       subclassing for HBaseServer and Client classes
>>>     - Loadable coprocessors for auth
>>> - But building is difficult - need to build against a secure Hadoop
>>>   in order to do this
>>>   - conditional build step? maven module?
>>> - Stack and Gary will look into how to build and release this:
>>>   - maybe Maven profiles? modules?
>>>   - separate jar to be added to the classpath with stuff that depends
>>>     on security
>>>
>>> Distributed splitting:
>>> - HLogSplitter code is pretty different on FB's 0.90 branch
>>> - But most stuff plugs easily into trunk
>>> - Same interface:
>>>   - call splitLog with server name
>>> - master uses SplitLogManager - puts log-splitting tasks in ZK
>>> - each RS has SplitLogWorkers - they watch for tasks and race to grab
>>>   them in ZK
>>>   - each RS splits logs one at a time
>>> - RS pings the master on the tasks as it splits them
>>> - master can preempt a task away from a worker
>>> - when the master comes up it needs to grab orphaned tasks
>>> - some unit tests done, but it hasn't been substantially tested on a
>>>   real cluster
>>> - Current splitting does batching - multiple input logs go to one
>>>   output file per region
>>>   - new splitting creates 3-4x as many files for recovered.edits
>>>   - this is OK - we already handle this with seqids
>>> - If the whole cluster goes down, something like MapReduce makes more
>>>   sense
>>>   - this feature is targeted towards the single-RS failure case
>>>   - currently seeing downtime of 10 minutes when an RS goes down
>>> - FB has various internal scripts/tools ("HyperShell") that let
>>>   them do the full-cluster-failure case efficiently, but they don't
>>>   have a clean way of open sourcing it
>>>   - Maybe we can build something like this with hbase-regionservers.sh
>>>
>>> What are we working on:
>>> - Todd - maybe making YCSB runnable as an integration test
>>> - Stack - rolling restart? with Nicolas's help perhaps
>>> - Marc - add some new cases to hbck
>>> - Ryan - maybe porting RPC to Thrift?
>>>   - wants to resolve the meta-in-ZK ticket as "wontfix"
>>> - Prakash - distributed splitting
>>> - JD - fix bugs he saw over the weekend
>>> - Gary - work on splitting out the security build (Maven pom file fun)
>>> - Eugene - ZK-938 - Kerberos stuff for ZooKeeper (necessary for HBase
>>>   security)
>>>   - or maybe just fix some open bugs in HBase
>>> - Mingjie - open bugs for secure HBase (Access Control related)
>>> - Benoit - busy working on StumbleUpon stuff - mostly just observing
>>> - Nicolas - multithreaded compactions - needs to be refactored and
>>>   cleaned up
>>>   - they have very big storefiles (10GB+) so their compactions take
>>>     1hr+
>>>   - or just talking to people about stuff - easier than IRC
>>> - Liyin - add ability to do ZK miniclusters with multiple ZKs
>>> - Ted - working on pending patches / testing
>>> - Elliot - HBASE-3541 - HBase rest multigets
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
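P.S. Todd's durability test is described only at a high level above (one MR job writes a particular pattern of edits, a second one verifies them). The core trick is that the pattern is a pure function of its position, so the verifier can regenerate every expected edit independently. A minimal single-process sketch of that idea - every name here is invented for illustration, and a plain dict stands in for the HBase table:

```python
import hashlib

def edit_for(task_id: int, seq: int) -> tuple[bytes, bytes]:
    """Deterministically derive one edit (row key, value) from its position.

    Because the edit depends only on (task_id, seq), the verifier can
    recompute the full expected data set without any shared state with
    the writer.
    """
    row = b"row-%08d-%08d" % (task_id, seq)
    value = hashlib.sha1(row).digest()  # value is recomputable from the key
    return row, value

def write_pattern(table: dict, task_id: int, n_edits: int) -> None:
    # Stand-in for one writer task: a dict plays the role of the table.
    for seq in range(n_edits):
        row, value = edit_for(task_id, seq)
        table[row] = value

def verify_pattern(table: dict, task_id: int, n_edits: int) -> list[bytes]:
    # Stand-in for one verifier task: report rows that are missing or corrupt.
    bad = []
    for seq in range(n_edits):
        row, value = edit_for(task_id, seq)
        if table.get(row) != value:
            bad.append(row)
    return bad
```

In the real MR version each map task would presumably own a task_id range, and a lost or corrupted edit after a failure shows up as a non-empty verifier report.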
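P.P.S. The distributed-splitting design in the notes (SplitLogManager publishes tasks in ZK, SplitLogWorkers on each RS race to grab them) relies on ZooKeeper's create-if-absent semantics: only one worker's claim of a task node can succeed. A sketch of that claiming pattern - SplitLogManager/SplitLogWorker are the names from the notes, but TaskBoard and everything else here is a hypothetical in-memory stand-in for ZK, not HBase code:

```python
import threading

class TaskBoard:
    """In-memory stand-in for the znodes the SplitLogManager would create.

    ZK guarantees that creating an already-existing node fails; a lock
    gives us the same exactly-one-winner semantics per task.
    """
    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._owner = {t: None for t in tasks}

    def try_claim(self, task: str, worker: str) -> bool:
        # Atomic "create the ownership node" - succeeds for one worker only.
        with self._lock:
            if self._owner[task] is None:
                self._owner[task] = worker
                return True
            return False

    def owners(self) -> dict:
        with self._lock:
            return dict(self._owner)

def split_worker(board: TaskBoard, tasks, name: str, claimed: list) -> None:
    # Each region server's worker races to grab tasks; it only "splits"
    # (here: records) the ones it wins.
    for t in tasks:
        if board.try_claim(t, name):
            claimed.append(t)

tasks = [f"hlog-{i}" for i in range(20)]
board = TaskBoard(tasks)
results = {w: [] for w in ("rs1", "rs2", "rs3")}
threads = [threading.Thread(target=split_worker, args=(board, tasks, w, results[w]))
           for w in results]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

However the threads interleave, every task ends up claimed by exactly one worker - which is the property that lets the master hand the same task list to all region servers without coordination beyond ZK itself (preemption and orphan recovery, also mentioned in the notes, are not modeled here).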