hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: HBASE-2182
Date Sat, 30 Jun 2012 08:27:03 GMT
On Fri, Jun 29, 2012 at 5:04 PM, Todd Lipcon <todd@cloudera.com> wrote:
> A few inline notes below:
> On Fri, Jun 29, 2012 at 4:42 PM, Elliott Clark <eclark@stumbleupon.com> wrote:
>> I just posted a pretty early skeleton(
>> https://issues.apache.org/jira/browse/HBASE-2182) on what I think a netty
>> based hbase client/server could look like.
>> Pros:
>>   - Faster
>>      - Giraph got a 3x perf improvement by dropping hadoop rpc
> What's the reference for this? The 3x perf I heard about from Giraph was
> from switching to using LMAX's Disruptor instead of queues, internally. We
> could do the same, but I'm not certain the model works well for our use
> cases where the RPC processing can end up blocked on disk access, etc.
>>      - Asynchbase trounces our client when JD benchmarked them
> I'm still convinced that the majority of this has to do with the way our
> batching happens to the server, not async vs sync. (in the current sync
> client, once we fill up the buffer, we "flush" from the same thread, and
> block the flush until all buffered edits have made it, vs doing it in the
> background). We could fix this without going to a fully async model.

I agree here. If you do the a priori code analysis, it becomes
obvious that the issue is that slower regionservers can hold up entire
batches even if 90%+ of the Puts have already been acked...

And don't forget that we used to issue Puts to regionservers SERIALLY
until we added the current parallelism code... (not that the code is
great, but it was relatively easy to fix at the time).
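
The batching behavior described above can be sketched in a few lines of
Java. This is only an illustration under stated assumptions, not HBase
client code: BatchFlushSketch, sendBatch, flushAsync, the server names and
the latencies are all hypothetical, and a sleep stands in for the RPC. It
starts one flush per regionserver in the background, handles each ack as it
arrives, and shows that a caller who blocks on the whole batch waits for
the slowest server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchFlushSketch {

    // Hypothetical stand-in for one RPC carrying the buffered Puts for one
    // regionserver. The sleep simulates server latency.
    static int sendBatch(String server, int puts, long latencyMillis) {
        try {
            TimeUnit.MILLISECONDS.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return puts; // number of Puts acked by this server
    }

    // Starts one flush per server on the pool and returns immediately with
    // a future per server, so fast servers are not tied to slow ones.
    static Map<String, CompletableFuture<Integer>> flushAsync(
            Map<String, Long> latencyByServer, int putsPerServer, ExecutorService pool) {
        Map<String, CompletableFuture<Integer>> acks = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : latencyByServer.entrySet()) {
            acks.put(e.getKey(), CompletableFuture.supplyAsync(
                    () -> sendBatch(e.getKey(), putsPerServer, e.getValue()), pool));
        }
        return acks;
    }

    public static void main(String[] args) {
        // Assumed latencies: two fast servers and one slow one.
        Map<String, Long> latency = new LinkedHashMap<>();
        latency.put("rs1", 10L);
        latency.put("rs2", 10L);
        latency.put("rs3", 300L);

        ExecutorService pool = Executors.newFixedThreadPool(latency.size());
        try {
            long start = System.nanoTime();
            Map<String, CompletableFuture<Integer>> acks = flushAsync(latency, 100, pool);

            // Background style: react to each server's ack as it arrives.
            acks.forEach((server, ack) -> ack.thenAccept(n ->
                    System.out.printf("%s acked %d Puts after %d ms%n",
                            server, n, (System.nanoTime() - start) / 1_000_000)));

            // Blocking style: the caller cannot continue until every server
            // has answered, so it is bounded by the slowest one.
            CompletableFuture.allOf(acks.values().toArray(new CompletableFuture[0])).join();
            System.out.printf("whole batch done after %d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
        } finally {
            pool.shutdown();
        }
    }
}
```

Nothing in the sketch requires a fully async client API: the caller can
still block when it wants to, it just does not have to block the writing
thread on every flush.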

>>   - Could encourage things to be a little more modular if everything isn't
>>   hanging directly off of HRegionServer
> Sure, but not sure I see why this is Netty vs not-Netty
>>   - Netty is better about thread usage than hadoop rpc server.
> Can you explain further?
>>   - Pretty easy to define an rpc protocol after all of the work on
>>   protobuf (Thanks everyone)
>>   - Decoupling the rpc server library from the hadoop library could allow
>>   us to rev the server code easier.
>>   - The filter model is very easy to work with.
>>      - Security can be just a single filter.
>>      - Logging can be another
>>      - Stats can be another.
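
The filter model mentioned in the list above can be sketched in plain Java.
This is a stand-in for the idea, not Netty's actual pipeline API and not
code from the posted skeleton: FilterChainSketch, Request, Filter, build
and the three filter factories are hypothetical names. Each concern is one
small filter that either answers the request itself or passes it on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class FilterChainSketch {

    // Hypothetical request: who is calling and which RPC method they want.
    static final class Request {
        final String user;
        final String method;
        Request(String user, String method) {
            this.user = user;
            this.method = method;
        }
    }

    // A filter sees the request and decides whether to call the next stage.
    interface Filter {
        String handle(Request req, Function<Request, String> next);
    }

    // Security as a single filter: reject anonymous callers (assumed rule).
    static Filter security() {
        return (req, next) -> req.user.isEmpty() ? "denied" : next.apply(req);
    }

    // Logging as another filter: record the call, then pass it on.
    static Filter logging(List<String> log) {
        return (req, next) -> {
            log.add(req.user + " -> " + req.method);
            return next.apply(req);
        };
    }

    // Stats as another filter: count every request that reaches it.
    static Filter stats(AtomicLong calls) {
        return (req, next) -> {
            calls.incrementAndGet();
            return next.apply(req);
        };
    }

    // Wraps the endpoint in the filters so that filters.get(0) runs first.
    static Function<Request, String> build(List<Filter> filters,
                                           Function<Request, String> endpoint) {
        Function<Request, String> chain = endpoint;
        for (int i = filters.size() - 1; i >= 0; i--) {
            Filter filter = filters.get(i);
            Function<Request, String> next = chain;
            chain = req -> filter.handle(req, next);
        }
        return chain;
    }

    public static void main(String[] args) {
        AtomicLong calls = new AtomicLong();
        List<String> log = new ArrayList<>();
        Function<Request, String> server = build(
                Arrays.asList(stats(calls), logging(log), security()),
                req -> "ok:" + req.method);

        System.out.println(server.apply(new Request("alice", "get")));
        System.out.println(server.apply(new Request("", "put")));
        System.out.println("requests seen: " + calls.get());
    }
}
```

Adding or removing a concern changes only the list passed to build, which
is the property the bullet points are after.
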
>> Cons:
>>   - Netty and non apache rpc servers don't play well together.  They might
>>   be able to but I haven't gotten there yet.
> What do you mean "non apache rpc servers"?
>>   - Complexity
>>      - Two different servers in the src
>>      - Confusing users who don't know which to pick
>>   - Non-blocking could make the client harder to write.
>> I'm really just trying to gauge what people think of the direction and if
>> it's still something that is wanted.  The code is a loooooong way from even
>> being a tech demo, and I'm not a netty expert, so suggestions would be
>> welcomed.
>> Thoughts? Are people interested in this? Should I push this to my github
>> so others can help?
> IMO, I'd want to see a noticeable perf difference from the change -
> unfortunately it would take a fair amount of work to get to the point where
> you could benchmark it. But if you're willing to spend the time to get to
> that point, seems worth investigating.
> --
> Todd Lipcon
> Software Engineer, Cloudera
