hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Advice for smaller clusters in write-heavy environments
Date Thu, 08 May 2008 17:48:37 GMT
Thanks for the helpful note Danny.

Here are a few other things to add to your list.

+ Danny had a map that parsed Text input and then did the inserts into 
hbase using TableReduce.  He was probably using TableReduce because we 
suggested it, but thinking on it, this is probably not the best MR setup 
for filling hbase.  An MR job sorts and shuffles the map outputs before 
the reduce.  This intermediate shuffle/sort step is expensive, and hbase 
'sorts' on insert anyway.  Danny changed his job so the hbase inserts were 
done in the map task.  The map emitted nothing and the job had no reduce.
+ On the imbalance at job start between the loading TaskTrackers and the 
RegionServers, one tactic we could have tried was to run a single 
TaskTracker at job start and then, after the split, add the second one 
(mid-job).
+ Danny tried hbase and ran into problems.  Some of his issues were 
hbase bugs.  Others were matters of network setup and hardware sizing.  
Rather than give up, he stuck with it and together we figured them out.
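
To make the first point concrete, here is a toy sketch in Python of the map-only shape.  It is not hbase or Hadoop code: SortedTable is a made-up stand-in for a table that orders rows on insert, map_task is a hypothetical map function, and the tab-separated input format is an assumption.

```python
import bisect


class SortedTable:
    """Toy stand-in for an hbase table (hypothetical): orders row keys on insert."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        # Place a new key at its sorted position; an existing key is overwritten.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def scan(self):
        # Return rows in key order, with no separate sort step.
        return [(k, self._rows[k]) for k in self._keys]


def map_task(lines, table):
    """Map-only task: parse each line and insert it directly into the table."""
    for line in lines:
        # Assumed input format: "<row key>\t<value>"
        row_key, value = line.rstrip("\n").split("\t", 1)
        table.put(row_key, value)
    return []  # no map output, so nothing to shuffle, sort, or reduce


table = SortedTable()
# Two input splits, deliberately out of key order.
splits = [["row3\tc", "row1\ta"], ["row2\tb"]]
emitted = [map_task(split, table) for split in splits]
```

The rows come back in key order from table.scan() even though no map output was ever sorted or shuffled, which is the reason the reduce step buys nothing when the only goal is loading the table.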

St.Ack


