Subject: Re: Hadoop/HBase hardware requirement
From: Lior Schachter
To: user@hbase.apache.org
Date: Mon, 22 Nov 2010 15:02:57 +0200

Hi Lars,

I agree with every sentence you wrote (and that's why we chose HBase).
However, from a managerial point of view the question of the initial
investment is very important (especially when considering a new
technology).

Lior

p.s. The price is in USD ....

On Mon, Nov 22, 2010 at 2:43 PM, Lars George wrote:

> Hi Lior,
>
> I can only hope you state this in Shekels! But 20 nodes with Hadoop
> can do quite a lot, and you cannot compare a single Oracle box with a
> 20-node Hadoop cluster, as they serve slightly different use cases.
> You need to make a commitment to what you want to achieve with HBase;
> growth is the most important factor. Scaling Oracle is really
> expensive, while HBase/Hadoop, in comparison, is not: its costs grow
> linearly, whereas Oracle's grow closer to exponentially.
>
> Lars
>
> On Mon, Nov 22, 2010 at 1:27 PM, Lior Schachter wrote:
> > Hi all,
> >
> > Thanks for your input and assistance. From your answers I understand
> > that:
> > 1. More is better, but our configuration might work.
> > 2. There are small tweaks we can do that will improve our
> >    configuration (like having 4x500GB disks).
> > 3. Use monitoring (like Ganglia) to find the bottlenecks (a sample
> >    hookup sketch follows below).
> >
> > For me, the question here is how to balance our current budget
> > against system stability (and performance). I agree that more memory
> > and more disk space would improve our responsiveness, but on the
> > other hand our system is NOT expected to be real-time (it is rather
> > back-office analytics with a few hours of delay).
> >
> > This is a crucial point, since the proposed configurations we found
> > on the web don't distinguish between real-time configurations and
> > back-office configurations.
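For the monitoring item above: a minimal sketch of the Ganglia hookup,
assuming the 0.20-era Hadoop/HBase metrics framework; "gmond-host" is a
placeholder for wherever gmond listens. Each context pushes its metrics
every 10 seconds, and the HBase block also surfaces the region server
compaction queue size that comes up later in this thread.

    # conf/hadoop-metrics.properties (Hadoop and HBase each ship a copy)
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    dfs.period=10
    dfs.servers=gmond-host:8649

    mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    mapred.period=10
    mapred.servers=gmond-host:8649

    jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    jvm.period=10
    jvm.servers=gmond-host:8649

    # in HBase's copy: region server metrics, incl. compactionQueueSize
    hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    hbase.period=10
    hbase.servers=gmond-host:8649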
> > To build a real-time cluster with 20 nodes would cost around
> > 200-300K (in Israel), which is similar to the price of a quite
> > strong Oracle cluster... so my boss (the CTO) was partially right
> > when telling me: "but you said it would be cheap!! very cheap :)"
> >
> > I believe that more money will come once we show the viability of
> > the system... I also read that heterogeneous clusters are common.
> >
> > It would help a lot if you could provide your configurations and
> > system characteristics (maybe on a Wiki page). It would also help to
> > get more of the "small tweaks" that you found helpful.
> >
> > Lior Schachter
> >
> > On Mon, Nov 22, 2010 at 1:33 PM, Lars George wrote:
> > > Oleg,
> > >
> > > Do you have Ganglia or some other graphing tool running against
> > > the cluster? It gives you metrics that are crucial here, for
> > > example the load on Hadoop and its DataNodes, as well as insertion
> > > rates etc. on HBase. What is also interesting is the compaction
> > > queue, to see if the cluster is going slow.
> > >
> > > Did you try loading from an empty system to a loaded one? Or was
> > > it already filled and you are trying to add more? Are you
> > > spreading the load across servers, or are you using sequential
> > > keys that tax only one server at a time?
> > >
> > > 16GB should work, but is not ideal. The various daemons simply
> > > need room to breathe. That said, I have personally started with
> > > even 12GB and it worked.
> > >
> > > Lars
> > >
> > > On Mon, Nov 22, 2010 at 12:17 PM, Oleg Ruchovets wrote:
> > > > On Sun, Nov 21, 2010 at 10:39 PM, Krishna Sankar wrote:
> > > > > Oleg & Lior,
> > > > >
> > > > > A couple of questions and a couple of suggestions to ponder:
> > > > > A) When you say 20 name servers, I assume you are talking
> > > > > about 20 task servers?
> > > >
> > > > Yes.
> > > >
> > > > > B) What type are your M/R jobs? Compute-intensive vs.
> > > > > storage-intensive?
> > > >
> > > > Mostly parsing; only 5-10% of the M/R output is stored to HBase.
> > > >
> > > > > C) What is your data growth?
> > > >
> > > > Currently we have 50GB per day; it could grow to ~150GB.
> > > >
> > > > > D) With the current jobs, are you saturating RAM? CPU? Or
> > > > > storage?
> > > >
> > > > The map phase takes 100% CPU, since it is parsing and the input
> > > > files are gzipped. We definitely have memory issues.
> > > >
> > > > > Ganglia/Hadoop metrics should tell.
> > > > > E) Also, are your jobs long-running or short tasks?
> > > >
> > > > Map tasks take from 5 seconds to 2 minutes; the reducer
> > > > (insertion into HBase) takes ~3 hours.
> > > >
> > > > > Suggestions:
> > > > > A) Your name node could be 32GB, 2TB disk. Make sure it is an
> > > > > enterprise-class server, and also back it up to an NFS mount.
> > > > > B) Also have a decent machine as the checkpoint name node. It
> > > > > could be similar to the task nodes.
> > > > > C) I assume by master machine you mean the JobTracker. It
> > > > > could be similar to the TaskTrackers: 16/24GB memory, with
> > > > > 4-8TB disk.
> > > > > D) As Jean-Daniel pointed out, 500GB disks (with more
> > > > > spindles) are what I would also recommend. But it also depends
> > > > > on your primary, intermediate, and final data sizes. 1 or 2TB
> > > > > disks are also fine, because they give you more storage. I
> > > > > assume you have the default replication of 3.
> > > > > E) A 1Gb dedicated network would be good. As there are only
> > > > > ~25 machines, you can hang them off of a good Gb switch.
> > > > > Consider 10Gb in the future if there is too much intermediate
> > > > > data traffic.
> > > > > Cheers
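Two points above have a code-level side: Lars's sequential-key question
and Oleg's ~3-hour reduce-side insert. Below is a minimal, hedged sketch
against the 0.20-era HBase client API; the table name "logs", the family
"data", and the bucket count are made-up placeholders, not anything from
the thread. Disabling auto-flush batches Puts into one RPC per write
buffer, and a hash-derived salt prefix spreads sequential keys across
regions instead of taxing one server at a time.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedLoad {
        // Assumption: roughly one bucket per region server.
        private static final int BUCKETS = 20;

        public static void main(String[] args) throws Exception {
            HTable table = new HTable(new HBaseConfiguration(), "logs"); // hypothetical table
            table.setAutoFlush(false);                  // buffer Puts client-side...
            table.setWriteBufferSize(12 * 1024 * 1024); // ...flushing every ~12MB, not per Put

            for (long seq = 0; seq < 1000000; seq++) {
                // A purely sequential key hits one region at a time;
                // a short hash-derived prefix spreads the write load.
                String key = Long.toString(seq);
                int bucket = (key.hashCode() & 0x7fffffff) % BUCKETS;
                Put put = new Put(Bytes.toBytes(bucket + "-" + key));
                put.add(Bytes.toBytes("data"), Bytes.toBytes("raw"),
                        Bytes.toBytes("payload-" + seq));
                table.put(put);   // no RPC yet; lands in the write buffer
            }
            table.flushCommits(); // push whatever is still buffered
            table.close();
        }
    }

The trade-off of salting is that rows are no longer stored in their
natural key order, so an ordered scan has to fan out over all BUCKETS
prefixes.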
> > > > > On Sun, Nov 21, 2010, Oleg Ruchovets wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > After testing HBase for a few months with a very light
> > > > > > configuration (5 machines, 2TB disk, 8GB RAM), we are now
> > > > > > planning for production.
> > > > > >
> > > > > > Our load:
> > > > > > 1) 50GB of log files to process per day with Map/Reduce jobs.
> > > > > > 2) Insert 4-5GB per day into 3 HBase tables.
> > > > > > 3) Run 10-20 scans per day (scanning about 20 regions in a
> > > > > > table).
> > > > > > All this should run in parallel. Our current configuration
> > > > > > can't cope with this load and we are having many stability
> > > > > > issues.
> > > > > >
> > > > > > This is what we have in mind:
> > > > > > 1. Master machine - 32GB, 4TB, two quad-core CPUs.
> > > > > > 2. Name node - 16GB, 2TB, two quad-core CPUs.
> > > > > > We plan to have up to 20 name servers (starting with 5).
> > > > > >
> > > > > > We already read
> > > > > > http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
> > > > > >
> > > > > > We would appreciate your feedback on our proposed
> > > > > > configuration.
> > > > > >
> > > > > > Regards, Oleg & Lior
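Since the original post budgets for 10-20 daily scans of ~20 regions
each, here is a matching read-side sketch, again hedged: 0.20-era client
API, with the same hypothetical "logs" table and "data" family as in the
write sketch above. For long back-office scans the knob that matters
most is scanner caching, which controls how many rows come back per RPC.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DailyScan {
        public static void main(String[] args) throws IOException {
            HTable table = new HTable(new HBaseConfiguration(), "logs"); // hypothetical table
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("data")); // fetch only the family the job needs
            scan.setCaching(500);                  // 500 rows per round trip, not the tiny default
            ResultScanner scanner = table.getScanner(scan);
            try {
                long rows = 0;
                for (Result r : scanner) {
                    rows++;  // the back-office aggregation would go here
                }
                System.out.println("scanned " + rows + " rows");
            } finally {
                scanner.close(); // release the server-side scanner lease promptly
            }
            table.close();
        }
    }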