Message-ID: <4600055D.9060603@apache.org>
Date: Tue, 20 Mar 2007 17:01:33 +0100
From: Sylvain Wallez
To: hadoop-dev@lucene.apache.org
Subject: Re: Running tasks in the TaskTracker VM
Stephane Bailliez wrote:
> Torsten Curdt wrote:
>>
>>> Being a complete idiot for distributed computing, I would say it is
>>> easy to explode a JVM when doing such distributed jobs (whether
>>> from an OOM or anything else).
>>
>> Then restrict what people can do - at least Google went that route.
>
> I don't know what Google did on the specifics :)

They came up with their own language for MapReduce jobs:
http://labs.google.com/papers/sawzall.html

> If you want to do that with Java and restrict memory usage, CPU usage
> and descriptor access within each in-VM instance, that's a
> considerable amount of work that likely implies writing a specific
> agent for the VM (or rather an agent for a specific VM, because it's
> pretty unlikely that you will get the same results across VMs),
> assuming it can even be done at the classloader level for each task
> (which looks insanely complex to me once you consider allocations
> done at the parent classloader level, etc.).
>
> At least by forking a VM you get reasonably bounded control over
> resource usage (or at least memory) without bringing everything down,
> since a VM is already bounded to some degree.
>
>>> Failing jobs are not exactly uncommon, and running things in a
>>> sandboxed environment with less risk for the tracker seems like a
>>> perfectly reasonable choice. So yeah, VM pooling certainly makes
>>> perfect sense for it.
>>
>> I am still not convinced - sorry.
>>
>> It's a bit like wanting to run JSPs in a separate JVM because they
>> might take down the servlet container.
>
> That is a bit too extreme in granularity. I think it is more like the
> choice of running n different webapps within the same VM or not: if
> one webapp is a resource hog, separating it would not harm the n-1
> other applications, and you would either create another server
> instance or move it away to another node.
>
> I know of environments with large numbers of nodes (not related to
> Hadoop) where they also reboot a set of nodes daily to ensure that
> all machines are really in working condition (it's usually when a
> machine reboots due to failure or whatever that someone has to rush
> to it, because some service forgot to be registered or things like
> that, so doing this periodic check gives people a better idea of
> their response time to failure). That depends on operational
> procedures, for sure.

This could be another implementation of the TaskTracker: a single JVM
that forks a "replacement JVM" after either a given time or a given
number of tasks executed. This avoids paying the JVM fork overhead on
every task while also avoiding memory leak problems. The replacement
JVM could even be pre-forked and monitor the active one, taking over
if it no longer responds (and possibly killing it).

Sylvain

-- 
Sylvain Wallez - http://bluxte.net
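The rollover scheme Sylvain proposes (retire the worker JVM after a task-count or lifetime threshold, handing over to a pre-forked replacement) can be expressed as a small policy object. The sketch below is illustrative only, not actual Hadoop code; the class and method names are invented for the example.

```java
// Hypothetical sketch of the proposed rollover policy: the TaskTracker
// would retire its worker JVM (handing work to a pre-forked replacement)
// once either threshold is crossed. Not an actual Hadoop API.
public class JvmRolloverPolicy {
    private final int maxTasks;       // roll over after this many tasks...
    private final long maxLifetimeMs; // ...or after this much wall-clock time
    private final long startMs;       // when the current worker JVM started
    private int tasksRun = 0;

    public JvmRolloverPolicy(int maxTasks, long maxLifetimeMs, long startMs) {
        this.maxTasks = maxTasks;
        this.maxLifetimeMs = maxLifetimeMs;
        this.startMs = startMs;
    }

    /** Record one completed task in the current worker JVM. */
    public void taskCompleted() {
        tasksRun++;
    }

    /** True once either the task-count or the lifetime threshold is hit. */
    public boolean shouldRollOver(long nowMs) {
        return tasksRun >= maxTasks || (nowMs - startMs) >= maxLifetimeMs;
    }
}
```

A watchdog loop in the pre-forked JVM would then combine this check with a liveness probe of the active JVM: if either `shouldRollOver` fires or the active JVM stops responding, the replacement takes over and the old process is killed. This amortizes fork cost across many tasks while still bounding how long a leaky task VM can accumulate garbage.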