From: Brian Bockelman
To: common-user@hadoop.apache.org
Subject: Re: monit? daemontools? jsvc? something else?
Date: Tue, 4 Jan 2011 07:43:46 -0600

I'll second this opinion. Some tools in life do need to be actively managed like this (and even then, management tools can be set too aggressively, turning a bad situation into a terrible one), but HDFS is not one of them.

If the JVM dies, you likely need a human brain to log in and figure out what's wrong - or just keep that node dead.

Brian

On Jan 3, 2011, at 10:40 PM, Allen Wittenauer wrote:

>
> On Jan 3, 2011, at 2:22 AM, Otis Gospodnetic wrote:
>> I see over on http://search-hadoop.com/?q=monit+daemontools that people *do* use
>> tools like monit and daemontools (and a few other ones) to revive their
>> Hadoop processes when they die.
>>
>
> I'm not a fan of doing this for Hadoop processes, even TaskTrackers and DataNodes. The processes generally die for a reason, usually indicating that something is wrong with the box. Restarting those processes may potentially hide issues.