Date: Wed, 18 May 2011 16:54:06 -0700
Subject: Re: HDFS Corruption: How to Troubleshoot or Determine Root Cause?
From: Aaron Eng
To: hdfs-user@hadoop.apache.org

Hey Tim,

Hope everything is good with you. Looks like you're having some fun with Hadoop.

> Can anyone enlighten me? Why is dfs.*.dir default to /tmp a good idea?

It's not a good idea; it's just how it defaults. You'll find hundreds or probably thousands of these quirks as you work with Apache/Cloudera Hadoop distributions. Never trust the defaults.

> submitted a JIRA

That's the way to do it.

> which appears to have been resolved ... but it does feel somewhat
> dissatisfying, since by the time you see the WARNING your cluster is
> already useless/dead.

And that's why, if it's relevant to you, your best bet is to resolve the JIRA yourself. Most of the contributors are big-picture types who would look at "small" usability issues like this and scoff about "newbies". Of course, by the time you're familiar enough with Hadoop and comfortable enough to fix your own JIRAs, you might also join the ranks of jaded contributors who scoff at usability issues logged by newbies.

Case in point: I noted a while ago that when you run the namenode -format command, it only accepts a capital Y (or lower case, can't remember), and it fails silently if you give the wrong case. I didn't particularly care enough to fix it, having already learned my lesson. You'll find lots of these rough edges throughout Hadoop; it is not a user-friendly, out-of-the-box, enterprise-ready product.

On Wed, May 18, 2011 at 4:41 PM, Time Less <timelessness@gmail.com> wrote:

> Can anyone enlighten me? Why is dfs.*.dir default to /tmp a good idea? I'd
> rather, in order of preference, have the following behaviours if dfs.*.dir
> are undefined:
>
>    1. Daemons log errors and fail to start at all,
>    2. Daemons start but default to /var/db/hadoop (or any persistent
>    location), meanwhile logging in huge screaming all-caps letters that it's
>    picked a default which may not be optimal,
>    3. Daemons start botnet and DDOS random government websites, wait 36
>    hours, then phone the FBI and blame administrator for it*,
>    4. Daemons write "persistent" data into /tmp without any great fanfare,
>    allowing a sense of complacency in its victims, only to report at a random
>    time in the future that everything is corrupted beyond repair, i.e. current
>    behaviour.
>
> I submitted a JIRA (which appears to have been resolved, yay!) to at least
> add verbiage to the WARNING letting you know why you've irreversibly
> corrupted your cluster, but it does feel somewhat dissatisfying, since by
> the time you see the WARNING your cluster is already useless/dead.
>
>> It's not quite what you're asking for, but your NameNode's web interface
>> should provide a merged dump of all the relevant config settings,
>> including comments indicating the name of the config file where the
>> setting was defined, at the /conf path.
>
> Cool, though it looks like that's just the NameNode's config, right? Not
> the DataNode's config, which is the component corrupting data due to this
> default?
>
> --
> Tim Ellis
> Riot Games
> * Hello, FBI, #3 was a joke. I wish #4 was a joke, too.
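
As a footnote to "never trust the defaults": until the daemons refuse to start on their own, something like the following could be run against each node's site configuration before bringing a cluster up. This is only a minimal sketch, not anything Hadoop ships. The function name `audit_storage_dirs`, the sample paths, and the choice of properties to check (`dfs.name.dir` and `dfs.data.dir`, the 0.20-era names) are assumptions for illustration, and it does not expand `${...}` variable references the way Hadoop's own config loader does.

```python
import xml.etree.ElementTree as ET

# Storage-directory properties to audit. These are the 0.20-era names and
# are an assumption here; adjust them for the Hadoop version in use.
STORAGE_KEYS = ("dfs.name.dir", "dfs.data.dir")


def audit_storage_dirs(config_xml):
    """Return warnings for storage dirs that are unset or located under /tmp.

    config_xml is the text of a Hadoop-style <configuration> file.
    Variable references such as ${hadoop.tmp.dir} are NOT expanded.
    """
    root = ET.fromstring(config_xml.strip())

    # Collect every <property><name>/<value> pair into a dict.
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value

    warnings = []
    for key in STORAGE_KEYS:
        value = props.get(key, "")
        if not value:
            warnings.append(f"{key} is not set, so the default applies")
            continue
        # A value may be a comma-separated list of directories.
        for path in (p.strip() for p in value.split(",")):
            if path == "/tmp" or path.startswith("/tmp/"):
                warnings.append(f"{key} points at {path}, which is under /tmp")
    return warnings


# Hypothetical config: dfs.name.dir is missing, one data dir is under /tmp.
SAMPLE = """
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/tmp/dfs/dn</value>
  </property>
</configuration>
"""

for line in audit_storage_dirs(SAMPLE):
    print("WARNING:", line)
```

Since the NameNode's /conf page apparently only covers the NameNode, a check like this would have to be fed the config file from each DataNode as well.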