hadoop-common-user mailing list archives

From Sandy <snickerdoodl...@gmail.com>
Subject Re: datanode not being started
Date Mon, 16 Feb 2009 17:47:23 GMT
Hi Rasit,

Thanks for your response!

I saw the previous threads by Jerro and Mithila, but I think my problem is
slightly different. My datanodes are not being started, period. From a
previous thread:

"The common reasons for this case are configuration errors, installation
errors, or network connectivity issues due to firewalls blocking ports, or
dns lookup errors (either failure or incorrect address returned) for the
namenode hostname on the datanodes."

I'm going to reinstall Hadoop once again on this machine (this will be the
third reinstall for this problem), but it's hard for me to believe the cause
is configuration or installation. The configuration worked fine the last
time I used this machine; at worst the HDFS would have gotten corrupted, and
a reformat should have fixed that. I tried checking the logs for the
datanode, but there is nothing there. I can ssh into both localhost and my
server name fine, but I can check whether there are further problems with
DNS or the firewall.
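
For reference, here's roughly what I plan to check, assuming the default
namenode port 9000 from my config and the default log location (so this is
just a sketch):

# Does name resolution still work the way it did before?
$ ping -c 1 localhost
$ ping -c 1 loteria.cs.tamu.edu

# Is the namenode actually listening on its port, and is it reachable?
$ nc -z localhost 9000 && echo "namenode port 9000 reachable"

# If I understand the start scripts, daemon launch-time failures go to the
# .out files rather than the .log files, so an empty datanode .log may not
# mean nothing went wrong:
$ cat /Users/hadoop/hadoop-0.18.2/logs/*-datanode-*.out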

Since I last used this machine, Parallels Desktop was installed by the
admin. I currently suspect that it is somehow interfering with Hadoop
(though JAVA_HOME still seems to be OK). Has anyone seen Parallels cause
this kind of interference?

Thanks,
-SM

On Mon, Feb 16, 2009 at 2:32 AM, Rasit OZDAS <rasitozdas@gmail.com> wrote:

> Sandy, as far as I remember, there were some threads about the same
> problem (I don't know if it's solved). Searching the mailing list for
> this error: "could only be replicated to 0 nodes, instead of 1" may
> help.
>
> Cheers,
> Rasit
>
> 2009/2/16 Sandy <snickerdoodle08@gmail.com>:
> > just some more information:
> > hadoop fsck produces:
> > Status: HEALTHY
> >  Total size: 0 B
> >  Total dirs: 9
> >  Total files: 0 (Files currently being written: 1)
> >  Total blocks (validated): 0
> >  Minimally replicated blocks: 0
> >  Over-replicated blocks: 0
> >  Under-replicated blocks: 0
> >  Mis-replicated blocks: 0
> >  Default replication factor: 1
> >  Average block replication: 0.0
> >  Corrupt blocks: 0
> >  Missing replicas: 0
> >  Number of data-nodes: 0
> >  Number of racks: 0
> >
> >
> > The filesystem under path '/' is HEALTHY
> >
> > on the newly formatted hdfs.
> >
> > jps says:
> > 4723 Jps
> > 4527 NameNode
> > 4653 JobTracker
> >
> >
> > I can't copy files onto the dfs since I get "NotReplicatedYetExceptions",
> > which I suspect is because there are no datanodes. My "cluster" is a
> > single Mac Pro with 8 cores. I haven't had to do anything extra before to
> > get datanodes started.
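> >
> > For what it's worth, I'd guess the stock dfsadmin report would confirm
> > the same thing:
> >
> > $ bin/hadoop dfsadmin -report
> >
> > I'd expect it to print something like "Datanodes available: 0", matching
> > the "Number of data-nodes: 0" that fsck shows above.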
> >
> > 09/02/15 15:56:27 WARN dfs.DFSClient: Error Recovery for block null bad
> > datanode[0]
> > copyFromLocal: Could not get block locations. Aborting...
> >
> >
> > The corresponding error in the logs is:
> >
> > 2009-02-15 15:56:27,123 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 1 on 9000, call addBlock(/user/hadoop/input/.DS_Store,
> > DFSClient_755366230) from 127.0.0.1:49796: error: java.io.IOException:
> > File /user/hadoop/input/.DS_Store could only be replicated to 0 nodes,
> > instead of 1
> > java.io.IOException: File /user/hadoop/input/.DS_Store could only be
> > replicated to 0 nodes, instead of 1
> >         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1120)
> >         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> >
> > On Sun, Feb 15, 2009 at 3:26 PM, Sandy <snickerdoodle08@gmail.com> wrote:
> >
> >> Thanks for your responses.
> >>
> >> I checked in the namenode and jobtracker logs and both say:
> >>
> >> INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call
> >> delete(/Users/hadoop/hadoop-0.18.2/hadoop-hadoop/mapred/system, true)
> >> from 127.0.0.1:61086: error: org.apache.hadoop.dfs.SafeModeException:
> >> Cannot delete /Users/hadoop/hadoop-0.18.2/hadoop-hadoop/mapred/system.
> >> Name node is in safe mode.
> >> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
> >> Safe mode will be turned off automatically.
> >> org.apache.hadoop.dfs.SafeModeException: Cannot delete
> >> /Users/hadoop/hadoop-0.18.2/hadoop-hadoop/mapred/system. Name node is in
> >> safe mode.
> >> The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
> >> Safe mode will be turned off automatically.
> >>         at org.apache.hadoop.dfs.FSNamesystem.deleteInternal(FSNamesystem.java:1505)
> >>         at org.apache.hadoop.dfs.FSNamesystem.delete(FSNamesystem.java:1477)
> >>         at org.apache.hadoop.dfs.NameNode.delete(NameNode.java:425)
> >>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
> >>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
> >>
> >>
> >> I think this is a continuation of my running problem. The namenode stays
> >> in safe mode and won't come out, even after several minutes. I believe
> >> this is because it keeps waiting for a datanode that does not exist. Any
> >> suggestions on what I can do?
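> >>
> >> I know the stock dfsadmin switch can force the namenode out of safe mode:
> >>
> >> $ bin/hadoop dfsadmin -safemode get    # report the current state
> >> $ bin/hadoop dfsadmin -safemode leave  # force safe mode off
> >>
> >> but I assume that would only mask the problem: with no datanodes
> >> reporting blocks, the 0.9990 threshold can never be reached anyway.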
> >>
> >> I have recently tried to reformat the hdfs using bin/hadoop namenode
> >> -format. From the output on standard out, it appears to have completed
> >> correctly:
> >>
> >> Re-format filesystem in /Users/hadoop/hadoop-0.18.2/hadoop-hadoop/dfs/name ? (Y or N) Y
> >> 09/02/15 15:16:39 INFO fs.FSNamesystem: fsOwner=hadoop,staff,_lpadmin,com.apple.sharepoint.group.8,com.apple.sharepoint.group.3,com.apple.sharepoint.group.4,com.apple.sharepoint.group.2,com.apple.sharepoint.group.6,com.apple.sharepoint.group.9,com.apple.sharepoint.group.1,com.apple.sharepoint.group.5
> >> 09/02/15 15:16:39 INFO fs.FSNamesystem: supergroup=supergroup
> >> 09/02/15 15:16:39 INFO fs.FSNamesystem: isPermissionEnabled=true
> >> 09/02/15 15:16:39 INFO dfs.Storage: Image file of size 80 saved in 0 seconds.
> >> 09/02/15 15:16:39 INFO dfs.Storage: Storage directory /Users/hadoop/hadoop-0.18.2/hadoop-hadoop/dfs/name has been successfully formatted.
> >> 09/02/15 15:16:39 INFO dfs.NameNode: SHUTDOWN_MSG:
> >> /************************************************************
> >> SHUTDOWN_MSG: Shutting down NameNode at loteria.cs.tamu.edu/128.194.143.170
> >> ************************************************************/
> >>
> >> However, after reformatting, I find that I have the same problems.
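> >>
> >> One thing I haven't tried yet (just a guess on my part): the datanode
> >> keeps the namespaceID from the previous format in its data directory, so
> >> reformatting only the namenode can leave a stale, incompatible datanode
> >> directory behind. Assuming the default layout under my hadoop.tmp.dir,
> >> something like this should clear it:
> >>
> >> $ bin/stop-all.sh
> >> $ rm -rf /Users/hadoop/hadoop-0.18.2/hadoop-hadoop/dfs/data
> >> $ bin/hadoop namenode -format
> >> $ bin/start-all.sh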
> >>
> >> Thanks,
> >> SM
> >>
> >> On Fri, Feb 13, 2009 at 5:39 PM, james warren <james@rockyou.com> wrote:
> >>
> >>> Sandy -
> >>>
> >>> I suggest you take a look into your NameNode and DataNode logs.  From
> >>> the information posted, these likely would be at
> >>>
> >>> /Users/hadoop/hadoop-0.18.2/bin/../logs/hadoop-hadoop-namenode-loteria.cs.tamu.edu.log
> >>> /Users/hadoop/hadoop-0.18.2/bin/../logs/hadoop-hadoop-jobtracker-loteria.cs.tamu.edu.log
> >>>
> >>> If the cause isn't obvious from what you see there, could you please
> >>> post the last few lines from each log?
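> >>>
> >>> e.g. something like (assuming the standard log layout):
> >>>
> >>> $ tail -n 50 /Users/hadoop/hadoop-0.18.2/logs/hadoop-hadoop-*.log
> >>> $ tail -n 50 /Users/hadoop/hadoop-0.18.2/logs/hadoop-hadoop-*.out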
> >>>
> >>> -jw
> >>>
> >>> On Fri, Feb 13, 2009 at 3:28 PM, Sandy <snickerdoodle08@gmail.com> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > I would really appreciate any help I can get on this! I've suddenly
> >>> > run into a very strange error.
> >>> >
> >>> > When I do bin/start-all, I get:
> >>> >
> >>> > hadoop$ bin/start-all.sh
> >>> > starting namenode, logging to
> >>> > /Users/hadoop/hadoop-0.18.2/bin/../logs/hadoop-hadoop-namenode-loteria.cs.tamu.edu.out
> >>> > starting jobtracker, logging to
> >>> > /Users/hadoop/hadoop-0.18.2/bin/../logs/hadoop-hadoop-jobtracker-loteria.cs.tamu.edu.out
> >>> >
> >>> > No datanode, secondary namenode, or tasktracker is being started.
> >>> >
> >>> > When I try to upload anything to the dfs, I get a "node in safemode"
> >>> > error (even after waiting 5 minutes), presumably because it's trying
> >>> > to reach a datanode that does not exist. The same "safemode" error
> >>> > occurs when I try to run jobs.
> >>> >
> >>> > I have tried bin/stop-all and then bin/start-all again. I get the
> >>> > same problem!
> >>> >
> >>> > This is incredibly strange, since I was previously able to start and
> >>> > run jobs without any issue using this version on this machine. I am
> >>> > running jobs on a single Mac Pro running OS X 10.5.
> >>> >
> >>> > I have tried updating to hadoop-0.19.0, and I get the same problem. I
> >>> > have even tried previous versions, and I'm getting the same problem!
> >>> >
> >>> > Anyone have any idea why this could suddenly be happening? What am I
> >>> > doing wrong?
> >>> >
> >>> > For convenience, I'm including portions of both conf/hadoop-env.sh
> >>> > and conf/hadoop-site.xml:
> >>> >
> >>> > --- hadoop-env.sh ---
> >>> > # Set Hadoop-specific environment variables here.
> >>> >
> >>> > # The only required environment variable is JAVA_HOME.  All others are
> >>> > # optional.  When running a distributed configuration it is best to
> >>> > # set JAVA_HOME in this file, so that it is correctly defined on
> >>> > # remote nodes.
> >>> >
> >>> > # The java implementation to use.  Required.
> >>> > export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
> >>> >
> >>> > # Extra Java CLASSPATH elements.  Optional.
> >>> > # export HADOOP_CLASSPATH=
> >>> >
> >>> > # The maximum amount of heap to use, in MB. Default is 1000.
> >>> > export HADOOP_HEAPSIZE=3000
> >>> > ...
> >>> > --- hadoop-site.xml ---
> >>> > <configuration>
> >>> >
> >>> > <property>
> >>> >  <name>hadoop.tmp.dir</name>
> >>> >  <value>/Users/hadoop/hadoop-0.18.2/hadoop-${user.name}</value>
> >>> >  <description>A base for other temporary directories.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >  <name>fs.default.name</name>
> >>> >  <value>hdfs://localhost:9000</value>
> >>> >  <description>The name of the default file system.  A URI whose
> >>> >  scheme and authority determine the FileSystem implementation.  The
> >>> >  uri's scheme determines the config property (fs.SCHEME.impl) naming
> >>> >  the FileSystem implementation class.  The uri's authority is used to
> >>> >  determine the host, port, etc. for a filesystem.</description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >  <name>mapred.job.tracker</name>
> >>> >  <value>localhost:9001</value>
> >>> >  <description>The host and port that the MapReduce job tracker runs
> >>> >  at.  If "local", then jobs are run in-process as a single map
> >>> >  and reduce task.
> >>> >  </description>
> >>> > </property>
> >>> >
> >>> > <property>
> >>> >  <name>mapred.tasktracker.tasks.maximum</name>
> >>> >  <value>1</value>
> >>> >  <description>The maximum number of tasks that will be run
> >>> >  simultaneously by a task tracker.
> >>> >  </description>
> >>> > </property>
> >>> > ...
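> >>> >
> >>> > (Side note: I'm not sure mapred.tasktracker.tasks.maximum is still
> >>> > read in 0.18; if I understand the docs, it was split into per-type
> >>> > properties, something like:
> >>> >
> >>> > <property>
> >>> >  <name>mapred.tasktracker.map.tasks.maximum</name>
> >>> >  <value>1</value>
> >>> > </property>
> >>> > <property>
> >>> >  <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >>> >  <value>1</value>
> >>> > </property>
> >>> >
> >>> > That shouldn't have anything to do with the missing datanode, though.)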
> >>> >
> >>>
> >>
> >>
> >
>
>
>
> --
> M. Raşit ÖZDAŞ
>
