hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@gmail.com>
Subject Re: Read timed out, Abandoning block blk_-5476242061384228962
Date Mon, 12 May 2008 04:45:25 GMT
You bring up an interesting point. A big chunk of the Namenode code
runs inside a global lock, although some pieces (e.g. the code that
chooses datanodes for a newly allocated block) do execute outside
this lock. But it is probably the case that the namenode does not
benefit from more than 4 cores or so (with the current code).
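
As a rough illustration of why extra cores stop helping once most of the
work is serialized behind one lock, here is a minimal sketch using Amdahl's
law. The function name and the 25% locked fraction are hypothetical, chosen
only for illustration; they are not measurements of the namenode.

```python
def amdahl_speedup(locked_fraction, cores):
    """Upper bound on speedup when `locked_fraction` of the work is
    serialized (e.g. done while holding a single global lock)."""
    if not 0.0 <= locked_fraction <= 1.0:
        raise ValueError("locked_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - locked_fraction
    return 1.0 / (locked_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical value, NOT a measured property of the namenode.
    LOCKED_FRACTION = 0.25
    for cores in (1, 2, 4, 8, 16):
        bound = amdahl_speedup(LOCKED_FRACTION, cores)
        print(f"{cores:2d} cores: at most {bound:.2f}x")
```

Under that assumed fraction, the bound grows much more slowly from 4 to 8
cores than from 1 to 4. The real break-even point depends on how much
namenode work actually happens under the lock.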

If you have 8 cores, you can experiment with running map-reduce jobs
on the other 4 cores.

How much memory does your machine have and how many files does your
HDFS have? One possibility is that the memory pressure of the
map-reduce jobs causes more GC runs for the namenode process.

thanks,
dhruba


On Fri, May 9, 2008 at 7:54 PM, James Moore <jamesthepiper@gmail.com> wrote:
> On Fri, May 9, 2008 at 12:00 PM, Hairong Kuang <hairong@yahoo-inc.com> wrote:
>  >> I'm using the machine running the namenode to run maps as well.
>  > Please do not run maps on the machine that is running the namenode. This
>  > would cause CPU contention and slow down the namenode, making
>  > SocketTimeoutException more likely.
>  >
>  > Hairong
>
>  I've turned off running tasks on the master, and I'm not seeing those errors.
>
>  The behavior was interesting.  On one job, I saw a total of 11 timeout
>  failures (where the map was reported as a failure), but all of them
>  happened in the first few minutes.  After that it worked well and
>  completed correctly.
>
>  I'm wondering if it's worth it, though.  If the number of maps/reduces
>  that the master machine can run is substantially greater than the
>  number of failures due to timeouts, isn't it worth having the master
>  run tasks?  It seems like there's probably a point where the number of
>  machines in the cluster makes having a separate master a requirement,
>  but at 20 8-core machines, it's not clear that dedicating a box to
>  being the master is a win.  (And having a smaller machine dedicated to
>  being the master is cheaper, but annoying.  I'd rather have N
>  identical boxes running the same AMI, etc.)
>
>  To anyone using Amazon - definitely upgrade to the new kernels.  I now
>  have very few instances of the 'Exception in
>  createBlockOutputStream' error that started this thread in my logs.
>  (These are different than the 11 timeouts I mentioned above, FYI).
>
>  The ones that are there all happened in one burst at 03:59:22 this afternoon:
>
>  james@domU-12-31-38-00-D1-B1:~/dev/hadoop$ bin/slaves.sh grep -r
>  'Exception in createBlockOutputStream' ~/dev/hadoop/logs/
>  domU-12-31-38-00-04-51.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000024_0/syslog:2008-05-09
>  03:59:22,713 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-D6-21.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000048_0/syslog:2008-05-09
>  03:59:22,989 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.22.111:50010
>  domU-12-31-38-00-D6-21.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000061_0/syslog:2008-05-09
>  03:59:22,398 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-60-D1.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000017_0/syslog:2008-05-09
>  03:59:22,880 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.217.203:50010
>  domU-12-31-38-00-CD-41.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000051_0/syslog:2008-05-09
>  03:59:23,012 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.34.31:50010
>  domU-12-31-38-00-D5-E1.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000026_0/syslog:2008-05-09
>  03:59:24,551 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.15.47:50010
>  domU-12-31-38-00-1D-D1.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000056_0/syslog:2008-05-09
>  03:59:23,504 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.11.159:50010
>  domU-12-31-38-00-1D-D1.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000050_0/syslog:2008-05-09
>  03:59:22,454 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-1D-D1.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000009_0/syslog:2008-05-09
>  03:59:22,944 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-D8-81.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000002_0/syslog:2008-05-09
>  03:59:22,420 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-D8-81.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000072_0/syslog:2008-05-09
>  03:59:22,318 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>  domU-12-31-38-00-08-C1.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000021_0/syslog:2008-05-09
>  03:59:24,150 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.22.111:50010
>  domU-12-31-38-00-C9-51.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000045_0/syslog:2008-05-09
>  03:59:24,470 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.IOException: Bad connect ack with
>  firstBadLink 10.252.22.111:50010
>  domU-12-31-38-00-C9-51.compute-1.internal:
>  /home/james/dev/hadoop/logs/userlogs/task_200805082159_0043_r_000055_0/syslog:2008-05-09
>  03:59:21,588 INFO org.apache.hadoop.dfs.DFSClient: Exception in
>  createBlockOutputStream java.io.EOFException
>
>
>  --
>  James Moore | james@restphone.com
>  blog.restphone.com
>
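
The grep output quoted above is hard to scan by eye. A minimal sketch for
tallying it by exception type and by firstBadLink address follows. The
function names (normalize, tally) are hypothetical helpers, and the regular
expression assumes the message format matches the quoted lines exactly.

```python
import re
from collections import Counter

# One DFSClient message, matched after wrapped lines have been re-joined.
# The "firstBadLink" part is optional because EOFException lines lack it.
PATTERN = re.compile(
    r"Exception in createBlockOutputStream "
    r"(?P<exc>java\.io\.\w+)"
    r"(?:: Bad connect ack with firstBadLink (?P<link>[\d.]+:\d+))?"
)


def normalize(text):
    """Strip mail quote markers ("> ") and collapse wrapped lines."""
    lines = (re.sub(r"^\s*(?:>\s*)+", "", ln) for ln in text.splitlines())
    return " ".join(" ".join(lines).split())


def tally(text):
    """Return (count per exception class, count per firstBadLink address)."""
    by_exception = Counter()
    by_bad_link = Counter()
    for match in PATTERN.finditer(normalize(text)):
        by_exception[match.group("exc")] += 1
        if match.group("link"):
            by_bad_link[match.group("link")] += 1
    return by_exception, by_bad_link
```

Feeding the saved output of the bin/slaves.sh grep command to tally() shows
whether the failures cluster on a few datanodes or are spread evenly.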
