hadoop-common-user mailing list archives

From Mark Kerzner <markkerz...@gmail.com>
Subject Re: HDFS behaving strangely
Date Tue, 26 Jan 2010 03:16:08 GMT
You may be facing the other well-known problem in Hadoop, namely having too
many small files:

http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/

On Mon, Jan 25, 2010 at 7:38 PM, Ben Hardy <benhardy@gmail.com> wrote:

> For me, the cause of this problem turned out to be a bug in Linux 2.6.21,
> which is used in the default Elastic MapReduce AMI that we run on c1.medium
> instances.
>
> What was going on with the files I uploaded is that, in one particular
> directory with 15,000-odd files in it, some of the files were appearing
> TWICE in the output of filesystem commands like find and ls. Really weird.
> So when hadoop tried to copy these files into HDFS, it quite rightly
> complained that it had seen that file before.
>
> Even though all my filenames are unique.
>
> So watch out for that one, folks; it's a doozy, and it's not a Hadoop bug,
> but it might still bite you.
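>
> If you want to check whether a directory is hitting this, something like
> the following should print any paths that find reports more than once
> (the directory name here is just a placeholder):
>
>   find /mnt/upload-staging -type f | sort | uniq -d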
>
> -b
>
> On Mon, Jan 25, 2010 at 11:08 AM, Mark Kerzner <markkerzner@gmail.com
> >wrote:
>
> > I hit this error, or a similar one, in -copyFromLocal all the time. It
> > also shows up in 0.19 and 0.20.
> >
> > One can work around it manually: for example, copy the file to a
> > different place in HDFS, remove the offending file in HDFS, and rename
> > your copy to the problem name. This works, and after that I have no
> > problem.
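> >
> > In terms of concrete commands, something along these lines should do it
> > (the /data path here is just a placeholder, not from my actual job):
> >
> >   hadoop fs -copyFromLocal job.prop /data/job.prop.tmp
> >   hadoop fs -rm /data/job.prop
> >   hadoop fs -mv /data/job.prop.tmp /data/job.prop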
> >
> > The funny thing is that it only happens for a few specific file names.
> > For example, job.prop always gives a problem, whereas job.properties
> > does not.
> >
> > If I were a good boy, I would debug it with the "job.prop" file, but of
> > course I just found a workaround and forgot about it.
> >
> > Sincerely,
> > Mark
> >
> > On Mon, Jan 25, 2010 at 1:01 PM, Ben Hardy <benhardy@gmail.com> wrote:
> >
> > > Hey folks,
> > >
> > > We're running a 100 node cluster on Hadoop 0.18.3 using Amazon Elastic
> > > MapReduce.
> > >
> > > We've been uploading data to this cluster via SCP and using hadoop fs
> > > -copyFromLocal to get it into HDFS.
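> > >
> > > Concretely, the flow looks something like this, with the second command
> > > run on the master node (the host and paths here are placeholders rather
> > > than our real ones):
> > >
> > >   scp -r input-data/ hadoop@master-node:/mnt/staging/
> > >   hadoop fs -copyFromLocal /mnt/staging/input-data /user/hadoop/input-data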
> > >
> > > Generally this works fine, but our last run saw a failure in this
> > > operation that only said "RuntimeError".
> > >
> > > So we blew away the destination directory in HDFS and tried the
> > > copyFromLocal again.
> > >
> > > This time it failed because it thinks one of the files it's trying to
> > > copy to HDFS is already in HDFS. However, I don't see how that is
> > > possible if we just blew away the destination's parent directory.
> > > Subsequent attempts produce identical results.
> > >
> > > hadoop fsck reports a HEALTHY filesystem.
> > >
> > > We do see a lot of messages like those below in the namenode log. Are
> > > these normal, or perhaps related to the problem described above?
> > >
> > > Would appreciate any advice or suggestions.
> > >
> > > b
> > >
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_-3060969094589165545 is added to invalidSet of 10.245.103.240:9200
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_-3060969094589165545 is added to invalidSet of 10.242.25.206:9200
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_5935615666845780861 is added to invalidSet of 10.242.15.111:9200
> > > 2010-01-25 16:34:19,762 INFO org.apache.hadoop.dfs.StateChange (IPC Server handler 12 on 9000): BLOCK* NameSystem.addToInvalidates: blk_5935615666845780861 is added to invalidSet of 10.244.107.18:9200
> > >
> >
>
