hadoop-common-user mailing list archives

From Sean Knapp <s...@ooyala.com>
Subject Re: HDFS Namenode Heap Size woes
Date Mon, 02 Feb 2009 00:11:29 GMT
Jason,
Thanks for the response. By falling out, do you mean a longer time since
last contact (100s+), or fully timed out where it is dropped into dead
nodes? The former happens fairly often, the latter only under serious load
but not in the last day. Also, my namenode is now up to 10GB with less than
700k files after some additional archiving.
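
For a rough sanity check against these numbers, a commonly cited rule of thumb is that each namespace object (file, directory, or block) costs on the order of 150 bytes of namenode heap. A minimal sketch, using that assumed figure and the counts reported in this thread:

```python
# Back-of-the-envelope namenode heap estimate. 150 bytes per namespace
# object is an assumed rule-of-thumb figure, not a measured value.
BYTES_PER_OBJECT = 150

def estimated_heap_gb(files_and_dirs, blocks):
    """Estimate live namespace heap in GB for a given object count."""
    return (files_and_dirs + blocks) * BYTES_PER_OBJECT / 1024 ** 3

# ~1 million files/directories and ~950k blocks, as reported below.
print("%.2f GB" % estimated_heap_gb(1_000_000, 950_000))  # prints: 0.27 GB
```

By this estimate a ~2M-object namespace should fit comfortably under 1GB, so a 7-10GB heap suggests something beyond live namespace objects, such as pending replication and invalidation queues, is holding memory.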

Thanks,
Sean

On Sun, Feb 1, 2009 at 4:00 PM, jason hadoop <jason.hadoop@gmail.com> wrote:

> If your datanodes are pausing and falling out of the cluster, the namenode
> accumulates a large workload of blocks to re-replicate; when a paused
> datanode comes back, it accumulates a similarly large workload of blocks to
> delete. These lists are stored in memory on the namenode.
> The startup messages lead me to wonder whether your datanodes are
> periodically pausing or otherwise dropping in and out of the cluster.
>
> On Sat, Jan 31, 2009 at 2:20 PM, Sean Knapp <sean@ooyala.com> wrote:
>
> > I'm running 0.19.0 on a 10-node cluster (8 cores, 16GB RAM, 4x1.5TB). The
> > current status of my FS is approximately 1 million files and directories,
> > 950k blocks, and a heap size of 7GB (16GB reserved). Average block
> > replication is 3.8. I'm concerned that the heap size is steadily
> > climbing... a 7GB heap is substantially higher per file than on a similar
> > 0.18.2 cluster, which has closer to a 1GB heap.
> > My typical usage model is 1) write a number of small files into HDFS
> > (tens or hundreds of thousands at a time), 2) archive those files, and
> > 3) delete the originals. I've tried dropping the replication factor of
> > the _index and _masterindex files without much effect on overall heap
> > size. While I had trash enabled at one point, I've since disabled it
> > and deleted the .Trash folders.
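
The archive-then-delete cycle described above can be sketched as shell commands; the paths are hypothetical, and the archive invocation shown is the 0.19-era form:

```shell
# Hypothetical paths, sketching the write/archive/delete cycle above.

# 1) Pack the small files into a Hadoop archive (0.19-era syntax:
#    hadoop archive -archiveName NAME <src>* <dest>).
hadoop archive -archiveName batch.har /user/sean/incoming /user/sean/archived

# 2) Drop the replication factor on the archive's index files; the
#    packed data keeps its own replication factor.
hadoop fs -setrep 2 /user/sean/archived/batch.har/_index
hadoop fs -setrep 2 /user/sean/archived/batch.har/_masterindex

# 3) Delete the originals (with trash disabled, this frees the
#    namespace immediately rather than parking files under .Trash).
hadoop fs -rmr /user/sean/incoming
```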
> >
> > On namenode startup, I get a massive number of the following lines in
> > my log file:
> > 2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> > NameSystem.processReport: block blk_-2389330910609345428_7332878 on
> > 172.16.129.33:50010 size 798080 does not belong to any file.
> > 2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> > NameSystem.addToInvalidates: blk_-2389330910609345428 is added to
> > invalidSet
> > of 172.16.129.33:50010
> >
> > I suspect the original files may be left behind and causing the heap
> > size bloat. Is there any accounting mechanism to determine what is
> > contributing to my heap size?
> >
> > Thanks,
> > Sean
> >
>
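
On the accounting question above: the 0.19 namenode has no built-in per-feature heap accounting, but two generic tools give a rough picture (the PID below is a placeholder):

```shell
# Class-by-class heap histogram of the running namenode JVM; the top
# entries show which object types dominate the heap.
# <namenode-pid> is a placeholder for the actual process id.
jmap -histo <namenode-pid> | head -30

# Namespace-level accounting: per-path file and block listings plus a
# summary of total files, blocks, and average replication.
hadoop fsck / -files -blocks | tail -20
```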
