hadoop-common-user mailing list archives

From Qing Yan <qing...@gmail.com>
Subject Re: Pending written data get lost after namenode restart
Date Thu, 25 Nov 2010 02:06:54 GMT
The flush() equivalent method is called sync(); close() should do an
implicit sync(). The issue here is that the restarted NN seems to forget
the file allocation metadata, which is supposed to be logged and persisted.
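
For reference, here is a minimal sketch of what that sync() call looks like on a
plain FSDataOutputStream (not SequenceFile.Writer); the namenode address and path
are placeholders, and this only illustrates the API in this generation of Hadoop,
not a fix for the problem described below:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000"); // placeholder NN address
        FileSystem fs = FileSystem.get(conf);
        // placeholder path, just to illustrate the call sequence
        FSDataOutputStream out = fs.create(new Path("/tmp/sync-example"));
        out.writeBytes("some data\n");
        out.sync();   // the "flush" in this API: push buffered bytes to the datanodes
        out.close();  // close() also flushes and asks the NN to finalize the file
    }
}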

On Wed, Nov 24, 2010 at 8:49 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:

> Hi,
>    Well, in any case I checked SequenceFile.Writer and it doesn't even have
> a flush() method. How big is the file? If it is larger than the block size,
> you can check the web UI to see whether it is being written while the code
> is running or only after writer.close() is called. I am new to this myself
> and not very sure of the internal workings. Maybe it writes to a buffer
> internally? I am not sure.
>
> hari
>
> On Wed, Nov 24, 2010 at 5:57 PM, Qing Yan <qingyan@gmail.com> wrote:
>
> > If I don't restart the NN, the file is committed successfully (with a
> > non-zero size).
> >
> > I am assuming writer.close() performs a flush implicitly, and in case of
> > failure it will throw an exception.
> >
> > On Wed, Nov 24, 2010 at 6:28 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
> >
> > > Hi,
> > >
> > >      Two things I'd like to check here:
> > > 1. Is the file actually getting written? You can check this in the web
> > > UI, or you can run hadoop dfs -ls on that path in a different shell.
> > > The file size should keep increasing while the file is being written.
> > > 2. You could try writer.flush() after writing to make sure it is
> > > committed to disk.
> > >
> > > regards,
> > > Hari
> > >
> > > On Wed, Nov 24, 2010 at 2:44 PM, Qing Yan <qingyan@gmail.com> wrote:
> > >
> > > > Hi,
> > > >   I found some erratic behavior in Hadoop 0.19.2. Here is a simple
> > > > test program:
> > > >
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.fs.*;
> > > > import org.apache.hadoop.io.*;
> > > >
> > > > public class test {
> > > >     public static void main(String[] args) throws Exception {
> > > >         String url = args[0];
> > > >         String path = args[1];
> > > >
> > > >         Configuration conf = new Configuration();
> > > >         conf.set("fs.default.name", url);
> > > >         conf.set("dfs.replication", "3");
> > > >         FileSystem fs = FileSystem.get(conf);
> > > >
> > > >         System.out.println("open hdfs file " + path);
> > > >         SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
> > > >                 new Path(path), LongWritable.class, Text.class,
> > > >                 SequenceFile.CompressionType.BLOCK);
> > > >
> > > >         System.out.println("append some data");
> > > >         for (int j = 0; j < 10000000 * 2; j++)
> > > >             writer.append(new LongWritable(0), new Text("abc"));
> > > >
> > > >         System.out.println("wait two min");
> > > >         Thread.sleep(2 * 60 * 1000);
> > > >
> > > >         System.out.println("close");
> > > >         writer.close();
> > > >     }
> > > > }
> > > >
> > > > First run
> > > >  java test hdfs://[host]:[port] [hdfs_path]
> > > >
> > > >  After the program prints "wait two min", restart the Hadoop namenode
> > > > by executing:
> > > >
> > > > hadoop-daemon.sh stop namenode
> > > > hadoop-daemon.sh start namenode
> > > >
> > > > (In my case two minutes is enough for the NN to get out of safe mode.)
> > > > The sequence file gets closed instantly with no exception thrown.
> > > > The problem is that the final HDFS file size is zero!? Whatever was
> > > > written before the NN restart seems to be lost.
> > > >
> > > > Thanks
> > > >
> > >
> >
>
