hadoop-common-user mailing list archives

From Hari Sreekumar <hsreeku...@clickable.com>
Subject Re: Pending written data get lost after namenode restart
Date Wed, 24 Nov 2010 12:49:27 GMT
Hi,
    Well, in any case, I checked SequenceFile.Writer and it doesn't even have
a flush() method. How big is the file? If it is larger than the block size, you
can check in the web UI whether it is being written while the code is running or
only after writer.close() is called. I am new to this myself and not very sure of
the internal workings; maybe it writes to a buffer internally? A couple of rough
sketches follow below.
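
For what it's worth, the closest thing I could find is sync-related: writer.sync()
only writes a SequenceFile sync marker into the stream (it does not push data out
to the datanodes), and some builds have a syncFs() that forwards to the underlying
FSDataOutputStream. I haven't tested this, and I am not sure syncFs() is in a stock
0.19.2, so please treat the following as a guess rather than a fix:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;

public class TestWithSync {
        public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                conf.set("fs.default.name", args[0]);
                FileSystem fs = FileSystem.get(conf);

                SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                                new Path(args[1]), LongWritable.class, Text.class,
                                SequenceFile.CompressionType.BLOCK);
                for (int j = 0; j < 20000000; j++) {
                        writer.append(new LongWritable(0), new Text("abc"));
                        if (j % 1000000 == 0) {
                                // sync() would only drop a marker into the file;
                                // syncFs() (assuming this build has it) is the call
                                // that asks HDFS to flush the buffered bytes.
                                writer.syncFs();
                        }
                }
                writer.close();
        }
}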

hari
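
P.S. About checking whether the file is actually growing: instead of refreshing the
web UI you could poll the length from a second process. Keep in mind that the size
the namenode reports for a file that is still open can lag behind what the client
has written (it often shows 0 until a block is completed), so a zero length there
does not necessarily mean nothing was sent. A rough, untested sketch (TestWatchLen
is just a name I made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class TestWatchLen {
        public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                conf.set("fs.default.name", args[0]);
                FileSystem fs = FileSystem.get(conf);
                Path p = new Path(args[1]);
                while (true) {
                        // getFileStatus() throws FileNotFoundException until the
                        // file becomes visible in the namespace
                        System.out.println(p + ": " + fs.getFileStatus(p).getLen() + " bytes");
                        Thread.sleep(5000);
                }
        }
}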

On Wed, Nov 24, 2010 at 5:57 PM, Qing Yan <qingyan@gmail.com> wrote:

> If I don't restart the NN, the file is committed successfully (with non-zero
> size).
>
> I am assuming writer.close() performs the flush implicitly, and that in case of
> failure it will throw an exception.
>
> On Wed, Nov 24, 2010 at 6:28 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
>
> > Hi,
> >
> >      Two things I'd like to check here:
> >      1. Is the file actually getting written? You can check this in the web UI,
> > or run hadoop dfs -ls on that path from a different shell. The file size should
> > keep increasing while the file is being written.
> >      2. You could try writer.flush() after writing to make sure it is committed
> > to the disk.
> >
> > regards,
> > Hari
> >
> > On Wed, Nov 24, 2010 at 2:44 PM, Qing Yan <qingyan@gmail.com> wrote:
> >
> > > Hi,
> > >   I found some erratic behavior in Hadoop 0.19.2. Here is a simple test
> > > program:
> > >
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.fs.*;
> > > import org.apache.hadoop.io.*;
> > > public class test {
> > >        public static void main(String args[]) throws Exception {
> > >                String url=args[0];
> > >                String path=args[1];
> > >
> > >                Configuration conf;
> > >                FileSystem fs;
> > >                conf = new Configuration();
> > >                conf.set("fs.default.name",url);
> > >                conf.set("dfs.replication","3");
> > >                fs=FileSystem.get(conf);
> > >                System.out.println("open hdfs file "+path);
> > >                SequenceFile.Writer writer = SequenceFile.createWriter(fs,
> > >                        conf, new Path(path), LongWritable.class, Text.class,
> > >                        SequenceFile.CompressionType.BLOCK);
> > >                System.out.println("append some data");
> > >                for (int j=0;j<10000000*2;j++)
> > >                  writer.append(new LongWritable(0), new Text("abc"));
> > >                System.out.println("wait two min");
> > >                Thread.currentThread().sleep(2*60*1000);
> > >                System.out.println("close");
> > >                writer.close();
> > >        }
> > > }
> > >
> > > First, run:
> > >  java test hdfs://[host]:[port] [hdfs_path]
> > >
> > >  After the program prints "wait two min", restart the Hadoop namenode by
> > > executing:
> > >
> > > hadoop-daemon.sh stop namenode
> > > hadoop-daemon.sh start namenode
> > >
> > > (In my case two minutes is enough for the NN to get out of safe mode.)
> > > The sequence file gets closed instantly with no exception thrown.
> > > The problem is that the final HDFS file size is zero!? Whatever was written
> > > before the NN restart seems to be lost.
> > >
> > > Thanks
> > >
> >
>
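
One more thought on the assumption quoted above, that writer.close() flushes
implicitly and throws on failure: it might be safer to verify the result after
close() instead of relying on the absence of an exception. A minimal, untested
check, reusing the fs and path variables from the test program above:

                writer.close();
                FileStatus st = fs.getFileStatus(new Path(path));
                System.out.println("closed, final length = " + st.getLen());
                if (st.getLen() == 0) {
                        // close() returned normally but nothing was committed
                        throw new java.io.IOException("file committed with zero length: " + path);
                }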
