hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: read a changing hdfs file
Date Wed, 21 Aug 2013 00:57:17 GMT
As far as I understand (and experts can correct me), data being written to an
HDFS file becomes visible to other readers one block at a time: once a full
block's worth of data has been written, that block is readable, and the same
applies to each subsequent block. The block is, in effect, the unit of read
coherency. If you need your writes to become visible sooner, you can call
sync (*hsync/hflush) on the output stream to force the buffered data out to
the datanodes as you write it, but that comes at the cost of reduced write
throughput; hflush makes the data visible to new readers, while hsync
additionally asks the datanodes to persist it to disk. So basically it
depends on your application and requirements, i.e. the trade-off between
performance and data visibility/durability.

*Read more about the definitions, differences, and appropriate use of these
methods here:

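To make the writer side concrete, here is a minimal sketch of how the sync
calls fit into a write loop. It assumes the same cluster URI and file path as
in your mail, a reachable HDFS cluster, and the Hadoop client libraries on
the classpath; treat it as an illustration of the API shape, not a drop-in
program.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster URI taken from the original mail; adjust for your setup.
        FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/test.txt"))) {
            out.writeBytes("first line\n");
            // hflush(): push buffered data to the datanodes so that a reader
            // opening the file now can see it; no on-disk durability promise.
            out.hflush();

            out.writeBytes("second line\n");
            // hsync(): like hflush(), but also asks each datanode to sync the
            // data to disk -- stronger durability, higher per-call cost.
            out.hsync();
        }
    }
}
```

A common pattern is to batch several records between sync calls, which lets
you tune the visibility/throughput trade-off rather than paying the cost on
every write.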

On Tue, Aug 20, 2013 at 5:36 PM, Wu, Jiang2 <jiang2.wu@citi.com> wrote:

>  Hi,
> I did some experiments to read a changing hdfs file. It seems that the
> reading takes a snapshot at the file opening moment, and will not read any
> data appended to the file afterwards. It’s different from what happens when
> reading a changing local file. My code is as follows:
> Configuration conf = new Configuration();
> InputStream in = null;
> try {
>     FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
>     in = fs.open(new Path("/tmp/test.txt"));
>     Scanner scanner = new Scanner(in);
>     while (scanner.hasNextLine()) {
>         System.out.println("+++++++++++++++++++++++++++++++ read " + scanner.nextLine());
>     }
>     System.out.println("+++++++++++++++++++++++++++++++ reader finished ");
> } catch (IOException e) {
>     e.printStackTrace();
> } finally {
>     IOUtils.closeStream(in);
> }
> I’m wondering if this is the designed hdfs reading behavior, or can be
> changed by using different API or configuration? What I expect is the same
> behavior as a local file reading: when a reader reads a file while another
> writer is writing to the file, the reader will receive all data written by
> the writer.
> Thanks,
> Jiang
