hadoop-common-user mailing list archives

From Nitin Khandelwal <nitin.khandel...@germinait.com>
Subject Hadoop Question
Date Thu, 28 Jul 2011 05:51:57 GMT
Hi All,

How can I determine whether a file in HDFS is currently being written to (by
any thread)? I have a continuous process on the master node that watches a
particular folder in HDFS for files to process. On the slave nodes, I create
files in the same folder using the following code:

At the slave node:

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import java.io.OutputStream;

// fileSystem is a FileSystem handle; path points into the watched folder.
OutputStream oStream = fileSystem.create(path);
IOUtils.write(<Some String>, oStream);

At the master node:
I pick the earliest-modified file in the folder. At times, when I try to read
the file, it is empty, most likely because the slave is still writing to it.
Is there any way to tell the master that the slave is still writing to the
file, so that it can check the file again later for the actual content?


Nitin Khandelwal
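
One common way to approach the situation described above is to write each file under a temporary name and rename it only after the stream has been closed, so that the reader can skip anything still carrying the temporary name. The sketch below illustrates that pattern on the local filesystem with java.nio.file so that it runs without a cluster. On HDFS the corresponding calls would presumably be FileSystem.create, FileSystem.rename, and FileSystem.listStatus with a PathFilter. The class name, method names, and the ".tmp" suffix are all hypothetical and not taken from the original message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

public class WriteThenRename {
    // Hypothetical marker for files that are still being written.
    static final String TMP_SUFFIX = ".tmp";

    // Writer side: write under a temporary name, close, then rename.
    static Path publish(Path dir, String name, String content) throws IOException {
        Path tmp = dir.resolve(name + TMP_SUFFIX);
        Path done = dir.resolve(name);
        // Files.write opens the file, writes the bytes, and closes it.
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // The final name only appears once the content is complete.
        return Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: earliest-modified regular file, skipping in-progress names.
    static Optional<Path> earliestComplete(Path dir) throws IOException {
        Path best = null;
        long bestTime = Long.MAX_VALUE;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (p.getFileName().toString().endsWith(TMP_SUFFIX)) continue;
                if (!Files.isRegularFile(p)) continue;
                long t = Files.getLastModifiedTime(p).toMillis();
                if (t < bestTime) {
                    bestTime = t;
                    best = p;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("watched");
        // Simulate a file that another writer has not finished yet.
        Files.write(dir.resolve("b.txt" + TMP_SUFFIX), new byte[0]);
        publish(dir, "a.txt", "hello");
        System.out.println(
            earliestComplete(dir).map(p -> p.getFileName().toString()).orElse("none"));
    }
}
```

With this convention the master never needs to ask whether a writer is still active: a file that is visible under its final name has already been closed by the slave.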
