Date: Sat, 19 Mar 2011 03:15:05 +0100
From: Viliam Holub <viliam.holub@ucd.ie>
Subject: Re: Zero file size after hsync
To: hdfs-user@hadoop.apache.org


Hi Nicholas,

it's 0.21.0.

In a nutshell, one process collects measured data and saves it in a MapFile.
It's low bandwidth (about 1 event/sec) and may run for days. I need recent
data to be available for (independent) processing.

One option is to close the MapFile. But then I have to open a new one, since
MapFile does not support appending (unless I'm missing something), and I
end up with a lot of small files.

I was hoping that calling hsync on the underlying data and index files, say
every 15 secs, would do the trick. Apparently the data are copied, but
SequenceFile uses the file size (reported as 0) for its seek checks and
therefore fails.
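
The periodic-sync idea above could look roughly like the following sketch.
It is plain Java with no Hadoop dependency: SyncAction and PeriodicSyncer are
hypothetical names invented for this illustration, and in real code each
action would wrap an hsync call on the data or index stream. The scheduler
thread runs concurrently with the writer, so real streams would presumably
need external locking.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for "call hsync on this stream". */
interface SyncAction {
    void sync() throws IOException;
}

/** Runs a fixed set of sync actions at a fixed interval until closed. */
final class PeriodicSyncer implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    PeriodicSyncer(long period, TimeUnit unit, List<SyncAction> actions) {
        // First run happens after one full period, then repeats at that rate.
        scheduler.scheduleAtFixedRate(() -> syncAll(actions), period, period, unit);
    }

    /** Syncs every target once and returns how many succeeded. */
    static int syncAll(List<SyncAction> actions) {
        int succeeded = 0;
        for (SyncAction action : actions) {
            try {
                action.sync();
                succeeded++;
            } catch (IOException | RuntimeException e) {
                // Keep going: one failing stream should not block the others,
                // and an escaped exception would cancel all future runs.
                System.err.println("sync failed: " + e.getMessage());
            }
        }
        return succeeded;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Usage would be something like new PeriodicSyncer(15, TimeUnit.SECONDS, actions),
closed when the collecting process shuts down.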

getVisibleLength() is a great tip! SequenceFile tweaked to use it appears to
be working.
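
To illustrate why the reported size breaks seeking, and why switching to the
visible length helps, here is a toy model in plain Java. It is not the actual
SequenceFile change: FileLengths and SeekCheck are made-up names and the
numbers are invented. In real code the visible length would come from
DFSDataInputStream.getVisibleLength(), as suggested in the reply quoted below.

```java
import java.io.EOFException;
import java.io.IOException;

/** Toy model of a file that is still open for writing. */
final class FileLengths {
    final long reportedLength; // length from the file status; 0 in this thread's scenario
    final long visibleLength;  // bytes a new reader can see after hflush/hsync

    FileLengths(long reportedLength, long visibleLength) {
        this.reportedLength = reportedLength;
        this.visibleLength = visibleLength;
    }
}

final class SeekCheck {
    /** Rejects positions outside [0, limit], like a reader's bounds check. */
    static void checkSeek(long position, long limit) throws IOException {
        if (position < 0 || position > limit) {
            throw new EOFException("cannot seek to " + position + ", limit is " + limit);
        }
    }

    /** Convenience wrapper that reports the outcome as a boolean. */
    static boolean canSeek(long position, long limit) {
        try {
            checkSeek(position, limit);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FileLengths f = new FileLengths(0L, 4096L); // invented numbers
        // Checking against the reported size fails although the data is there.
        System.out.println("against reported length: " + canSeek(1024L, f.reportedLength));
        // Checking against the visible length accepts the same position.
        System.out.println("against visible length:  " + canSeek(1024L, f.visibleLength));
    }
}
```

The first line prints false and the second prints true, which matches the
behaviour described in this thread.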

Thanks for the help,
Viliam

On 18. Mar (Friday) v 09:55:15 -0700 2011, Tsz Wo (Nicholas), Sze wrote:
> Hi Viliam,
> 
> Which version of Hadoop are you using?
> 
> First of all, hsync is the same as hflush in 0.21 and above.  hflush/hsync won't 
> update the file length on the NameNode.  So the answer to your question is yes.  
> We have to call DFSDataInputStream.getVisibleLength() to get the visible length 
> of the file.
> 
> When is the SequenceFile opened?  Before or after hflush/hsync?  Note that only 
> a new reader can see the new data.  So if the file, normal file or SequenceFile, 
> is opened before hflush/hsync, we have to re-open the file in order to see the 
> new data.
> 
> Anyway, please feel free to file a JIRA if you feel it is a bug or you would 
> like to request a feature.
> 
> Hope it helps.
> Nicholas
> 
> 
> 
> 
> ________________________________
> From: Viliam Holub <viliam.holub@ucd.ie>
> To: hdfs-user@hadoop.apache.org
> Sent: Fri, March 18, 2011 9:29:32 AM
> Subject: Zero file size after hsync
> 
> 
> Hi all,
> 
> the size of a newly created file is reported to be zero even though I've
> written some data and hsync-ed it. Is that the correct and expected behaviour?
> hadoop fs -cat will retrieve the data correctly.
> 
> As a consequence SequenceFile fails to seek in the file since it tests the
> position against file size. And data are there...
> 
> Thanks!
> Viliam