Date: Sat, 19 Mar 2011 03:15:05 +0100
From: Viliam Holub
To: hdfs-user@hadoop.apache.org
Subject: Re: Zero file size after hsync

Hi Nicholas,

it's 0.21.0.

In a nutshell, one process collects measured data and saves it in a MapFile.
It's low bandwidth (about 1 event/sec) and may run for days. I need recent
data available for (independent) processing.

One option is to close the MapFile. But then I have to open a new one, since
MapFile does not support appending (unless I'm missing something), and I end
up with a lot of small files.

I was hoping that calling hsync on the underlying data and index files, say
every 15 secs, would do the trick. Apparently the data are copied, but
SequenceFile uses the file size (reported as 0) for seek checks and therefore
fails.

getVisibleLength() is a great tip! SequenceFile tweaked to use it appears to
be working.

Thanks for the help,
Viliam

On 18. Mar (Friday) at 09:55:15 -0700 2011, Tsz Wo (Nicholas), Sze wrote:
> Hi Viliam,
>
> Which version of Hadoop are you using?
>
> First of all, hsync is the same as hflush in 0.21 and above. hflush/hsync won't
> update the file length on the NameNode. So the answer to your question is yes.
> We have to call DFSDataInputStream.getVisibleLength() to get the visible length
> of the file.
>
> When is the SequenceFile opened? Before or after hflush/hsync? Note that only
> a new reader can see the new data. So if the file, normal file or SequenceFile,
> is opened before hflush/hsync, we have to re-open the file in order to see the
> new data.
>
> Anyway, please feel free to file a JIRA if you feel it is a bug or you would
> like to make a feature request.
>
> Hope it helps.
> Nicholas
>
> ________________________________
> From: Viliam Holub
> To: hdfs-user@hadoop.apache.org
> Sent: Fri, March 18, 2011 9:29:32 AM
> Subject: Zero file size after hsync
>
>
> Hi all,
>
> the size of a newly created file is reported to be zero even though I've
> written some data and hsync-ed it. Is that the correct and expected behavior?
> hadoop fs -cat will retrieve the data correctly.
>
> As a consequence, SequenceFile fails to seek in the file since it tests the
> position against the file size. And the data are there...
>
> Thanks!
> Viliam
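
The seek failure discussed in this thread can be illustrated with a small
stand-alone model. This is only a sketch and not Hadoop code: the class
OpenFile and the method seekAllowed are hypothetical names made up for the
illustration, and the model assumes the behavior described above, namely that
hflush/hsync advances the length visible to new readers while the length
recorded in the file metadata stays unchanged until the file is closed.

```java
/** Toy model (not Hadoop code) of a file that is still open for writing. */
public class VisibleLengthDemo {

    /** Hypothetical stand-in for a file under construction. */
    static final class OpenFile {
        private long writtenBytes = 0;   // bytes handed to the writer so far
        private long visibleLength = 0;  // bytes new readers may see (advanced by hflush)
        private long reportedLength = 0; // length kept in metadata (advanced by close only)

        void write(byte[] data) { writtenBytes += data.length; }

        /** Makes written bytes visible to readers; does not touch the metadata length. */
        void hflush() { visibleLength = writtenBytes; }

        /** Closing flushes and finally records the length in the metadata. */
        void close() { hflush(); reportedLength = visibleLength; }

        long getReportedLength() { return reportedLength; }
        long getVisibleLength() { return visibleLength; }
    }

    /** Seek check in the style described in the thread: position tested against a length. */
    static boolean seekAllowed(long position, long length) {
        return position >= 0 && position <= length;
    }

    public static void main(String[] args) {
        OpenFile file = new OpenFile();
        file.write(new byte[100]);
        file.hflush();

        // Checking against the metadata length rejects a seek into flushed data.
        System.out.println("reported length: " + file.getReportedLength());
        System.out.println("seek(50) vs reported: " + seekAllowed(50, file.getReportedLength()));

        // Checking against the visible length accepts the same seek.
        System.out.println("visible length: " + file.getVisibleLength());
        System.out.println("seek(50) vs visible: " + seekAllowed(50, file.getVisibleLength()));
    }
}
```

In the model, a seek to offset 50 is rejected when checked against the
reported length and accepted when checked against the visible length, which is
the same substitution as the SequenceFile tweak mentioned above.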