Subject: Re: read a changing hdfs file
From: Shahab Yunus <shahab.yunus@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 20 Aug 2013 20:57:17 -0400

As far as I understand (and experts can correct me), data in a file that is being written becomes visible to readers once a full HDFS block's worth of data has been written, and the same holds for each subsequent block. Essentially, a block's worth of data is the level of coherency: the unit of data for which visibility and durability are guaranteed. You can force the issue by calling the sync methods (*hsync/hflush) to flush your writes to the file system so they become visible as you write them, but that comes at the cost of reduced performance. So it really depends on your application and requirements, i.e. the trade-off between performance and data visibility/durability.
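For illustration, here is a minimal writer-side sketch of that pattern. The class name, path, and loop are made up for the example, but hflush()/hsync() are the Syncable methods exposed by FSDataOutputStream:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushingWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
            FSDataOutputStream out = fs.create(new Path("/tmp/test.txt"));
            try {
                for (int i = 0; i < 100; i++) {
                    out.writeBytes("line " + i + "\n");
                    // hflush(): push buffered data to the datanodes so that new
                    // readers can see it without waiting for a full block.
                    out.hflush();
                    // hsync() additionally asks the datanodes to persist the data
                    // to disk: stronger durability, higher cost.
                    // out.hsync();
                }
            } finally {
                out.close();
            }
        }
    }

Each hflush() costs a round trip through the write pipeline, which is the performance penalty mentioned above.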
*Read more about the definition, differences, and use of the appropriate method here:
http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html

Regards,
Shahab

On Tue, Aug 20, 2013 at 5:36 PM, Wu, Jiang2 <jiang2.wu@citi.com> wrote:
> Hi,
>
> I did some experiments reading a changing HDFS file. It seems that the
> read takes a snapshot at the moment the file is opened, and does not see
> any data appended to the file afterwards. This is different from what
> happens when reading a changing local file. My code is as follows:
>
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.URI;
> import java.util.Scanner;
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.IOUtils;
>
> Configuration conf = new Configuration();
> InputStream in = null;
> try {
>     FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
>     in = fs.open(new Path("/tmp/test.txt"));
>     Scanner scanner = new Scanner(in);
>     while (scanner.hasNextLine()) {
>         System.out.println("+++++++++++++++++++++++++++++++ read " + scanner.nextLine());
>     }
>     System.out.println("+++++++++++++++++++++++++++++++ reader finished ");
> } catch (IOException e) {
>     e.printStackTrace();
> } finally {
>     IOUtils.closeStream(in);
> }
>
> I'm wondering whether this is the designed HDFS reading behavior, or
> whether it can be changed by using a different API or configuration. What
> I expect is the same behavior as local file reading: when a reader reads a
> file while another writer is writing to it, the reader receives all the
> data written by the writer.
>
> Thanks,
> Jiang
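For completeness, a minimal reader-side sketch under the same assumptions (the path, polling interval, and class name are illustrative): since, as observed in the question above, an already-open stream does not see data appended after it was opened, a tailer can poll the file length and reopen the file, seeking past the bytes it has already consumed:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TailingReader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://MyCluster/"), conf);
            Path path = new Path("/tmp/test.txt");
            long offset = 0;
            while (true) {
                // The visible length grows as blocks complete (or the writer
                // hflushes); data still in the writer's buffers is not reflected.
                long len = fs.getFileStatus(path).getLen();
                if (len > offset) {
                    FSDataInputStream in = fs.open(path);
                    try {
                        in.seek(offset);                   // skip already-consumed bytes
                        byte[] buf = new byte[(int) (len - offset)];
                        in.readFully(buf);                 // read only newly visible bytes
                        System.out.print(new String(buf, "UTF-8"));
                        offset = len;
                    } finally {
                        in.close();
                    }
                }
                Thread.sleep(1000);                        // illustrative polling interval
            }
        }
    }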