Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Reply-To: user@hadoop.apache.org
Date: Sat, 23 Feb 2013 20:24:15 +0530
Subject: Re: map reduce and sync
From: Hemanth Yamijala
To: "user@hadoop.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8fb2024457b5cf04d6657bab

--e89a8fb2024457b5cf04d6657bab
Content-Type: text/plain; charset=ISO-8859-1

Hi Lucas,

I tried something like this but got different results.

I wrote code that opened a file on HDFS, wrote a line and called sync.
Without closing the file, I ran a wordcount with that file as input. It
worked fine and was able to count the words that had been synced, even
though the file length shows up as 0, as you noted with fs -ls.

So I'm not sure what's happening in your case. In the MR job, do the job
counters indicate that no bytes were read?

On a different note, if you can describe a little more of what you are
trying to accomplish, we could probably work out a better solution.

Thanks
hemanth

On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi wrote:

> Hello Hemanth, thanks for answering.
> The file is opened by a separate process that is not map reduce related
> at all. You can think of it as a servlet receiving requests and writing
> them to this file; every time a request is received it is written and
> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>
> At the same time, I want to run a map reduce job over this file. Simply
> running the word count example doesn't seem to work; it is as if the
> file were empty.
>
> hadoop fs -tail works just fine, and reading the file using
> org.apache.hadoop.fs.FSDataInputStream also works ok.
>
> Last thing, the web interface doesn't see the contents, and the command
> hadoop fs -ls says the file is empty.
>
> What am I doing wrong?
>
> Thanks!
>
> Lucas
>
> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Could you please clarify, are you opening the file in your mapper code
>> and reading from there?
>>
>> Thanks
>> Hemanth
>>
>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>
>>> Hello there, I'm trying to use hadoop map reduce to process an open
>>> file. The writing process writes a line to the file and syncs the
>>> file to readers
>>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>
>>> If I try to read the file from another process, it works fine, at
>>> least using org.apache.hadoop.fs.FSDataInputStream.
>>>
>>> hadoop fs -tail also works just fine.
>>>
>>> But it looks like map reduce doesn't read any data. I tried using the
>>> word count example, same thing; it is as if the file were empty for
>>> the map reduce framework.
>>>
>>> I'm using hadoop 1.0.3 and pig 0.10.0.
>>>
>>> I need some help around this.
>>>
>>> Thanks!
>>>
>>> Lucas

--e89a8fb2024457b5cf04d6657bab
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--e89a8fb2024457b5cf04d6657bab--
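
The experiment described in the thread (open a file, write a line, sync, and count words without closing the writer) can be sketched on a local file system. This is only an analogue under stated assumptions: it uses plain java.io rather than HDFS, with FileDescriptor.sync() standing in for the org.apache.hadoop.fs.FSDataOutputStream.sync() call named in the messages, and the class and method names (OpenFileWordCount, writeSyncAndCount, countWords) are hypothetical. A local file system reports the new length immediately, so it does not reproduce the zero length that fs -ls shows for the open HDFS file.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local-file analogue of the thread's experiment; not Hadoop code.
public class OpenFileWordCount {

    // Count whitespace-separated words in the file, like a tiny word count.
    static Map<String, Integer> countWords(Path file) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Write one line, sync it, and count words while the writer is still open.
    static Map<String, Integer> writeSyncAndCount(String line) throws IOException {
        Path file = Files.createTempFile("open-file", ".txt");
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            out.getFD().sync(); // stand-in for FSDataOutputStream.sync() on HDFS
            // The output stream has not been closed at this point.
            return countWords(file);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeSyncAndCount("to be or not to be"));
    }
}
```

On HDFS the same shape would use FileSystem.create() for the writer and a MapReduce job as the reader; whether the job sees the synced data is exactly the open question in the thread, so this sketch should not be read as an answer to it.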