From: Xiao Li
To: user@hadoop.apache.org
Subject: OPENFORWRITE Files issue
Date: Wed, 12 Feb 2014 20:26:07 -0500 (EST)

Say I have a text file on HDFS in "OPENFORWRITE, HEALTHY" status. Some process is appending to it.

It has 4 lines in it.

hadoop fs -cat /file | wc -l 
4

However, when I run wordcount on this file, only the first line is visible to MapReduce. Similarly, in Hive, "select count(*) from filetable" returns 1.

If I do "hadoop fs -cp /file /file2", then it works as expected (file2 is closed, file is still open).

wordcount then sees 5 lines in the input directory (1 from the open file, 4 from the copied file), and Hive returns 5.

I am wondering whether this is related to TextInputFormat?
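
One way to reason about the symptom is a small model of split-bounded line reading. This is only a sketch under stated assumptions, not a confirmed diagnosis: it assumes the job sizes its input split from a stale file length of 0 reported while the file is open for write, and that the reader emits any line starting at an offset less than or equal to the split end. The function name and the sample data are hypothetical.

```python
from typing import List


def lines_seen_by_split(data: bytes, start: int, end: int) -> List[bytes]:
    """Simplified model of a split-bounded line reader.

    Assumptions: a line is emitted whenever it begins at an offset <= end,
    and a split that does not start at 0 skips its first line, which is
    left to the previous split.
    """
    pos = start
    if start != 0:
        # Skip the line owned by the previous split.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    seen = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        seen.append(data[pos:stop])
        pos = stop
    return seen


# Hypothetical 4-line file standing in for /file.
data = b"line1\nline2\nline3\nline4\n"

# Assumed stale length of 0 while the file is still open for write.
stale = lines_seen_by_split(data, 0, 0)
# Full length, as for the closed copy /file2.
closed = lines_seen_by_split(data, 0, len(data))

print(len(stale), len(closed))
```

Under these assumptions the model yields 1 line for the open file and 4 for the closed copy, which matches the 1 + 4 = 5 lines described above. Whether the reported length is actually stale on this cluster would still need to be checked.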

I am using CDH 4.4.0.

Thanks.

Xiao Li
