Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of pankaj@brightroll.com
 designates 209.85.160.48 as permitted sender)
From: Pankaj Gupta <pankaj@brightroll.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Subject: Reading part of file using Map Reduce
Message-Id: <6E485975-3E63-4CD6-B2B4-36C7E5917335@brightroll.com>
Date: Wed, 31 Oct 2012 16:20:05 -0700
To: user@hadoop.apache.org
Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\))

Hi,

Is it possible to run a MapReduce job on only part of a file in HDFS? The
use case is a single HDFS file used as a stream that stores all log events
of a particular kind. New data keeps being appended to the file while
MapReduce processes the older data. Of course, one option would be to copy
part of the data into a separate file and give that to MapReduce, but I was
wondering whether that extra copy can be avoided.
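
To make the idea concrete, here is a minimal sketch of the kind of thing I
am imagining: compute input splits that cover only the first `limit` bytes
of the file, so that anything appended later is ignored by the job. This is
plain Java with no Hadoop dependency; the class name `BoundedSplits`, the
method `splitsUpTo`, and its parameters are all made up for illustration. In
a real job I assume similar logic would have to sit in a custom InputFormat
that overrides getSplits(), and I have not verified how that interacts with
a file that is still being written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: it only illustrates the idea of
// restricting a job to a byte prefix of a growing file.
final class BoundedSplits {

    // Returns {start, length} pairs covering bytes [0, min(fileLength, limit)).
    // Each pair is at most splitSize bytes long; the last one may be shorter.
    static List<long[]> splitsUpTo(long fileLength, long limit, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        // Never read past the snapshot limit or past the current end of file.
        long end = Math.max(0L, Math.min(fileLength, limit));
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < end; start += splitSize) {
            long length = Math.min(splitSize, end - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Example: the file currently holds 1000 bytes, but only the first 250
        // should be processed, in chunks of at most 100 bytes.
        for (long[] s : splitsUpTo(1000L, 250L, 100L)) {
            System.out.println("start=" + s[0] + " length=" + s[1]);
        }
    }
}
```

The limit would be the file length recorded when the job is submitted. One
open point in this sketch is record boundaries: a byte limit can fall in the
middle of a log line, so the reader would need to decide whether to finish
or drop that last record.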

Thanks,
Pankaj