hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Can you see the name of the document being loaded?
Date Tue, 09 Aug 2011 21:51:49 GMT

1. Yes, the compound key method is correct, since you need the
document ID in the key to group the records and work on them per
document. If you don't want the output grouped/sorted by document,
consider carrying the document ID as an attribute of the value instead.
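
To illustrate the difference between the two options in point 1, here is a minimal sketch in plain Java (no Hadoop dependency). It only simulates the grouping that the shuffle would perform; the class and method names (PerDocumentCounts, Token, countByCompoundKey, documentsByWord) are hypothetical and not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PerDocumentCounts {

    /** One parsed input record: the document it came from and a word found in it. */
    static final class Token {
        final String document;
        final String word;

        Token(String document, String word) {
            this.document = document;
            this.word = word;
        }
    }

    /**
     * Compound key: the document name is part of the key, so every
     * (document, word) pair forms its own group and gets its own count.
     */
    static Map<String, Integer> countByCompoundKey(List<Token> tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Token t : tokens) {
            counts.merge(t.document + "\t" + t.word, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Value attribute: the key is the word alone, and the document name
     * travels with the value, so grouping ignores the document.
     */
    static Map<String, List<String>> documentsByWord(List<Token> tokens) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Token t : tokens) {
            grouped.computeIfAbsent(t.word, k -> new ArrayList<>()).add(t.document);
        }
        return grouped;
    }
}
```

With the compound key, per-document statistics fall out of the grouping directly. With the value attribute, the reducer sees all documents for a key together and has to separate them itself.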

2. The record reader is the right place; specifically, the path
attribute of the FileSplit object. I've detailed before how to extract
this information in Mappers (for both the old and new MR APIs):
http://search-hadoop.com/m/9Nqjm1aqu8a1 has the pointers.
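
As a rough sketch of point 2 with the new (org.apache.hadoop.mapreduce) API, the mapper below reads the file name from the FileSplit once in setup() and emits it as part of a compound key. It assumes a file-based input format whose splits are FileSplit instances and needs the Hadoop client libraries on the classpath. The class name DocumentNameMapper, the helper documentNameOf, the tab separator, and the "unknown" fallback are illustrative choices, not taken from the linked thread.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

class DocumentNameMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();
    private String documentName;

    /** Returns the last path component of a file split, or "unknown" for other split types. */
    static String documentNameOf(InputSplit split) {
        if (split instanceof FileSplit) {
            return ((FileSplit) split).getPath().getName();
        }
        // Assumption: non-file splits (e.g. combined or wrapped splits) are not handled here.
        return "unknown";
    }

    @Override
    protected void setup(Context context) {
        // The split is fixed for the lifetime of the task, so look it up once.
        documentName = documentNameOf(context.getInputSplit());
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String word : line.toString().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // Compound key: document name and word, separated by a tab.
            outKey.set(documentName + "\t" + word);
            context.write(outKey, ONE);
        }
    }
}
```

Input formats that wrap or combine splits will not hand the mapper a plain FileSplit, which is why the helper checks the type instead of casting blindly.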

On Wed, Aug 10, 2011 at 2:41 AM, Jonathan Coveney <jcoveney@gmail.com> wrote:
> I want to calculate some statistics on a per document basis, and it seems
> like the only way to do this would be to emit a compound key of
> (key,documentname).
> 1) Is this the case, or is there a better way to do this?
> 2) If this is the only way to calculate a per input file basis, where is the
> right place to grab this? A custom line reader? What object is exposed to
> this?

Harsh J
