hadoop-mapreduce-user mailing list archives

From Jonathan Coveney <jcove...@gmail.com>
Subject Re: Can you see the name of the document being loaded?
Date Wed, 10 Aug 2011 00:25:29 GMT
Much obliged, Harsh. Looks perfect.

2011/8/9 Harsh J <harsh@cloudera.com>

> Jonathan,
>
> 1. Yes, the compound-key method is correct, since you need the document ID
> in the key to group on it and then work per document. If you don't want the
> output grouped and sorted by document, consider adding it as a value
> attribute instead.
>
> 2. The record reader is the right place; specifically, the path attribute
> of the FileSplit object. I've described how to extract this information
> in Mappers before (for both the old and new MR APIs):
> http://search-hadoop.com/m/9Nqjm1aqu8a1 has the pointers.
>
> On Wed, Aug 10, 2011 at 2:41 AM, Jonathan Coveney <jcoveney@gmail.com>
> wrote:
> > I want to calculate some statistics on a per-document basis, and it seems
> > like the only way to do this would be to emit a compound key of
> > (key, documentname).
> > 1) Is this the case, or is there a better way to do this?
> > 2) If this is the only way to calculate on a per-input-file basis, where is
> > the right place to grab the document name? A custom line reader? What
> > object is exposed there?
>
>
>
> --
> Harsh J
>
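For reference, here is a minimal sketch combining both suggestions with the new (org.apache.hadoop.mapreduce) API, assuming plain-text input read through a file-based input format such as TextInputFormat. The mapper takes the file name from the FileSplit's path once per task in setup() and emits a compound (document, word) key. The class name PerDocumentWordMapper, the word-count statistic, the tab-separated key format, and the "unknown" fallback are hypothetical choices for illustration, not something from this thread.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * Hypothetical example: counts words per input document by emitting a
 * compound key of (document name, word).
 */
public class PerDocumentWordMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text outKey = new Text();
    private String documentName;

    @Override
    protected void setup(Context context) {
        // Each map task processes one split. For file-based input formats
        // such as TextInputFormat, that split is a FileSplit carrying the path.
        InputSplit split = context.getInputSplit();
        if (split instanceof FileSplit) {
            documentName = ((FileSplit) split).getPath().getName();
        } else {
            // Assumption: other split types (e.g. from combining input
            // formats) do not expose a single file, so use a placeholder.
            documentName = "unknown";
        }
    }

    /** Builds the compound key; the tab separator is an arbitrary choice. */
    static String compoundKey(String document, String word) {
        return document + "\t" + word;
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String word : line.toString().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            outKey.set(compoundKey(documentName, word));
            context.write(outKey, ONE);
        }
    }
}
```

A summing reducer then yields counts per (document, word). A tab-joined Text key is the simplest form of the compound key; a custom WritableComparable pair would keep the two parts typed and allow finer control over sorting and grouping. With the old (org.apache.hadoop.mapred) API, the split is reachable from the map method via reporter.getInputSplit() instead.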
