hadoop-common-user mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Enhancement to TextInputFormat?
Date Thu, 06 Jul 2006 17:49:33 GMT
I think it is interesting.  I think you'd want a way to specify that  
the target file is itself a list of additional URIs as well.  That  
would support scenarios such as a .jsp on a master server that simply  
listed its slaves and then the slaves could list their local content.

Might also want to support the setting of config vars per file, so  
you can label inputs in the mapper.  (Odd thought)
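A rough sketch of the "list of lists" idea, assuming a hypothetical `list:` prefix that marks an entry as pointing to another URI list rather than to data. The class name `UriListExpander`, the prefix, and the caller-supplied `readLines` function are all illustrative assumptions, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class UriListExpander {
    // Hypothetical marker: "list:<uri>" means <uri> itself holds a list of URIs.
    static final String LIST_PREFIX = "list:";

    /** Flattens nested lists depth-first; readLines is supplied by the caller. */
    static List<String> expand(List<String> lines,
                               Function<String, List<String>> readLines) {
        List<String> out = new ArrayList<>();
        expandInto(lines, readLines, new HashSet<>(), out);
        return out;
    }

    private static void expandInto(List<String> lines,
                                   Function<String, List<String>> readLines,
                                   Set<String> seen, List<String> out) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(LIST_PREFIX)) {
                String target = line.substring(LIST_PREFIX.length()).trim();
                // Visit each list only once so that cyclic references terminate.
                if (seen.add(target)) {
                    expandInto(readLines.apply(target), readLines, seen, out);
                }
            } else {
                out.add(line); // a plain data URI
            }
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for fetching list files; all URIs are made up.
        Map<String, List<String>> lists = new HashMap<>();
        lists.put("http://master/slaves.jsp",
                Arrays.asList("list:http://slave1/files", "list:http://slave2/files"));
        lists.put("http://slave1/files", Arrays.asList("http://slave1/part-0"));
        lists.put("http://slave2/files", Arrays.asList("http://slave2/part-0"));
        System.out.println(
                expand(Arrays.asList("list:http://master/slaves.jsp"), lists::get));
    }
}
```

Here the master's listing names each slave's listing, and each slave names its local content; the expander returns only the leaf data URIs.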

On Jul 6, 2006, at 2:18 AM, Arun C Murthy wrote:

> Hi,
>   Here's a scenario I have faced a couple of times recently:
>   <scenario>
>     I have a list of URIs (either http:// or just dfs file-list)
> which represent input to a Map-Reduce task where each map gets 1
> URI, gets data from the URI (read either through dfs apis or over
> http as the case may be) and then manipulates that data.
>   </scenario>
>   In essence, it's a simple TextInputFormat with each 'line'
> representing not the actual 'data' to manipulate in the map, but an
> 'indirection' to the data.
>   Do you guys think it makes sense to provide this as a part of the  
> MR framework itself? i.e. extend TextInputFormat into (say)  
> URIInputFormat and the MR framework then 'fetches' the data (the  
> 'fetcher'/'reader' is configurable with reasonable defaults  
> provided in the framework e.g. for dfs://, http:// etc.)  pointed  
> to by the URI and then provides a 'stream' (as 'key') to the map  
> function?
>   Admittedly it isn't very hard to do as-is today; however, it would
> definitely ease the user's job. All the user needs to do is provide a
> simple text file with a list of URIs, and the map then gets a readable
> stream. This reduces the amount of 'code' to write and improves the
> experience.
>   Thoughts?
>   If there is sufficient interest/utility I will go ahead and spec  
> this in more detail and create a jira issue.
> thanks,
> Arun
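The indirection proposed in the quoted message could look roughly like the following: each input line is parsed as a URI, and a fetcher chosen by scheme returns a stream. This is only a sketch using the JDK, not Hadoop's API; `UriFetchers`, `register`, `open`, and the `mem` scheme in the demo are hypothetical names, and a `dfs` fetcher would have to be registered separately.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class UriFetchers {
    // One configurable fetcher per URI scheme.
    private final Map<String, Function<URI, InputStream>> byScheme = new HashMap<>();

    UriFetchers() {
        // Assumed "reasonable defaults": let java.net.URL handle http and file.
        Function<URI, InputStream> viaUrl = uri -> {
            try {
                return uri.toURL().openStream();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
        byScheme.put("http", viaUrl);
        byScheme.put("file", viaUrl);
    }

    /** Adds or replaces the fetcher for a scheme (for example a dfs reader). */
    void register(String scheme, Function<URI, InputStream> fetcher) {
        byScheme.put(scheme.toLowerCase(Locale.ROOT), fetcher);
    }

    /** Turns one line of the input file into a readable stream. */
    InputStream open(String line) {
        URI uri = URI.create(line.trim());
        String scheme = uri.getScheme();
        Function<URI, InputStream> fetcher =
                scheme == null ? null : byScheme.get(scheme.toLowerCase(Locale.ROOT));
        if (fetcher == null) {
            throw new IllegalArgumentException("No fetcher for: " + line);
        }
        return fetcher.apply(uri);
    }

    public static void main(String[] args) throws IOException {
        UriFetchers fetchers = new UriFetchers();
        // Made-up in-memory scheme so the demo needs no network or filesystem.
        fetchers.register("mem", uri -> new ByteArrayInputStream(
                ("data for " + uri.getHost()).getBytes(StandardCharsets.UTF_8)));
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                fetchers.open("mem://example"), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```

Keeping the fetchers in a per-scheme map mirrors the "configurable with reasonable defaults" part of the proposal: the map function only ever sees the stream, not how it was obtained.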
