hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: Reduce Performance
Date Tue, 21 Aug 2007 07:30:02 GMT
See below...

Eric Baldeschwieler wrote:
> Actually...
>
> I think it is greatly in the project's interest to have a really 
> elegant one node solution.   It should certainly support 
> multithreading, the web UI, etc.
AFAIK, local setup has never been a focus of Hadoop. However, a 
good implementation will definitely be appreciated.
>
> If it is trivial to write and use single node jobs, then we can write 
> an application once in map-reduce and use it either on large clusters 
> or on small devices.
A local threaded implementation will be quite useful in testing code for 
small inputs. You can look at MiniMRCluster in src/test as an example 
written for unit tests.
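
To make the idea of a local threaded implementation concrete, here is a 
minimal sketch in plain Java. It is not Hadoop's LocalJobRunner and uses 
no Hadoop API: the class ThreadedLocalWordCount and its methods mapSplit 
and run are hypothetical names chosen for illustration. It assumes each 
input split fits in memory as a String, runs one map task per split on a 
fixed-size thread pool, and merges the partial counts in a single reduce 
step on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; this is not Hadoop's LocalJobRunner or its API.
class ThreadedLocalWordCount {

    // "Map" phase for one split: count the words in a single chunk of input.
    static Map<String, Integer> mapSplit(String split) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Run one map task per split on a fixed-size thread pool, then merge
    // the partial results in a single "reduce" step on the calling thread.
    static Map<String, Integer> run(List<String> splits, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String split : splits) {
                Callable<Map<String, Integer>> task = () -> mapSplit(split);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> total = new TreeMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                // get() blocks until that map task has finished.
                for (Map.Entry<String, Integer> e : future.get().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("map task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ThreadedLocalWordCount.run(splits, 2) with a small list of strings 
is enough to exercise map logic on test inputs. The map tasks share no 
mutable state, so the only synchronization point is the merge.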
>
> This would be useful.
>
> Supporting Arun's point, this is an open source project.  If you find 
> a way of making it work better, give it back and we will incorporate it.
Just open an issue at jira, and attach a patch against trunk. See 
http://wiki.apache.org/lucene-hadoop/HowToContribute
>
> On Aug 19, 2007, at 9:17 PM, Arun C Murthy wrote:
>
>> On Sun, Aug 19, 2007 at 11:33:35PM +0200, Thorsten Schuett wrote:
>> >I have been looking into the LocalJobRunner today. Is there a chance for
>> >official support for parallel map execution/>1 reduce tasks or should I look
>> >into adding it to my local copy of the code?
>> >
>>
>> Please file a request (jira), and patch if you are so inclined! There 
>> is nothing *official* about anything here... make yourself at home.
>>
>> Usually there isn't much bang per buck trying to optimize single-node 
>> performance of hadoop's map-reduce, but any contribution is always 
>> welcome.
>>
>> Arun
>>
>> >Thorsten
>> >
>> >On 8/19/07, Thorsten Schuett <schuett@gmail.com> wrote:
>> >>
>> >> In my case, it looks as if the loopback device is the bottleneck. So
>> >> increasing the number of tasks won't help.
>> >>
>> >> Thorsten
>> >>
>> >> On 8/18/07, Ted Dunning <tdunning@veoh.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > You might try increasing the number of map and reduce tasks so that you
>> >> > can overlap cpu and I/O.  It is common in parallel applications that you
>> >> > need to do something like this.
>> >> >
>> >> >
>> >> > On 8/18/07 8:36 AM, "Thorsten Schuett" <schuett@gmail.com> wrote:
>> >> > >> If my assumptions are correct, would it be possible to
>> >> > >>> read/access the files directly in the "one-node mode"?
>> >> > >>
>> >> > >> Please take a look at LocalJobRunner in src/org/apache/hadoop/mapred ...
>> >> > >> set the jobtracker in your config to 'local' and this happens automatically.
>> >> > >> (http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms)
>> >> > >
>> >> > >
>> >> > > When I use "local", I lose the web interface and the multi-threading. I can
>> >> > > live with the former, but the latter is not an option.
>> >> > >
>> >> > > Thorsten
>> >> >
>> >> >
>> >>
>>
>
>
