hadoop-common-user mailing list archives

From Steve Gao <steve....@yahoo.com>
Subject Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?
Date Sat, 19 Jul 2008 00:55:11 GMT
One more small question: why is Hadoop Streaming designed with two different options
that do the same thing (i.e. control the number of reducers)? What's the point?

--- On Fri, 7/18/08, Arun C Murthy <acm@yahoo-inc.com> wrote:
From: Arun C Murthy <acm@yahoo-inc.com>
Subject: Re: [Streaming]What is the difference between streaming options: -file and -CacheFile
To: core-user@hadoop.apache.org, "Steve Gao" <steve.gao@yahoo.com>
Date: Friday, July 18, 2008, 8:27 PM

On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:

> Hi All,
>     I am using Hadoop Streaming. I am confused by the streaming
> options -file and -cacheFile. It seems that they mean the same thing,
> right?

The difference is that -file will 'ship' your file (a local file) to
the cluster, while -cacheFile assumes that it is already present on
HDFS at the given path.
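To make the distinction concrete, here is a minimal Python sketch that only assembles and prints a streaming command line; it does not run Hadoop. The jar name, input/output directories, script path and HDFS URI are hypothetical placeholders, and the `#link_name` suffix on `-cacheFile` follows the symlink convention described in the streaming documentation of that era, so check it against your Hadoop version.

```python
import os
import shlex

# Hypothetical placeholder: adjust for your installation.
STREAMING_JAR = "hadoop-streaming.jar"


def streaming_args(local_mapper, hdfs_file, link_name):
    """Build argv for a streaming job that ships a local script and uses an HDFS file."""
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-input", "in_dir",            # hypothetical HDFS input directory
        "-output", "out_dir",          # hypothetical HDFS output directory
        # The shipped script lands in the task's working directory,
        # so the mapper is referred to by its base name.
        "-mapper", os.path.basename(local_mapper),
        # -file: copy this file from the submitting machine to the cluster.
        "-file", local_mapper,
        # -cacheFile: the file must already be in HDFS; "#link_name" names the
        # symlink through which tasks see it in their working directory.
        "-cacheFile", hdfs_file + "#" + link_name,
    ]


if __name__ == "__main__":
    argv = streaming_args("scripts/mapper.py",
                          "hdfs://namenode:9000/user/steve/lookup.txt",
                          "lookup.txt")
    print(shlex.join(argv))
```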

>     Other misleading options are -numReduceTasks and -jobconf
> mapred.reduce.tasks. Both are used to control (or give a hint about) the
> number of reducers.

Yes, they are both equivalent.
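A minimal Python sketch of that equivalence: a hypothetical helper `reduce_tasks` reads the reducer count from either spelling in a streaming argument list. It assumes, as the reply states, that `-numReduceTasks N` and `-jobconf mapred.reduce.tasks=N` set the same property; the helper is illustrative only and is not part of Hadoop.

```python
from typing import List, Optional


def reduce_tasks(args: List[str]) -> Optional[int]:
    """Return the reducer count requested in a streaming argv, or None if unset.

    Hypothetical helper: treats -numReduceTasks N and
    -jobconf mapred.reduce.tasks=N as the same setting. If both appear,
    this sketch simply keeps the last one it sees.
    """
    count = None
    for i in range(len(args) - 1):
        flag, value = args[i], args[i + 1]
        if flag == "-numReduceTasks":
            count = int(value)
        elif flag == "-jobconf" and value.startswith("mapred.reduce.tasks="):
            count = int(value.split("=", 1)[1])
    return count


if __name__ == "__main__":
    print(reduce_tasks(["-numReduceTasks", "4"]))              # 4
    print(reduce_tasks(["-jobconf", "mapred.reduce.tasks=4"])) # 4
```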

