hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: hd fs -head?
Date Mon, 27 Sep 2010 20:46:39 GMT
On Mon, Sep 27, 2010 at 11:13 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
> On 2010, Sep 27, at 7:02 AM, Edward Capriolo wrote:
>
>> On Mon, Sep 27, 2010 at 3:23 AM, Keith Wiley <kwiley@keithwiley.com>
>> wrote:
>>>
>>> Is there a particularly good reason for why the "hadoop fs" command
>>> supports
>>> -cat and -tail, but not -head?
>>>
>>
>> Tail is needed to be done efficiently but head you can just do
>> yourself. Most people probably use
>>
>> hadoop dfs -cat file | head -5.
>
>
> I disagree with your use of the word "efficiently".  :-)  To my
> understanding (and perhaps that's the source of my error), the approach you
> suggested reads the entire file over the net from the cluster to your client
> machine.  That file could conceivably be of HDFS scales (100s of GBs, even
> TBs wouldn't be uncommon).
>
> What do you think?  Am I wrong in my interpretation of how
> hadoopCat-pipe-head would work?
>
> Cheers!
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com
>  music.keithwiley.com
>
> "And what if we picked the wrong religion?  Every week, we're just making
> God
> madder and madder!"
>                                           --  Homer Simpson
> ________________________________________________________________________________
>
>

'hadoop dfs -cat' writes the file to stdout as it is read. head -5
exits after 5 lines, which closes the pipe and ends the 'hadoop dfs
-cat' side the next time it tries to write. Because of buffering, more
than 5 lines might be physically read, but this invocation does not
read the entire HDFS file before piping it to head.
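
If you want to see the pipe behavior without a cluster, here is a rough sketch. produce_lines is a made-up stand-in for 'hadoop dfs -cat' (it is not a Hadoop command); it would print lines forever if nothing stopped it, so the fact that the pipeline returns at all shows that the producer stops once head has its 5 lines.

```shell
# Hypothetical stand-in for 'hadoop dfs -cat': prints numbered lines
# without end until a write to stdout fails.
produce_lines() {
    i=1
    while :; do
        # When the reader has gone away, the write fails (or the
        # subshell is killed by SIGPIPE) and the loop ends.
        echo "line $i" || return
        i=$((i + 1))
    done
}

# head exits after 5 lines and closes its end of the pipe.
# '|| true' only matters in shells running with pipefail, where the
# producer's broken-pipe exit status would count as a failure.
first_five=$(produce_lines | head -5) || true
echo "$first_five"
```

The same reasoning applies to 'hadoop dfs -cat file | head -5': how much extra data gets read before the producer notices depends on buffering, so treat the sketch as an illustration of the mechanism, not a measurement.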
