hadoop-mapreduce-user mailing list archives

From Wilm Schumacher <wilm.schumac...@gmail.com>
Subject Re: Copying files to hadoop.
Date Wed, 17 Dec 2014 23:25:48 GMT
Hi,

but I want to point out: my solution works, but it is not very smart. I
think Rich has the better answer.

If your file is REALLY huge, then his answer no. 1 is the way to go. If
you want to dynamically add and remove files, test this and test that,
then his second answer is a good fit. You just "mount" a shared folder
on the Linux side, and then it is "local" in Linux/Unix.
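
I don't know which exact tool Rich had in mind for that mount, but just
as an illustration, with sshfs on the Linux side (and SSH enabled on the
Mac) it could look roughly like this; user names, hosts and paths are
only placeholders:

# on the sandbox (Linux) side, assuming sshfs is installed there
mkdir -p /mnt/mac-share
sshfs macuser@mac-host:/Users/anil/data /mnt/mac-share
# the Mac folder now behaves like a local folder, so hadoop can read it
hadoop fs -put /mnt/mac-share/file.csv /hadoop/folder/name/file.csv
# unmount again when you are done
fusermount -u /mnt/mac-share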

My answer was more of an academic possibility ;).
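
And for a reasonably small file the two-step route you already mentioned
is of course the simplest; roughly (host and paths are only placeholders):

# 1. copy the file from the Mac to the Linux box
scp /path/to/your/local/file.csv hadoopuser@sandbox-host:/tmp/file.csv
# 2. put it into HDFS from there
ssh hadoopuser@sandbox-host "hadoop fs -put /tmp/file.csv /hadoop/folder/name/file.csv"
# and check that it arrived
ssh hadoopuser@sandbox-host "hadoop fs -ls /hadoop/folder/name"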

Good luck

Wilm

On 18.12.2014 at 00:16, Anil Jagtap wrote:
> Oh, thanks a lot Wilm. You understood my problem accurately. I
> executed it and it worked.
>
> I understand I can always copy it to Linux and then put it into Hadoop,
> but I was just trying to find out if this is possible.
>
> Thanks again.
>
> Rgds, Anil
>
> On Thu, Dec 18, 2014 at 9:56 AM, Wilm Schumacher
> <wilm.schumacher@gmail.com> wrote:
>
>     On 17.12.2014 at 23:29, Anil Jagtap wrote:
>     > Dear All,
>     >
>     > I'm pretty new to Hadoop technology and the Linux environment, hence
>     > struggling even to find solutions for the basic stuff.
>     >
>     > For now, the Hortonworks Sandbox is working fine for me and I managed
>     > to connect to it through SSH.
>     >
>     > Now I have some csv files in my Mac OS folders which I want to copy
>     > onto Hadoop. As per my knowledge I can copy those files first to Linux
>     > and then put them into Hadoop. But is there a way in which just one
>     > command will copy them to Hadoop directly from the Mac OS folder?
>     Yes, there is.
>
>     cat /path/to/your/local/file.csv | ssh hadoopuser@namenode
>     "/remote/server/path/to/hadoop fs -put - /hadoop/folder/name/file.csv"
>
>     As you wrote that you are also new to Linux/Unix, the above means:
>
>     * cat => concatenate the files (only one file given here) and print
>     them to standard output
>
>     * pipe | => write the standard output of the left-hand side to the
>     standard input of the right-hand side
>
>     * ssh => reads from its standard input and writes it to the standard
>     input of the command on the remote server, which is the hadoop fs -put
>     command, told by the "-" to read from stdin
>
>     Thus you are actually piping the content of the file through three
>     processes. That is a little bit of a hack, and in my opinion there is
>     no reason to do this if your file is small enough to fit on the remote
>     server. It's like asking "is it possible to reach my destination using
>     only left turns". Well ... it's possible, but not always a good idea ;).
>
>     Best
>
>     Wilm
>

