hadoop-mapreduce-user mailing list archives

From Cao Yi <iridium...@gmail.com>
Subject Re: How to run a mapreduce program not on the node of hadoop cluster?
Date Wed, 21 Jan 2015 02:09:31 GMT
Thanks, Ahmed!

As my hadoop cluster is built on virtual machines, I cloned one of the
nodes (either a namenode or a datanode works), gave the clone a new
hostname and static IP, and it works!

Best Regards,
Iridium
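The clone-and-rename steps above can be sketched as follows on a systemd-based guest; the hostname and IP are placeholders, and the file that holds the static IP varies by distribution:

```shell
# give the cloned VM a hostname of its own (placeholder name)
sudo hostnamectl set-hostname hadoop-client01

# map the new hostname to the clone's static IP so the cluster can resolve it
echo "192.168.56.110  hadoop-client01" | sudo tee -a /etc/hosts

# the static IP itself lives in the distro's network configuration
# (e.g. /etc/sysconfig/network-scripts/ifcfg-* or /etc/netplan/*.yaml);
# edit it there and restart networking
```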

On Tue, Jan 20, 2015 at 4:46 PM, Ahmed Ossama <ahmed@aossama.com> wrote:

>  The naming differs: Dell calls it an edge node, Cloudera calls it a
> gateway.
>
> In the end, it's just a machine that has the Hadoop libraries and ecosystem
> deployed to it and acts as a client.
>
> Building this node is similar to building the rest of the nodes, except
> that it doesn't run any services. You can deploy Pig, Oozie, Hue, and the
> HDFS and YARN client programs on it, and submit jobs to your cluster from
> this node.
>
> I guess you came across this link, but it's worth mentioning
> http://www.dummies.com/how-to/content/edge-nodes-in-hadoop-clusters.html
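Submitting from such a node can be sketched like this, assuming the cluster's client configuration has been copied over; the jar name, driver class, and HDFS paths are made-up examples:

```shell
# point the client tools at the configuration copied from the cluster
export HADOOP_CONF_DIR=/etc/hadoop/conf

# sanity check: can this node see the cluster's HDFS?
hdfs dfs -ls /

# submit a MapReduce job to the cluster from the edge node
hadoop jar my_prj.jar com.example.MyDriver /user/iridium/input /user/iridium/output
```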
>
>
> On 01/20/2015 10:15 AM, Cao Yi wrote:
>
>  Thank you, Ahmed! I have another question: how do I build an edge node,
> and how do I use it? Can you point me to some docs?
>
>  P.S. I searched and found many pages (some call it a "client node"), but
> none of them explain the details of building an edge node or how to use it.
>
>  Best Regards,
> Iridium
>
> On Wed, Jan 14, 2015 at 11:05 PM, Ahmed Ossama <ahmed@aossama.com> wrote:
>
>>  The node that the project will be deployed on should have the same
>> configuration as the cluster, and the Hadoop executables as well. It
>> doesn't have to be one of the cluster nodes.
>>
>> This node is typically called a gateway or edge node: it has all the
>> client programs (the hadoop executables, Pig, etc.) and you use it to
>> submit jobs to the cluster.
>>
>> The executables use the configuration to know where to submit jobs,
>> where your HDFS NameNode is located, and so on.
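Concretely, the addresses the client executables read come from the `*-site.xml` files on the client; a minimal sketch, with placeholder hostnames and the usual default ports:

```xml
<!-- core-site.xml: where the HDFS NameNode lives -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>

<!-- yarn-site.xml: where to submit jobs -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>resourcemanager-host:8032</value>
</property>

<!-- mapred-site.xml: run MapReduce on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```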
>>
>> On 01/14/2015 04:39 PM, Cao Yi wrote:
>>
>> The program will be used in a production environment.
>> Do you mean that the program must be deployed on one of the nodes of the
>> cluster?
>>
>>  I have some experience operating databases: I can
>> query/edit/add/remove data on the machine the database is installed on,
>> or work with it remotely from another machine. Can I use Hadoop remotely
>> in a similar way, as I would a database?
>>
>>  Best Regards,
>> Iridium
>>
>> On Wed, Jan 14, 2015 at 9:15 PM, unmesha sreeveni <unmeshabiju@gmail.com>
>> wrote:
>>
>>> Your data won't get split, so your program will run as a single mapper
>>> and a single reducer, and your intermediate data won't be shuffled and
>>> sorted. But you can use this for debugging.
>>>  On Jan 14, 2015 2:04 PM, "Cao Yi" <iridiumcao@gmail.com> wrote:
>>>
>>>>  Hi,
>>>>
>>>> I have written some MapReduce code in my project *my_prj*. *my_prj* will
>>>> be deployed on a machine which is not a node of the cluster.
>>>> How does *my_prj* run a MapReduce job in this case?
>>>>
>>>>  thank you!
>>>>
>>>>  Best Regards,
>>>> Iridium
>>>>
>>>
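A sketch of what the my_prj driver itself might look like, assuming the cluster's core-site.xml, mapred-site.xml, and yarn-site.xml are on its classpath so `new Configuration()` picks them up; the class name and argument layout are hypothetical, and the mapper/reducer setup is elided:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        // reads fs.defaultFS, yarn.resourcemanager.address, etc.
        // from the *-site.xml files found on the classpath
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "my_prj job");
        job.setJarByClass(RemoteSubmit.class); // ships this jar to the cluster
        // job.setMapperClass(...); job.setReducerClass(...); // your classes here

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // blocks until the remote job finishes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This is the same mechanism the `hadoop jar` command uses: the client-side configuration, not the machine's cluster membership, determines where the job goes.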
>>
>> --
>> Regards,
>> Ahmed Ossama
>>
>>
>
> --
> Regards,
> Ahmed Ossama
>
>
