From: Jean-Marc Spaggiari
Date: Fri, 8 Mar 2013 08:16:49 -0500
Subject: Re: [Hadoop-Help] About Map-Reduce implementation
To: user@hadoop.apache.org

Hi Mayur,

Take a look here:
http://hadoop.apache.org/docs/r1.1.1/single_node_setup.html#PseudoDistributed

"Hadoop can also be run on a single-node in a pseudo-distributed mode
where each Hadoop daemon runs in a separate Java process." That means
pseudo-distributed is single-node only, so with your three machines you
can only use the fully-distributed mode.

JM

2013/3/8 Mayur Patil:
> Hello,
>
> Thank you sir for your favorable reply.
>
> I am going to use 1 master and 2 worker nodes; 3 nodes in total.
>
> Thank you !!
>
> --
> Cheers,
> Mayur
>
> On Fri, Mar 8, 2013 at 8:30 AM, Jean-Marc Spaggiari wrote:
>>
>> Hi Mayur,
>>
>> Those 3 modes are 3 different ways to run Hadoop; however, the only
>> production mode is the fully-distributed one. The other two are
>> mostly for local testing. How many nodes are you expecting to run
>> Hadoop on?
>>
>> JM
>>
>> 2013/3/7 Mayur Patil:
>> > Hello,
>> >
>> > Now I am slowly understanding how Hadoop works.
>> >
>> > I want to collect the logs from three machines, including the
>> > master itself. My small query is: which mode should I implement
>> > for this?
>> >
>> > Standalone Operation
>> > Pseudo-Distributed Operation
>> > Fully-Distributed Operation
>> >
>> > Seeking guidance,
>> >
>> > Thank you !!
>> > --
>> > Cheers,
>> > Mayur
>> >
>> >>> Hi Mayur,
>> >>>
>> >>> Flume is used for data collection. Pig is used for data processing.
>> >>> For example, if you have a bunch of servers whose logs you want
>> >>> to collect and push to HDFS, you would use Flume. If you then
>> >>> need to run some analysis on that data, you could use Pig.
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>> On Feb 14, 2013, at 1:39 AM, Mayur Patil wrote:
>> >>>
>> >>> > Hello,
>> >>> >
>> >>> > I just read about Pig:
>> >>> >
>> >>> >> Pig: a data flow language and execution environment for
>> >>> >> exploring very large datasets. Pig runs on HDFS and
>> >>> >> MapReduce clusters.
>> >>> >
>> >>> > What is the actual difference between Pig and Flume for
>> >>> > log clustering?
>> >>> >
>> >>> > Thank you !!
>> >>> > --
>> >>> > Cheers,
>> >>> > Mayur
>> >>> >
>> >>> >> Hey Mayur,
>> >>> >>>
>> >>> >>> If you are collecting logs from multiple servers then you
>> >>> >>> can use Flume for that.
>> >>> >>>
>> >>> >>> If the logs differ in format, you can just use
>> >>> >>> TextInputFormat to read them, then write them out in any
>> >>> >>> other format you want for processing in later parts of
>> >>> >>> your project.
>> >>> >>>
>> >>> >>> The first thing you need to learn is how to set up Hadoop.
>> >>> >>> Then you can try writing sample Hadoop MapReduce jobs that
>> >>> >>> read from a text file, process it, and write the results
>> >>> >>> into another file. Then you can integrate Flume as your
>> >>> >>> log collection mechanism. Once you get hold of the system,
>> >>> >>> you can decide which paths to follow based on your
>> >>> >>> requirements for storage, compute time, compute capacity,
>> >>> >>> compression, etc.
>> >>> >>>
>> >>> >> --------------
>> >>> >> --------------
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> Please read the basics of how Hadoop works.
>> >>> >>>
>> >>> >>> Then start your hands-on with MapReduce coding.
>> >>> >>>
>> >>> >>> The tool made for you is Flume, but don't look at the tool
>> >>> >>> until you complete the two steps above.
>> >>> >>>
>> >>> >>> Good luck, keep us posted.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Jagat Singh
>> >>> >>>
>> >>> >>> -----------
>> >>> >>> Sent from Mobile, short and crisp.
>> >>> >>> On 06-Feb-2013 8:32 AM, "Mayur Patil" wrote:
>> >>> >>>
>> >>> >>>> Hello,
>> >>> >>>>
>> >>> >>>> I am new to Hadoop. I am doing a project in the cloud in
>> >>> >>>> which I have to use Hadoop for MapReduce. I am going to
>> >>> >>>> collect logs from 2-3 machines in different locations.
>> >>> >>>>
>> >>> >>>> The logs are also in different formats, such as .rtf,
>> >>> >>>> .log, and .txt. Later, I have to convert them to one
>> >>> >>>> format and collect them in one location.
>> >>> >>>>
>> >>> >>>> So which module of Hadoop do I need to study for this
>> >>> >>>> implementation? Or do I need to study the whole
>> >>> >>>> framework?
>> >>> >>>>
>> >>> >>>> Seeking guidance,
>> >>> >>>>
>> >>> >>>> Thank you !!
>> >
>> > --
>> > Cheers,
>> > Mayur
>
> --
> Cheers,
> Mayur
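[Editor's note] The workflow discussed in this thread (Flume pushes logs
into HDFS, then a MapReduce job analyzes them) can be sketched with
Hadoop Streaming, which lets you write the mapper and reducer in plain
Python. This is a minimal, hypothetical sketch: the log layout (level as
the second whitespace-separated field) is an assumption for illustration,
not something stated in the thread.

```python
#!/usr/bin/env python3
# Hypothetical Hadoop Streaming-style job: count log lines per
# severity level. Assumes one record per line shaped like
# "2013-03-08 ERROR disk full" (the level is the second field) --
# adjust the parsing for your real log format.
import sys
from itertools import groupby

def map_line(line):
    """Mapper step: emit (level, 1) for one log line, or None if
    the line does not have at least two fields."""
    fields = line.split()
    if len(fields) >= 2:
        return fields[1], 1
    return None

def reduce_pairs(pairs):
    """Reducer step: sum the counts per level. Hadoop delivers the
    reducer's input sorted by key, which sorted() emulates here."""
    counts = {}
    for level, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[level] = sum(n for _, n in group)
    return counts

if __name__ == "__main__":
    # Used as a streaming mapper: read lines on stdin, print
    # tab-separated key/value pairs on stdout.
    for raw in sys.stdin:
        kv = map_line(raw)
        if kv is not None:
            print(f"{kv[0]}\t{kv[1]}")
```

With streaming you would wire this up via the hadoop-streaming jar
(`hadoop jar .../hadoop-streaming.jar -mapper ... -reducer ...`); the
exact jar path depends on your installation.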
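[Editor's note] Mayur's "convert them to one format" step (mixed .rtf,
.log, and .txt sources into uniform plain-text lines) could be done
before the data ever reaches HDFS. The sketch below is an illustrative
assumption, not something from the thread, and its RTF handling is
deliberately naive (a regex that strips control words and braces); real
RTF may need a proper parser.

```python
#!/usr/bin/env python3
# Hypothetical normalizer: turn one line from a .rtf, .log, or .txt
# log into a plain-text line, so all machines' output shares one
# format before collection.
import re

# Matches RTF control words like \b or \fs20, plus group braces.
RTF_CONTROL = re.compile(r"\\[a-z]+-?\d*|[{}]")

def normalize_line(line, source_ext):
    """Return a plain-text version of one log line. source_ext is
    the extension of the file the line came from (".rtf", ".log",
    or ".txt")."""
    if source_ext == ".rtf":
        line = RTF_CONTROL.sub("", line)
    return line.strip()
```

Running every collected line through normalize_line() before handing it
to Flume (or writing it to HDFS) gives the single-format corpus the
later MapReduce step expects.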