From: "Adam Retter"
Date: Wed, 29 Apr 2009 09:36:27 +0100
Subject: RE: Appropriate for Hadoop?

I was more concerned that our input comes from SQL databases and a
proprietary EMC document store, and that our output goes to a different
SQL database. I don't want to use any sort of file system at all.

Adam Retter
Software Developer
Landmark Information Group

T: 01392 685403 (x5403)

5-7 Abbey Court, Eagle Way, Sowton, Exeter, Devon, EX2 7HY

www.landmark.co.uk

-----Original Message-----
From: Chuck Lam [mailto:chuck.lam@gmail.com]
Sent: 28 April 2009 20:25
To: core-user@hadoop.apache.org
Subject: Re: Appropriate for Hadoop?

HDFS is designed with Hadoop in mind, so there are certain advantages
(e.g. performance, reliability, and ease of use) to using HDFS for
Hadoop. However, it's not required. For example, when you run Hadoop in
standalone mode, it just uses the file system on your local machine.
When you run it on Amazon AWS, it can use S3 as a file system.

On Tue, Apr 28, 2009 at 6:15 AM, Adam Retter wrote:
>
> > Each document can be processed independently, so the documents can be
> > processed in parallel and that part could be done in a MapReduce job.
> > Whether it suits this use case depends on the rate at which new URIs
> > are discovered for processing and on the acceptable delay in
> > processing a document. The way I see it, you can batch the URIs and
> > input them to a MapReduce job. Each mapper can work on a sublist of
> > URIs. You can choose to make the DB inserts from the mapper itself; in
> > that case you can set the number of reducers to 0. Otherwise, if
> > batching the queries is an option, you can consider making batch
> > inserts in the reducer. That will help reduce the load on the DB.
>
> So I don't have to use HDFS at all when using Hadoop?
>
> Registered Office: 7 Abbey Court, Eagle Way, Sowton, Exeter, Devon, EX2 7HY
> Registered Number 2892803 Registered in England and Wales
>
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
>
> The information contained in this e-mail is confidential and may be subject
> to legal privilege. If you are not the intended recipient, you must not use,
> copy, distribute or disclose the e-mail or any part of its contents or take any
> action in reliance on it. If you have received this e-mail in error, please
> e-mail the sender by replying to this message. All reasonable precautions have
> been taken to ensure no viruses are present in this e-mail. Landmark Information
> Group Limited cannot accept responsibility for loss or damage arising from the
> use of this e-mail or attachments and recommend that you subject these to
> your virus checking procedures prior to use.
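The quoted suggestion (split a batch of URIs across mappers, then either insert from each mapper with zero reducers, or let a reducer batch the inserts) can be modelled in a few lines. This is a plain-Python sketch of the data flow only, not Hadoop's API: `process_document`, the `doc://` URIs and the batch size are hypothetical, and database calls are merely counted rather than sent anywhere.

```python
from typing import Iterator, List, Tuple


def split_uris(uris: List[str], num_mappers: int) -> List[List[str]]:
    """Divide a batch of URIs into contiguous sublists, one per map task."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    size, extra = divmod(len(uris), num_mappers)
    sublists, start = [], 0
    for i in range(num_mappers):
        # The first `extra` sublists take one additional URI each.
        end = start + size + (1 if i < extra else 0)
        sublists.append(uris[start:end])
        start = end
    return sublists


def process_document(uri: str) -> str:
    # Hypothetical stand-in for the real per-document work; calls are independent.
    return f"processed:{uri}"


def batches(rows: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield rows in groups of at most batch_size (one group = one batched INSERT)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


def map_only_job(uris: List[str], num_mappers: int) -> Tuple[List[str], int]:
    """Zero reducers: each mapper inserts every result itself, one DB call per row."""
    rows, db_calls = [], 0
    for sublist in split_uris(uris, num_mappers):
        for uri in sublist:
            rows.append(process_document(uri))
            db_calls += 1  # counted only; nothing is sent to a database
    return rows, db_calls


def batched_reduce_job(uris: List[str], num_mappers: int,
                       batch_size: int) -> Tuple[List[str], int]:
    """Mappers only process; a single reducer groups results into batched INSERTs."""
    rows = [process_document(uri)
            for sublist in split_uris(uris, num_mappers)
            for uri in sublist]
    db_calls = sum(1 for _ in batches(rows, batch_size))
    return rows, db_calls


if __name__ == "__main__":
    uris = [f"doc://{i}" for i in range(10)]
    print(map_only_job(uris, 3)[1])           # 10
    print(batched_reduce_job(uris, 3, 4)[1])  # 3
```

With 10 URIs, 3 mappers and a batch size of 4, the map-only variant makes 10 insert calls and the batched-reducer variant makes 3. That drop in round trips is the reduced load on the DB the quoted reply refers to, paid for by running a reduce phase.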
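Regarding input that lives in a SQL database rather than on any file system: one way to feed it to map tasks is to split the source table into row ranges, one per task, each read with its own ordered query. Hadoop ships `DBInputFormat` and `DBOutputFormat` in the `org.apache.hadoop.mapred.lib.db` package, and to our understanding `DBInputFormat` divides a table along broadly similar lines; the Python below is only a model of that splitting step, not that API. The table name `documents`, the column `uri` and the row count are hypothetical.

```python
from typing import List, Tuple


def row_ranges(row_count: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide rows [0, row_count) into num_splits half-open ranges.

    Every range gets row_count // num_splits rows; the last one also
    absorbs the remainder, so no row is skipped.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    chunk = row_count // num_splits
    ranges = []
    for i in range(num_splits):
        start = i * chunk
        end = row_count if i == num_splits - 1 else start + chunk
        ranges.append((start, end))
    return ranges


def split_query(table: str, key_column: str, start: int, end: int) -> str:
    """Build the SELECT that one map task would run for rows [start, end).

    A stable ORDER BY is required: without it the database may return rows
    in a different order for each query, and ranges could overlap or miss rows.
    Table and column names are assumed to be trusted constants, not user input.
    """
    return (f"SELECT {key_column} FROM {table} ORDER BY {key_column} "
            f"LIMIT {end - start} OFFSET {start}")


if __name__ == "__main__":
    # Hypothetical table, column and row count, for illustration only.
    for start, end in row_ranges(10, 3):
        print(split_query("documents", "uri", start, end))
```

Results computed from each range could then be written to the second SQL database, for example with the batched-insert approach described in the quoted reply.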