Date: Sun, 9 Jan 2011 08:27:34 +0530
Subject: Re: Import data from mysql
From: Sonal Goyal
To: common-user@hadoop.apache.org

Hi Brian,

You can check HIHO at https://github.com/sonalgoyal/hiho, which can help you
load data from any JDBC database into the Hadoop file system. If your table
has a date or id field, or any other indicator for modified or newly added
rows, you can import only the altered rows every day.

Please let me know if you need help.

Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others
Nube Technologies

On Sun, Jan 9, 2011 at 5:03 AM, Brian McSweeney wrote:

> Hi folks,
>
> I'm a TOTAL newbie on hadoop. I have an existing webapp that has a growing
> number of rows in a mysql database that I have to compare against one
> another once a day from a batch job. This is an exponential problem as every
> row must be compared against every other row. I was thinking of
> parallelizing this computation via hadoop. As such, I was thinking that
> perhaps the first thing to look at is how to bring info from a database to a
> hadoop job and vice versa. I have seen the following relevant info
>
> https://issues.apache.org/jira/browse/HADOOP-2536
>
> and also
>
> http://architects.dzone.com/articles/tools-moving-sql-database
>
> any advice on what approach to use?
>
> cheers,
> Brian
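
The incremental import suggested in the reply (pulling only rows added since
the last run, keyed on an id field) might look roughly like the sketch below.
This is not HIHO's API. It uses Python's built-in sqlite3 with an in-memory
database as a stand-in for MySQL, and the table name `records`, the column
`payload`, and the function names are all hypothetical, chosen only for
illustration.

```python
import sqlite3


def fetch_new_rows(conn, last_seen_id):
    """Return rows whose id is greater than last_seen_id, plus the new high-water mark.

    Assumes a hypothetical table `records` with a monotonically increasing `id`.
    """
    cur = conn.execute(
        "SELECT id, payload FROM records WHERE id > ? ORDER BY id",
        (last_seen_id,),
    )
    rows = cur.fetchall()
    # If nothing new arrived, keep the previous mark unchanged.
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark


def demo():
    # In-memory SQLite stands in for the MySQL database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO records (id, payload) VALUES (?, ?)",
        [(1, "a"), (2, "b")],
    )

    # First daily run: everything is new because the mark starts at 0.
    first_batch, mark = fetch_new_rows(conn, 0)

    # A row is added before the next run.
    conn.execute("INSERT INTO records (id, payload) VALUES (?, ?)", (3, "c"))

    # Second daily run: only the row added since the last mark comes back.
    second_batch, mark = fetch_new_rows(conn, mark)
    conn.close()
    return first_batch, second_batch, mark
```

Tracking an id this way only picks up newly added rows. To catch rows that
were modified in place, the same pattern would need a last-modified date
column as the marker instead.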