Date: Fri, 29 Jan 2010 00:41:24 +0530
Subject: Re: Multiple data-local passes?
From: Robin Anil
To: mahout-user@lucene.apache.org

Glad that you asked, because I have been asking myself the same question
while creating a Text->Vector converter, where I need to iterate over the
same data, converting it to vectors using one chunk of the dictionary at a
time. If I had the option of running multiple passes, it would have taken
me just a single MapReduce job. Instead, I have to do one pass over the
data for every chunk of the dictionary held in memory.

True, I could run n sequential jobs using an HDFS client on different
servers, but the network data transfer wasn't worth it.

Robin

On Fri, Jan 29, 2010 at 12:30 AM, Markus Weimer wrote:

> Hi,
>
> I have a question about hadoop, which most likely someone in mahout
> must have solved before:
>
> Many online ML algorithms require multiple passes over data for best
> performance. When putting these algorithms on hadoop, one would want
> to run the code close to the data (same machine/rack). Mappers offer
> this data-local execution but do not offer means to run multiple times
> over the data.
> Of course, one could run the code outside of the hadoop
> mapreduce framework as a HDFS client, but that does not offer the
> data-locality advantage, in addition to not being scheduled through
> the hadoop schedulers.
>
> How is this solved in mahout?
>
> Thanks for any pointer,
>
> Markus
>

--
------
Robin Anil
Blog: http://techdigger.wordpress.com
-------
Mahout in Action - Mammoth Scale machine learning
Read Chapter 1 - Its Frrreeee
http://www.manning.com/owen

Try out Swipeball for iPhone
http://itunes.com/apps/swipeball
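
The chunked conversion Robin describes (one full pass over the data for
every dictionary chunk that fits in memory) might look roughly like the
sketch below. It is a single-machine Python illustration, not Mahout's
implementation and not Hadoop code; the function names, the in-memory
`docs` mapping, and `chunk_size` are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def dictionary_chunks(dictionary: Dict[str, int],
                      chunk_size: int) -> Iterable[Dict[str, int]]:
    """Yield successive slices of the term -> index dictionary (hypothetical helper)."""
    items = sorted(dictionary.items(), key=lambda kv: kv[1])
    for start in range(0, len(items), chunk_size):
        yield dict(items[start:start + chunk_size])


def partial_vectors(docs: Dict[str, List[str]],
                    chunk: Dict[str, int]) -> Dict[str, Dict[int, int]]:
    """One full pass over every document, counting only terms in this chunk."""
    out = {}
    for doc_id, tokens in docs.items():
        counts = Counter(t for t in tokens if t in chunk)
        if counts:
            out[doc_id] = {chunk[t]: c for t, c in counts.items()}
    return out


def vectorize(docs: Dict[str, List[str]],
              dictionary: Dict[str, int],
              chunk_size: int) -> Tuple[Dict[str, Dict[int, int]], int]:
    """Return sparse term-count vectors and the number of passes over the data."""
    vectors = defaultdict(dict)
    passes = 0
    for chunk in dictionary_chunks(dictionary, chunk_size):
        passes += 1  # each chunk costs one more pass over all documents
        for doc_id, partial in partial_vectors(docs, chunk).items():
            vectors[doc_id].update(partial)  # merge partial vectors per document
    return dict(vectors), passes
```

The pass count grows with the number of dictionary chunks, which is the
cost the message is about: with data-local multiple passes, each loop
iteration could reuse the same local split instead of becoming a separate
job.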