From: Chris Lu
Date: Wed, 07 Sep 2011 10:27:39 -0700
To: user@mahout.apache.org
Subject: Re: LDA on single node is much faster than 20 nodes

Thanks for the suggestions!

I finally managed to get Hadoop to run the map tasks in parallel. I changed not only the "mapred.max.split.size" setting but also "dfs.block.size", because of how FileInputFormat.java computes the split size:

    protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

Now it seems all nodes are running in parallel!

Chris

On 09/06/2011 04:44 PM, Jake Mannix wrote:
> On Tue, Sep 6, 2011 at 4:44 PM, Chris Lu wrote:
>
>> I see, thanks!
>>
>> Seems it should build into Mahout LDA algorithms, since the input file is
>> usually not too large, but really needs parallel mapping processes.
>>
>>
> If your input is not large, running a multithreaded in-memory algorithm on a
> relatively beefy box (16+ cores, enough RAM to fit your data + model + some
> spare) will be *much* faster than putting the same data on cluster,
> actually.
>
>   -jake
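
For anyone who wants to see how the computeSplitSize formula quoted in the message plays out with numbers, here is a minimal sketch. Only computeSplitSize comes from the message. The class name SplitSizeSketch, the approxSplits helper, the 100 MB input size, and the 64 MB and 5 MB settings are hypothetical values chosen for illustration, and the split count is a rough ceiling division rather than Hadoop's exact logic.

```java
public class SplitSizeSketch {

    // Same formula as the computeSplitSize method quoted in the message.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Hypothetical helper: rough number of splits (and so map tasks) for one
    // file, using ceiling division. Not Hadoop's exact split logic.
    static long approxSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileSize = 100 * MB; // assumed input size, for illustration only
        long minSize = 1L;        // assumed minimum split size

        // Case 1: 64 MB block size and no effective maximum split size.
        long large = computeSplitSize(64 * MB, minSize, Long.MAX_VALUE);

        // Case 2: block size and maximum split size both lowered to 5 MB.
        long small = computeSplitSize(5 * MB, minSize, 5 * MB);

        System.out.println("large split: " + large / MB + " MB, about "
                + approxSplits(fileSize, large) + " splits");
        System.out.println("small split: " + small / MB + " MB, about "
                + approxSplits(fileSize, small) + " splits");
    }
}
```

With these assumed numbers the first case gives roughly 2 splits and the second roughly 20, which is the difference between most of a 20-node cluster sitting idle and every node getting a map task.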