From: Ted Dunning
Date: Thu, 7 Jul 2011 08:48:31 -0700
Subject: Re: What's the difference between classic decision tree and Mahout Decision forest algorithm?
To: dev@mahout.apache.org
Cc: user@mahout.apache.org

The summary of the reason is that this was a summer project, and parallelizing the random forest algorithm at all was a big enough project. Writing a single-pass online algorithm was considered a bit much for the project size. Figuring out how to make multiple passes through an input split was similarly out of scope.

If you have a good alternative, this would be of substantial interest because it could improve the currently limited scalability of the decision forest code.

On Thu, Jul 7, 2011 at 8:20 AM, Xiaobo Gu wrote:

> Why can't a tree be built against a dataset that resides on the disk as
> long as we can read it?
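For readers following the thread, here is a minimal sketch of what "multiple passes" over disk-resident data could look like for a single tree node. It is not Mahout's code and not the approach taken in the summer project; the names `best_split_one_pass`, `open_rows`, and the in-memory `DATA` string standing in for a file are all hypothetical. The idea is that one streaming pass only needs per-threshold class counts in memory, not the rows themselves.

```python
import csv
import io
from collections import Counter


def gini(counts):
    """Gini impurity of a mapping from class label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_split_one_pass(open_rows, feature, thresholds):
    """Stream the rows once and return the threshold with the lowest weighted Gini.

    open_rows is a hypothetical callable that returns a fresh iterator of
    (features, label) pairs, so every call re-reads the data from its source.
    Memory use grows with len(thresholds) * number of classes, not with rows.
    """
    left = [Counter() for _ in thresholds]
    right = [Counter() for _ in thresholds]
    n_rows = 0
    for features, label in open_rows():
        n_rows += 1
        for i, t in enumerate(thresholds):
            side = left if features[feature] <= t else right
            side[i][label] += 1
    if n_rows == 0:
        raise ValueError("no rows to split")

    scores = []
    for i in range(len(thresholds)):
        n_left = sum(left[i].values())
        n_right = sum(right[i].values())
        scores.append((n_left * gini(left[i]) + n_right * gini(right[i])) / n_rows)
    best_index = min(range(len(thresholds)), key=scores.__getitem__)
    return thresholds[best_index]


# Assumption: a tiny in-memory CSV stands in for a file on disk.
DATA = "x,label\n1.0,a\n2.0,a\n3.0,b\n4.0,b\n"


def open_rows():
    """Re-open the (stand-in) file and yield one parsed row at a time."""
    for row in csv.DictReader(io.StringIO(DATA)):
        yield {"x": float(row["x"])}, row["label"]


if __name__ == "__main__":
    print(best_split_one_pass(open_rows, "x", [1.5, 2.5, 3.5]))
```

Growing a full tree this way would need roughly one such pass per tree level, with each row routed through the partially built tree to find the node it belongs to. Arranging those repeated passes over an input split is the part described above as out of scope.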