From: Ted Dunning
Date: Sat, 12 May 2012 12:11:42 -0700
Subject: Re: Canopy estimator
To: user@mahout.apache.org

Yes. It may help with variable scale. The classic technique for dealing
with that is to cluster with a small number of clusters at a gross level
and then cluster each set of documents that belongs to a single large
cluster. This automatically adapts to different scales.

The new stuff would greatly facilitate your experimentation.

On Sat, May 12, 2012 at 11:19 AM, Pat Ferrel wrote:

> If you are asking about using your post 0.7 clustering, no I haven't yet.
> Will it help with varying scale? I assume by scale you mean the density of
> docs in certain areas of the vector space? One thing I am trying now is
> limiting the subject matter crawled and getting a much larger sample, which
> should get me a denser distribution.
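
The two-level approach described in the reply above (a few coarse clusters
first, then a separate clustering pass inside each coarse cluster) can be
sketched roughly as follows. This is only an illustration in plain Python,
not Mahout code: the functions `kmeans` and `two_level_cluster` and the
parameters `coarse_k`, `fine_k`, and `min_size` are hypothetical names chosen
for this sketch, and a simple Lloyd's k-means stands in for whatever
clusterer is actually used.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over tuples of floats.

    Returns the non-empty clusters as lists of points.
    """
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep the old center
        # when a cluster received no points.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def two_level_cluster(points, coarse_k=2, fine_k=3, min_size=6):
    """Cluster coarsely, then re-cluster each coarse group on its own.

    Because the second pass only sees the points of one coarse group, its
    clusters are sized relative to that group's own density (assumption:
    this is the adaptation to scale that the message refers to).
    """
    result = []
    for group in kmeans(points, coarse_k):
        if len(group) >= min_size:
            result.append(kmeans(group, fine_k))
        else:
            # Too few points to split further; keep the group whole.
            result.append([group])
    return result


if __name__ == "__main__":
    # One tightly packed region and one widely spread region.
    dense = [(i * 0.01, 0.0) for i in range(10)]
    sparse = [(100.0 + i * 5.0, 50.0) for i in range(10)]
    for group in two_level_cluster(dense + sparse):
        print([len(sub) for sub in group])
```

The point of the design is that no single distance threshold has to fit both
the dense and the sparse region: each second-level pass works at the scale of
its own coarse cluster.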