From: Ted Dunning
Date: Mon, 3 Sep 2012 15:47:01 -0400
Subject: Re: best way to join?
To: user@hadoop.apache.org

On Sun, Sep 2, 2012 at 12:26 PM, dexter morgan <dextermorgan4u@gmail.com> wrote:

> ... Either way, any clustering process requires calculating the distance
> of all points (not between all the points, but of all of them to some
> relative point). Because I'll need a clustering MR job, I'll probably use
> it, despite, as you said, it has high probability to be correct (not 100%)...

This is probably right as stated, but I think that there is some confusion here.

Many people assume that each point in the training data has to have its distance computed to all centroids in the clustering. Even this is not true.

It is true that you have to compute the distance to at least one something, but not necessarily to all of the clusters.
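
To make the last point concrete, here is a minimal sketch of one well-known way to skip centroids: pruning with the triangle inequality. This is an illustration only and not necessarily the approach meant above. The names `assign_with_pruning`, `points`, and `centroids` are hypothetical, and the sketch assumes Euclidean distance and Python 3.8+ (for `math.dist`). The idea is that if the distance between the current best centroid and another centroid is at least twice the distance from the point to the current best, the other centroid cannot be closer, so its distance never has to be computed.

```python
import math


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid, skipping centroids that
    the triangle inequality rules out.

    Returns (assignments, computed), where `computed` counts the
    point-to-centroid distances that were actually evaluated.
    """
    k = len(centroids)
    # Centroid-to-centroid distances: k*k work, independent of the number of points.
    cc = [[math.dist(centroids[i], centroids[j]) for j in range(k)]
          for i in range(k)]

    assignments = []
    computed = 0
    for p in points:
        # Every point needs at least one distance computation.
        best = 0
        best_d = math.dist(p, centroids[0])
        computed += 1
        for j in range(1, k):
            # Triangle inequality: d(p, c_j) >= d(c_best, c_j) - d(p, c_best).
            # If d(c_best, c_j) >= 2 * d(p, c_best), then d(p, c_j) >= d(p, c_best),
            # so c_j cannot be strictly closer and can be skipped.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = math.dist(p, centroids[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
    return assignments, computed


# Hypothetical toy data: three well-separated centroids, four points.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(0.5, 0.5), (9.0, 1.0), (1.0, 9.0), (0.0, 1.0)]
assignments, computed = assign_with_pruning(points, centroids)
print(assignments, computed, "of", len(points) * len(centroids))
```

The assignments match a brute-force search; only the number of distance evaluations changes. How much is saved depends on the data: points that sit close to the first centroid tried are resolved with a single computation, while points far from it still need several.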