From: Ted Dunning
Date: Tue, 28 Aug 2012 11:32:48 -0400
Subject: Re: best way to join?
To: user@hadoop.apache.org

On Tue, Aug 28, 2012 at 9:48 AM, dexter morgan wrote:

> I understand your solution (I think); I didn't think of it in that
> particular way.
> I think that, let's say I have 1M data points and am running knn, having
> k=1M and n=10 (each point is a cluster that requires up to 10 points)
> is overkill.

I am not sure I understand you. n = number of points. k = number of clusters. For searching 1 million points, I would recommend thousands of clusters.

> How can I achieve the same result WITHOUT using Mahout, just running on
> the dataset? I even think it'll be the same complexity (O(n^2)).

Running with a good knn package will give you roughly O(n log n) complexity.
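To illustrate why an indexed nearest-neighbor search can beat the O(n^2) all-pairs scan discussed above, here is a minimal sketch in plain Python. It is not the clustering-based method the thread refers to, and it is not Mahout's implementation: it swaps in a k-d tree, one common index for low-dimensional points, purely as an assumption for illustration. The names build_kdtree, nearest, brute_force and sq_dist are hypothetical, and m stands for the number of neighbors wanted per point (the "10" in the question).

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Build a k-d tree; each node is a tuple (point, left, right).

    Sorting at every level keeps the code short but costs O(n log^2 n);
    a median-selection build would bring this down to O(n log n).
    """
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(tree, query, m, depth=0, heap=None):
    """Return the m points nearest to query, closest first."""
    top = heap is None
    if top:
        heap = []  # max-heap on distance, stored as (-distance, point)
    if tree is not None:
        point, left, right = tree
        axis = depth % len(query)
        d = sq_dist(point, query)
        if len(heap) < m:
            heapq.heappush(heap, (-d, point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, point))
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        nearest(near, query, m, depth + 1, heap)
        # Visit the far side only if the splitting plane is closer than
        # the current m-th best distance (or we still need more points).
        if len(heap) < m or diff * diff < -heap[0][0]:
            nearest(far, query, m, depth + 1, heap)
    if top:
        return [p for _, p in sorted(heap, reverse=True)]
    return None


def brute_force(points, query, m):
    """Baseline: scan every point, O(n) per query and O(n^2) for all points."""
    return sorted(points, key=lambda p: sq_dist(p, query))[:m]


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(1000)]
    tree = build_kdtree(pts)
    # The query point is itself in the tree, so ask for m + 1 and drop it.
    neighbors = nearest(tree, pts[0], 11)[1:]
    print(len(neighbors), "neighbors found for", pts[0])
```

For well-spread, low-dimensional data each query visits roughly O(log n) nodes on average, so querying all n points lands near the O(n log n) figure quoted above. In high dimensions the pruning stops helping, and that is where approximate, cluster-based searches become the more usual choice.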