Date: Wed, 27 Mar 2013 11:59:41 +0200
From: Yaron Gonen
To: user@hadoop.apache.org
Subject: Naïve k-means using hadoop

Hi,
I'd like to implement k-means by myself, in the following naive way:
Given a large set of vectors:
  1. Generate k random centers from the set.
  2. The mapper reads all centers and a split of the vector set, and for each vector emits the closest center as the key.
  3. The reducer calculates the new center and writes it.
  4. Go to step 2 until the centers no longer change.
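For illustration, here is a minimal in-memory Java sketch of steps 1-4 with no Hadoop involved: `assign` plays the mapper (keying each vector by its closest center), `mean` plays the reducer, and `run` repeats until no center changes. The class and method names are hypothetical, and the `maxIter` cap and keeping an empty cluster's previous center are added assumptions, not part of the steps above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class NaiveKMeans {

    // Squared Euclidean distance; sufficient for choosing the closest center.
    public static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Step 1: pick k distinct random vectors from the set as the initial centers.
    public static List<double[]> randomCenters(List<double[]> points, int k, long seed) {
        List<double[]> copy = new ArrayList<>(points);
        Collections.shuffle(copy, new Random(seed));
        List<double[]> centers = new ArrayList<>();
        for (int i = 0; i < k; i++) centers.add(copy.get(i).clone());
        return centers;
    }

    // Step 2 ("map"): group each vector under the index of its closest center.
    public static Map<Integer, List<double[]>> assign(List<double[]> points, List<double[]> centers) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            int best = 0;
            for (int c = 1; c < centers.size(); c++) {
                if (dist2(p, centers.get(c)) < dist2(p, centers.get(best))) best = c;
            }
            groups.computeIfAbsent(best, key -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    // Step 3 ("reduce"): the new center of a group is the mean of its vectors.
    public static double[] mean(List<double[]> group) {
        double[] m = new double[group.get(0).length];
        for (double[] p : group)
            for (int i = 0; i < m.length; i++) m[i] += p[i];
        for (int i = 0; i < m.length; i++) m[i] /= group.size();
        return m;
    }

    // Step 4: repeat steps 2-3 until no center changes (or maxIter passes run).
    public static List<double[]> run(List<double[]> points, List<double[]> centers, int maxIter) {
        for (int iter = 0; iter < maxIter; iter++) {
            Map<Integer, List<double[]>> groups = assign(points, centers);
            List<double[]> next = new ArrayList<>();
            for (int c = 0; c < centers.size(); c++) {
                List<double[]> g = groups.get(c);
                // A center that attracted no vectors keeps its previous position.
                next.add(g == null ? centers.get(c) : mean(g));
            }
            boolean changed = false;
            for (int c = 0; c < centers.size(); c++)
                if (!Arrays.equals(centers.get(c), next.get(c))) changed = true;
            centers = next;
            if (!changed) break;
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
            new double[]{1, 1}, new double[]{1, 2}, new double[]{2, 1},
            new double[]{8, 8}, new double[]{8, 9}, new double[]{9, 8});
        List<double[]> centers = run(points, randomCenters(points, 2, 42L), 100);
        for (double[] c : centers) System.out.println(Arrays.toString(c));
    }
}
```

Starting from the fixed centers (1, 1) and (8, 8) instead of random ones, `run` on these six points settles after two passes at about (1.33, 1.33) and (8.33, 8.33). In Hadoop, each pass of the loop would be one MapReduce job.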
My question is very basic: how do I distribute all the new centers (produced by the reducers) to all the mappers? I can't use the distributed cache since it's read-only. I can't use context.write since it creates a file for each reduce task, and I need a single file. The more general issue is: how do I distribute data produced by the reducers to all the mappers?
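One common pattern, offered here as an assumption rather than as a confirmed answer: run each iteration as its own job, pass the previous job's output directory to the next job, and have each mapper read every `part-r-*` file once in `Mapper.setup()`. One file per reduce task is then harmless, because the mapper simply reads them all. A real job would list and open those files through Hadoop's `FileSystem` API (for example `FileSystem.globStatus`) on HDFS; the sketch below swaps in `java.nio.file` on a local directory so it runs without Hadoop. `CentersLoader`, `loadCenters`, and the tab/comma line format are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CentersLoader {

    // Collects the centers from every reduce task's output file in one directory.
    // Assumed (hypothetical) line format: "<centerId>\t<x1>,<x2>,...", i.e. a
    // key<TAB>value line with the vector written as comma-separated numbers.
    public static Map<Integer, double[]> loadCenters(Path dir) throws IOException {
        Map<Integer, double[]> centers = new TreeMap<>();
        // The glob matches only part-r-* files, so markers such as _SUCCESS are skipped.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(dir, "part-r-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part)) {
                    if (line.isBlank()) continue;
                    String[] kv = line.split("\t", 2);
                    String[] coords = kv[1].split(",");
                    double[] c = new double[coords.length];
                    for (int i = 0; i < coords.length; i++) {
                        c[i] = Double.parseDouble(coords[i].trim());
                    }
                    centers.put(Integer.parseInt(kv[0].trim()), c);
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the output directory of an iteration that ran with two reduce tasks.
        Path dir = Files.createTempDirectory("kmeans-iter");
        Files.write(dir.resolve("part-r-00000"), List.of("0\t1.5,1.0"));
        Files.write(dir.resolve("part-r-00001"), List.of("1\t6.5,6.75"));
        Files.write(dir.resolve("_SUCCESS"), List.<String>of());
        loadCenters(dir).forEach((id, c) -> System.out.println(id + " -> " + Arrays.toString(c)));
    }
}
```

If a single output file is genuinely required, `job.setNumReduceTasks(1)` produces one, at the cost of sending every mapper output record through a single reduce task.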

Thanks.