From: Enis Soztutar
To: core-user@hadoop.apache.org
Date: Wed, 18 Mar 2009 15:07:19 +0200
Subject: Re: merging files

Use MultipleInputs with two different mappers, one for each input. Mapper 1 should be IdentityMapper. Mapper 2 should output (key, value) pairs where the value is a pseudo marker value (the same for all keys) indicating that the record has no real value. In the reducer, output the key/value pairs only for keys whose values do not include the marker.

In your example, suppose we use -1 as the marker value. Then mapper 2 will output:

4, -1
2, -1

and the reducer will receive:

2, {1,3,5,-1}
3, {1,2}
4, {7,9,-1}
6, {3}

so the reducer will output:

3, 1
3, 2
6, 3

Nir Zohar wrote:
> Hi,
>
> I would like your help with the question below.
>
> I have 2 files: file1 (key, value) and file2 (only keys), and I need to exclude from file1 all records whose keys appear in file2.
>
> 1. The output format is key-value, not only keys.
>
> 2. The key is not a primary key, so it is not possible to do a join at the end.
>
> Can you assist?
>
> Thanks,
>
> Nir.
>
> Example:
>
> file1:
> 2,1
> 2,3
> 2,5
> 3,1
> 3,2
> 4,7
> 4,9
> 6,3
>
> file2:
> 4
> 2
>
> Output:
> 3,1
> 3,2
> 6,3
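The marker-value approach described above can be traced without a cluster. Below is a minimal, Hadoop-free Java simulation of the data flow, assuming integer keys and values and that -1 never appears as a real value in file1. The class and method names (MarkerAntiJoin, mapFile1, mapFile2, shuffle, reduce, run) are hypothetical; in a real job, MultipleInputs would route each input path to its own mapper and the framework would perform the grouping that shuffle() only imitates here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MarkerAntiJoin {
    // Hypothetical marker value; assumes -1 never occurs as a real value in file1.
    public static final int MARKER = -1;

    /** One key/value record, as read from file1 or emitted by a mapper. */
    public static final class Pair {
        public final int key;
        public final int value;

        public Pair(int key, int value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "," + value;
        }
    }

    // Mapper 1 (identity): passes each file1 record through unchanged.
    static List<Pair> mapFile1(List<Pair> file1) {
        return new ArrayList<>(file1);
    }

    // Mapper 2: pairs every file2 key with the marker value.
    static List<Pair> mapFile2(List<Integer> file2) {
        List<Pair> out = new ArrayList<>();
        for (int key : file2) {
            out.add(new Pair(key, MARKER));
        }
        return out;
    }

    // Stand-in for the shuffle/sort phase: groups values by key, keys in ascending order.
    static Map<Integer, List<Integer>> shuffle(List<Pair> mapped) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (Pair p : mapped) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    // Reducer: drops the whole key if the marker is among its values,
    // otherwise emits every (key, value) pair. Values are buffered in a list
    // because the marker may arrive at any position.
    static List<Pair> reduce(int key, List<Integer> values) {
        List<Pair> out = new ArrayList<>();
        if (values.contains(MARKER)) {
            return out;
        }
        for (int v : values) {
            out.add(new Pair(key, v));
        }
        return out;
    }

    // Runs both mappers, the simulated shuffle, and the reducer over all groups.
    public static List<Pair> run(List<Pair> file1, List<Integer> file2) {
        List<Pair> mapped = mapFile1(file1);
        mapped.addAll(mapFile2(file2));
        List<Pair> result = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.addAll(reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pair> file1 = Arrays.asList(
            new Pair(2, 1), new Pair(2, 3), new Pair(2, 5), new Pair(3, 1),
            new Pair(3, 2), new Pair(4, 7), new Pair(4, 9), new Pair(6, 3));
        List<Integer> file2 = Arrays.asList(4, 2);
        for (Pair p : run(file1, file2)) {
            System.out.println(p); // prints 3,1 then 3,2 then 6,3
        }
    }
}
```

The reducer here holds all values for a key in memory because the marker is not guaranteed to arrive first. If some keys have very many values, one option is a secondary sort that delivers the marker ahead of the real values, so the reducer can decide after reading the first value and stream the rest.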