Message-ID: <47E29406.3080906@apache.org>
Date: Thu, 20 Mar 2008 09:42:46 -0700
From: Doug Cutting
To: core-user@hadoop.apache.org
Subject: Re: MapFile and MapFileOutputFormat
References:
<6eb82e0803200206g224de74eo75dc52f023512e49@mail.gmail.com>
In-Reply-To: <6eb82e0803200206g224de74eo75dc52f023512e49@mail.gmail.com>

Rong-en Fan wrote:
> I have two questions regarding the mapfile in hadoop/hdfs. First, when using
> MapFileOutputFormat as reducer's output, is there any way to change
> the index interval (i.e., able to call setIndexInterval() on the
> output MapFile)?

Not at present. It would probably be good to change MapFile to get this
value from the Configuration. A static method could be added,
MapFile#setIndexInterval(Configuration conf, int interval), that sets
"io.mapfile.index.interval", and the MapFile constructor could read this
property from the Configuration. One could then use the static method
to set this on jobs.

If you need this, please file an issue in Jira. If possible, include a
patch too.

http://wiki.apache.org/hadoop/HowToContribute

> Second, is it possible to tell what is the position in data file for a given
> key, assuming index interval is 1 and # of keys are small?

One could read the "index" file explicitly. It's just a SequenceFile,
listing keys and positions in the "data" file. But why would you set
the index interval to 1? And why do you need to know the position?

Doug
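
A minimal sketch of the Configuration-based approach described above. It is a toy model, not Hadoop code: java.util.Properties stands in for Hadoop's Configuration so that it runs without Hadoop on the classpath, and the class IndexIntervalSketch, its nested Writer, the default of 128, and the record-size arithmetic are all assumptions made for illustration. Only the property name "io.mapfile.index.interval" and the shape of the static setter come from the message. The index it builds is a list of keys and positions in the data, one entry per interval.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Toy stand-in for MapFile (hypothetical; NOT the Hadoop class).
class IndexIntervalSketch {
    // Property name proposed in the message.
    static final String INDEX_INTERVAL_KEY = "io.mapfile.index.interval";
    // Assumed default for this sketch only.
    static final int DEFAULT_INDEX_INTERVAL = 128;

    // Analogue of the proposed MapFile#setIndexInterval(Configuration, int).
    static void setIndexInterval(Properties conf, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        conf.setProperty(INDEX_INTERVAL_KEY, Integer.toString(interval));
    }

    // Reads the interval back, falling back to the assumed default.
    static int getIndexInterval(Properties conf) {
        String raw = conf.getProperty(INDEX_INTERVAL_KEY);
        return raw == null ? DEFAULT_INDEX_INTERVAL : Integer.parseInt(raw);
    }

    // Hypothetical writer whose constructor picks up the interval from conf.
    static final class Writer {
        private final int indexInterval;
        private final List<String> indexKeys = new ArrayList<>();
        private final List<Long> indexPositions = new ArrayList<>();
        private long position = 0; // offset of the next record in the toy "data"
        private long count = 0;    // records appended so far

        Writer(Properties conf) {
            this.indexInterval = getIndexInterval(conf);
        }

        void append(String key, String value) {
            // Every indexInterval-th record gets an index entry: key -> position.
            if (count % indexInterval == 0) {
                indexKeys.add(key);
                indexPositions.add(position);
            }
            position += key.length() + value.length(); // toy record size
            count++;
        }

        List<String> indexKeys() {
            return Collections.unmodifiableList(indexKeys);
        }

        List<Long> indexPositions() {
            return Collections.unmodifiableList(indexPositions);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setIndexInterval(conf, 2); // would be set once on the job's configuration
        Writer writer = new Writer(conf);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            writer.append(key, "v" + key);
        }
        // prints "[a, c, e] [0, 6, 12]"
        System.out.println(writer.indexKeys() + " " + writer.indexPositions());
    }
}
```

With an interval of 1, every key would get an index entry, so the index would grow as large as the key set itself. That is the trade-off behind the question about why one would set it to 1.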