From: "Paul Sutter"
To: hadoop-dev@lucene.apache.org
Date: Mon, 26 Jun 2006 21:07:34 -0700
Subject: Re: Redundant (?) lengths in SequenceFile

It's fine either way; I just wanted to know that it's deliberate. It certainly doesn't cost much space.

An external sorter can work only on key types it understands, and our keys are compound and may include strings, floats, and ints. The length is no problem at all.

On 6/26/06, Doug Cutting wrote:
> Paul Sutter wrote:
> > However, it still seems to me that the key length in the sequence file
> > is redundant.
>
> What if your keys are compound, containing, say, a combination of
> floats, ints and strings? Then the key may not include a length of the
> entire key entry. So you're seeking to optimize a special (if common)
> case.
>
> Doug
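A minimal Java sketch of both points in this thread, assuming a record layout of [record length][key length][key][value]. This is an illustration, not Hadoop's SequenceFile code; the class LengthPrefixedRecords, its methods, and the (float, int, string) key type are all hypothetical. The compound key carries no total length of its own (Doug's point), yet with the key-length prefix a generic sorter can split key from value and order records by raw key bytes without understanding the key type (the external-sorter case).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixedRecords {

    // Hypothetical compound key: a float, an int and a string, written field
    // by field. Nothing in this encoding records the key's total length.
    static byte[] encodeKey(float score, int id, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeFloat(score);
        out.writeInt(id);
        out.writeUTF(name); // 2-byte length + modified UTF-8 bytes
        out.flush();
        return bytes.toByteArray();
    }

    // Assumed layout: [recordLength][keyLength][key bytes][value bytes].
    static void writeRecord(DataOutputStream out, byte[] key, byte[] value) throws IOException {
        out.writeInt(key.length + value.length); // record length
        out.writeInt(key.length);                // key length
        out.write(key);
        out.write(value);
    }

    // Splits one record into {key, value} using only the two length
    // prefixes, without knowing how the key was encoded.
    static byte[][] readRecord(DataInputStream in) throws IOException {
        int recordLength = in.readInt();
        int keyLength = in.readInt();
        byte[] key = new byte[keyLength];
        byte[] value = new byte[recordLength - keyLength];
        in.readFully(key);
        in.readFully(value);
        return new byte[][] {key, value};
    }

    // Unsigned lexicographic byte comparison: an opaque total order a generic
    // sorter could use. It is not, in general, the semantic order of the fields.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(file);
        writeRecord(out, encodeKey(2.5f, 7, "beta"), "v1".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, encodeKey(1.0f, 3, "alpha"), "v2".getBytes(StandardCharsets.UTF_8));
        out.flush();

        // Read every record back and sort by raw key bytes only.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file.toByteArray()));
        List<byte[][]> records = new ArrayList<>();
        while (in.available() > 0) {
            records.add(readRecord(in));
        }
        records.sort((r1, r2) -> compareBytes(r1[0], r2[0]));
        for (byte[][] r : records) {
            System.out.println(new String(r[1], StandardCharsets.UTF_8));
        }
    }
}
```

Running main prints v2 then v1: the key for 1.0f starts with byte 0x3F and the key for 2.5f with 0x40. Without the key-length field, readRecord could only find the key/value boundary by decoding the key, which requires knowing its type.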