Subject: RE: Order of fields within a Document in Lucene 2.4+
Date: Thu, 25 Jun 2009 16:38:15 -0400
From: "Sudarsan, Sithu D."

I agree. Using Lucene 2.4.1, doc.getFields() returns fields in alphabetical order, not the order in which they were added.

Sincerely,
Sithu D Sudarsan

-----Original Message-----
From: Matt Turner [mailto:m4tt_turner@hotmail.com]
Sent: Thursday, June 25, 2009 4:33 PM
To: java-user@lucene.apache.org
Subject: Order of fields within a Document in Lucene 2.4+

The Lucene FAQ says:

What is the order of fields returned by Document.fields()?

* Fields are returned in the same order they were added to the document.

(The method is now getFields(), as fields() is deprecated.)

However, I think this may no longer be the case in 2.4.

We are indexing documents in a specific order so that we can LOAD_AND_BREAK out of our FieldSelector as early as possible. That is, we typically have 50 indexed fields for a document, but when we are loading results with .doc(), we know we only need 4 of them.

So our code ensures that these 4 fields are added to the index first, and once the 4th field is loaded we break out of the selector.

This speeds us up by an order of magnitude.

However, we are finding that our field selector is processing fields in alphabetical order, not order of addition. This means that we'd have to rename our fields to 'aaa..' in order to guarantee they'd be processed first.

I think, but am not sure, that this bit of code causes the problem (as spotted in http://www.mail-archive.com/java-user@lucene.apache.org/msg24105.html). It seems to have been introduced in version 2.4 (fields are in addition order in 2.3.2).

DocFieldProcessorPerThread.java:

// If we are writing vectors then we must visit
// fields in sorted order so they are written in
// sorted order. TODO: we actually only need to
// sort the subset of fields that have vectors
// enabled; we could save [small amount of] CPU
// here.
quickSort(fields, 0, fieldCount-1);

This appears to sort fields into alphabetical order.

Assuming that implementing the TODO would keep them in order of addition (and just keep the vector fields themselves sorted), is it worth raising a JIRA issue to fix this?

regards,

matt
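To make the cost of the reordering concrete, here is a minimal, self-contained Java sketch. It does not call Lucene at all: it only simulates a selector that stops once the last wanted field has been loaded, which is roughly what returning LOAD_AND_BREAK from a FieldSelector achieves. The class name LoadAndBreakDemo, the method fieldsVisited, and all field names are hypothetical and chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LoadAndBreakDemo {

    /**
     * Walks the stored fields in the given order and returns how many
     * fields are visited before every wanted field has been loaded.
     * This is a simulation only; no Lucene classes are involved.
     */
    public static int fieldsVisited(List<String> storedOrder, Set<String> wanted) {
        int visited = 0;
        int loaded = 0;
        for (String name : storedOrder) {
            visited++;
            if (wanted.contains(name)) {
                loaded++;
                if (loaded == wanted.size()) {
                    break; // stop early, analogous to LOAD_AND_BREAK
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical document: the 4 wanted fields are added first,
        // followed by 46 other fields, for 50 fields in total.
        List<String> wantedNames = Arrays.asList("title", "url", "summary", "date");
        List<String> additionOrder = new ArrayList<>(wantedNames);
        for (int i = 0; i < 46; i++) {
            additionOrder.add("attr" + i);
        }
        Set<String> wanted = new HashSet<>(wantedNames);

        // The same fields after an alphabetical sort by name.
        List<String> alphabetical = new ArrayList<>(additionOrder);
        Collections.sort(alphabetical);

        System.out.println("addition order: " + fieldsVisited(additionOrder, wanted));
        System.out.println("alphabetical order: " + fieldsVisited(alphabetical, wanted));
    }
}
```

With these made-up names, the addition-order walk stops after 4 fields, while the alphabetical walk visits all 50, because every "attr" field sorts ahead of the wanted ones. That is the motivation for the 'aaa..' renaming workaround mentioned above.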