Date: Wed, 2 Mar 2011 20:10:37 +0000 (UTC)
From: "Michael Busch (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2881) Track FieldInfo per segment instead of per-IW-session

[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001590#comment-13001590 ] Michael Busch
commented on LUCENE-2881:
---------------------------------------

{quote}
w.r.t. storing FieldInfos in the segments file: don't we get some consistency benefit from the segments file being small (i.e. it all fits in one disk block, so it either gets written in its entirety or not at all, and readers will see all of it or none of it)?
{quote}

I think we have three options here:
1) Store FIs inside the compound file (current patch)
2) Store FIs in SI
3) Store FIs in their own file outside the compound file

Each has disadvantages:
1) Opening SegmentInfos becomes more expensive, since it now has to open the CFS files to load the FIs
2) SI files become bigger
3) One more file descriptor per segment, although it can be closed as soon as the FIs have been read into memory

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: LUCENE-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global field naming and ordering. IW carries FI instances over from previous segments, which also carries over field properties like isIndexed. While consistent field ordering per IW session appears to be important for bulk merging of stored fields etc., carrying over the other properties might become problematic with Lucene's Codec support: Codecs that rely on consistent properties in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment and field (using the field id within the file name).
> Yet if a segment has no DocValues indexed but a previous segment in the same IW session had DocValues, FieldInfo#docValues will still be true, since those values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like hasVectors or hasProx, which is really something we should manage per Codec and Segment. Ideally FieldInfo would be managed per Segment and Codec, so that its properties are valid per segment. It also seems necessary to bind FieldInfos to SegmentInfo logically, since it is really just per-segment metadata.
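The per-segment idea from the description can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not Lucene's FieldInfo/FieldInfos code: the classes FieldNumbers, SegmentFieldInfo and SegmentFieldInfos are invented for illustration. It assumes that field numbers should stay consistent across one IW session (the part the description says matters for bulk merging of stored fields), while flags such as docValues are recorded only in the segment that actually saw them. As a result, a segment without DocValues no longer inherits the flag from an earlier segment.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; these are not Lucene classes. */
public class PerSegmentFieldInfosSketch {

    /** Session-wide name -> number map, so field numbering stays consistent across segments. */
    static final class FieldNumbers {
        private final Map<String, Integer> numbers = new HashMap<>();

        int numberFor(String name) {
            Integer n = numbers.get(name);
            if (n == null) {
                // First time this name is seen in the session: assign the next free number.
                n = numbers.size();
                numbers.put(name, n);
            }
            return n;
        }
    }

    /** Properties of one field as observed in one segment only. */
    static final class SegmentFieldInfo {
        final String name;
        final int number;
        boolean indexed;
        boolean docValues;

        SegmentFieldInfo(String name, int number) {
            this.name = name;
            this.number = number;
        }
    }

    /** Field infos for a single segment: starts empty, so no properties are carried over. */
    static final class SegmentFieldInfos {
        private final FieldNumbers globalNumbers;
        private final Map<String, SegmentFieldInfo> byName = new LinkedHashMap<>();

        SegmentFieldInfos(FieldNumbers globalNumbers) {
            this.globalNumbers = globalNumbers;
        }

        /** Records that a document in this segment used the field with these properties. */
        void add(String name, boolean indexed, boolean docValues) {
            SegmentFieldInfo fi = byName.get(name);
            if (fi == null) {
                // The number comes from the session; the flags start fresh for this segment.
                fi = new SegmentFieldInfo(name, globalNumbers.numberFor(name));
                byName.put(name, fi);
            }
            fi.indexed |= indexed;
            fi.docValues |= docValues;
        }

        boolean hasDocValues() {
            for (SegmentFieldInfo fi : byName.values()) {
                if (fi.docValues) {
                    return true;
                }
            }
            return false;
        }

        SegmentFieldInfo get(String name) {
            return byName.get(name);
        }
    }

    public static void main(String[] args) {
        FieldNumbers session = new FieldNumbers();

        // First segment: "price" is written with doc values.
        SegmentFieldInfos seg0 = new SegmentFieldInfos(session);
        seg0.add("title", true, false);
        seg0.add("price", false, true);

        // Second segment in the same session: no doc values at all.
        SegmentFieldInfos seg1 = new SegmentFieldInfos(session);
        seg1.add("price", true, false);
        seg1.add("title", true, false);

        System.out.println("seg0 hasDocValues=" + seg0.hasDocValues()); // seg0 hasDocValues=true
        System.out.println("seg1 hasDocValues=" + seg1.hasDocValues()); // seg1 hasDocValues=false
        System.out.println("price number: " + seg0.get("price").number
                + " / " + seg1.get("price").number);                   // price number: 1 / 1
    }
}
```

The sketch deliberately leaves out where the per-segment field infos would be persisted. That is the separate choice between the three storage options listed in the comment.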