Message-ID: <26739843.90951286864612467.JavaMail.jira@thor>
Date: Tue, 12 Oct 2010 02:23:32 -0400 (EDT)
From: "Simon Willnauer (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Issue Comment Edited: (LUCENE-2186) First cut at column-stride fields (index values storage)

[ https://issues.apache.org/jira/browse/LUCENE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920110#action_12920110 ]

Simon Willnauer edited comment on LUCENE-2186 at 10/12/10 2:22 AM:
-------------------------------------------------------------------

created branch
at [docvalues|http://svn.apache.org/repos/asf/lucene/dev/branches/docvalues/] and committed the last patch at r1021636.

I think the next steps are adding a fix version "docvalues" to JIRA and creating new issues according to the "roadmap" above. Once we are through with the mandatory stuff and documentation we can land this on trunk.

Thoughts?

I'm not sure if we should continue on this issue or close it and create a new "top level" one and spawn issues from there.

simon

> First cut at column-stride fields (index values storage)
> --------------------------------------------------------
>
> Key: LUCENE-2186
> URL: https://issues.apache.org/jira/browse/LUCENE-2186
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Reporter: Michael McCandless
> Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, LUCENE-2186.patch, mem.py
>
>
> I created an initial basic impl for storing "index values" (ie
> column-stride value storage). This is still a work in progress... but
> the approach looks compelling. I'm posting my current status/patch
> here to get feedback/iterate, etc.
>
> The code is standalone now, and lives under new package
> oal.index.values (plus some util changes, refactorings) -- I have yet
> to integrate into Lucene so eg you can mark that a given Field's value
> should be stored into the index values, sorting will use these values
> instead of field cache, etc.
>
> It handles 3 types of values:
> * Six variants of byte[] per doc, all combinations of fixed vs
>   variable length, and stored either "straight" (good for eg a
>   "title" field), "deref" (good when many docs share the same value,
>   but you won't do any sorting) or "sorted".
> * Integers (variable bit precision used as necessary, ie this can
>   store byte/short/int/long, and all precisions in between)
> * Floats (4 or 8 byte precision)
>
> String fields are stored as the UTF8 byte[]. This patch adds a
> BytesRef, which does the same thing as flex's TermRef (we should merge
> them).
>
> This patch also adds basic initial impl of PackedInts (LUCENE-1990);
> we can swap that out if/when we get a better impl.
>
> This storage is dense (like field cache), so it's appropriate when the
> field occurs in all/most docs. It's just like field cache, except the
> reading API is a get() method invocation, per document.
>
> Next step is to do basic integration with Lucene, and then compare
> sort performance of this vs field cache.
>
> For the "sort by String value" case, I think RAM usage & GC load of
> this index values API should be much better than field cache, since
> it does not create an object per document (instead shares big long[] and
> byte[] across all docs), and because the values are stored in RAM as
> their UTF8 bytes.
>
> There are abstract Writer/Reader classes. The current reader impls
> are entirely RAM resident (like field cache), but the API is (I think)
> agnostic, ie, one could make an MMAP impl instead.
>
> I think this is the first baby step towards LUCENE-1231.
> Ie, it cannot yet update values, and the reading API is fully
> random-access by docID (like field cache), not like a posting list,
> though I do think we should add an iterator() api (to return flex's
> DocsEnum) -- eg I think this would be a good way to track avg
> doc/field length for BM25/lnu.ltc scoring.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
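
The integer storage mentioned in the quoted description (variable bit precision, values shared in one big long[], read through a per-document get()) can be sketched roughly as below. This is only an illustration of the general bit-packing idea, not the PackedInts code from the patch or from LUCENE-1990: the class name PackedIntsSketch and all of its methods are hypothetical, and the sketch assumes non-negative values and a read-only, fully RAM-resident structure.

```java
/**
 * Hypothetical sketch (not Lucene's PackedInts): stores one non-negative
 * value per document, using the same number of bits for every value,
 * packed back to back into a single shared long[].
 */
final class PackedIntsSketch {
    private final long[] blocks;      // shared by all documents; no per-doc objects
    private final int bitsPerValue;   // chosen from the largest value seen
    private final long mask;          // low bitsPerValue bits set

    PackedIntsSketch(long[] values) {
        long combined = 0;
        for (long v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("sketch handles non-negative values only");
            }
            combined |= v;  // highest set bit of the OR equals that of the maximum
        }
        // Smallest width that can hold every value (at least 1 bit, at most 63).
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(combined));
        mask = (1L << bitsPerValue) - 1;

        long totalBits = (long) values.length * bitsPerValue;
        blocks = new long[(int) ((totalBits + 63) >>> 6)];
        for (int docID = 0; docID < values.length; docID++) {
            long bitPos = (long) docID * bitsPerValue;
            int block = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[block] |= values[docID] << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two longs: spill its high bits into the next one.
                blocks[block + 1] |= values[docID] >>> (64 - shift);
            }
        }
    }

    /** Random-access read by docID, in the spirit of the get() call described above. */
    long get(int docID) {
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bitsPerValue > 64) {
            // Pull the remaining high bits from the following long.
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    int blockCount() {
        return blocks.length;
    }

    public static void main(String[] args) {
        long[] values = {3, 0, 7, 1000, 42, 999999, 5};
        PackedIntsSketch packed = new PackedIntsSketch(values);
        System.out.println("bits per value: " + packed.bitsPerValue());
        for (int docID = 0; docID < values.length; docID++) {
            System.out.println("doc " + docID + " -> " + packed.get(docID));
        }
    }
}
```

The point of the layout is the one made in the description: memory cost scales with the bit width actually needed rather than with a fixed byte/short/int/long type, and there is a single array for the garbage collector to track instead of one object per document.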