Message-ID: <24217946.1195676683359.JavaMail.jira@brutus>
Date: Wed, 21 Nov 2007 12:24:43 -0800 (PST)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-1052) Add an "termInfosIndexDivisor" to IndexReader
In-Reply-To: <15241894.1194953511410.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1052:
---------------------------------------

    Attachment: LUCENE-1052.patch

{quote}
What
class would we put TermInfosReader-specific setters & getters on, since that class is not public? Do we make TermInfosReader public or leave it package-private? My intuition is to leave it package-private for now, in order to retain freedom to re-structure w/o breaking applications, and because making it public would drag a lot of other stuff into the public. We could consider making SegmentReader public, so that there's a public class that corresponds to the concrete index implementation, but that'd also drag more stuff public (like DirectoryIndexReader).
{quote}

Agreed: package private. People who do advanced things should be fine with that.

{quote}
Another option is to make a public class whose purpose is just to hold such parameters, something like SegmentIndexParameters. That'd be my first choice and was the direction I pointed in my initial proposal, but with considerably less explanation.
{quote}

So I took a closer look at making generic properties by coding up Doug's approach (attached patch). I replaced *#setTermInfosIndexDivisor with a separate SegmentIndexProperties class that has static methods to set/get termIndexDivisor, and added/threaded down ctors that allow you to pass a LuceneProperties when opening an IndexReader.

I came up with a number of questions along the way:

* Who should know/store the default value for a given property? termIndexDivisor defaults to 1.

  Is this stored in that static facade class (a)? Or passed in as the defaultValue arg by TermInfosReader when it looks up the property (b)? Or do we make a base DefaultLuceneProperties that has the default set for all properties (c)?

  (b) is nice because I feel like the default should live in the class that uses it, but then that's bad because the outside world can't see the default value.

* Every property must clearly define when it will be looked at. So for termIndexDivisor in the javadoc we would say "it's used only when the termInfos index is loaded (once)".
This means changing that property after the termInfos index is loaded has no effect.

* We should presumably create a default LuceneProperties to save checking for props != null everywhere when the user didn't make their own props. This favors option (c) in the first bullet above.

* Presumably once you've created a class, passing in your props instance, you cannot later install a new props instance. The LuceneProperties class is "write once".

* We would need guidelines for when something should be an arg to the ctor or a setter/getter on the class. I think there are shades of gray here.

After this, I suddenly realized that if we indeed make termIndexDivisor a generic property, it's actually hard for Chuck to then do his formula by looking at the size of the .tii file: when the index has multiple segments, you would presumably need to set different indexDivisors for each segment, but the properties only let you set one global value. You could carefully set the property, then somehow get ahold of just that one SegmentReader and have it load the term index, then move on to the next one, etc., but that's quite messy.

Note that this limitation also applies to the top-level setTermInfosIndexDivisor as it now stands in trunk: it's not easy to set different index divisors per segment.

It almost feels like we should have "hooks" that are invoked at certain times, like when we are about to load the term infos index, that give the application a chance to change something...
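
To make the "hooks" idea a bit more concrete, here is a minimal sketch. It is not what the attached patch does and it is not an existing Lucene API: TermIndexLoadHook, SizeBudgetHook and SegmentOpener are hypothetical names made up for illustration. It also assumes the .tii file size is an acceptable proxy for the RAM the loaded term index would use, which is only a rough approximation. The point is just that a callback invoked once per segment, right before the term index is loaded, lets the application pick a different divisor for each segment:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical callback (not in Lucene): invoked just before a segment's term index is loaded. */
interface TermIndexLoadHook {
    /**
     * @param segmentName    name of the segment whose term index is about to be loaded
     * @param tiiSizeInBytes size of that segment's .tii file
     * @return the index divisor to use for this segment; must be >= 1
     */
    int chooseDivisor(String segmentName, long tiiSizeInBytes);
}

/** Hypothetical policy: keep doubling the divisor until the estimated size fits a per-segment budget. */
final class SizeBudgetHook implements TermIndexLoadHook {
    private final long maxBytesPerSegment;

    SizeBudgetHook(long maxBytesPerSegment) {
        if (maxBytesPerSegment <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
        this.maxBytesPerSegment = maxBytesPerSegment;
    }

    @Override
    public int chooseDivisor(String segmentName, long tiiSizeInBytes) {
        int divisor = 1;
        // Assumption: loading every divisor-th index entry shrinks the
        // in-RAM term index by roughly that factor.
        while (tiiSizeInBytes / divisor > maxBytesPerSegment && divisor < (1 << 30)) {
            divisor *= 2;
        }
        return divisor;
    }
}

/** Stand-in for the reader side: asks the hook once per segment, at load time. */
final class SegmentOpener {
    static Map<String, Integer> chooseDivisors(Map<String, Long> tiiSizes, TermIndexLoadHook hook) {
        Map<String, Integer> divisors = new LinkedHashMap<>();
        for (Map.Entry<String, Long> segment : tiiSizes.entrySet()) {
            // With no hook installed, fall back to the default divisor of 1.
            int divisor = (hook == null) ? 1 : hook.chooseDivisor(segment.getKey(), segment.getValue());
            if (divisor < 1) {
                throw new IllegalArgumentException("divisor must be >= 1 for " + segment.getKey());
            }
            divisors.put(segment.getKey(), divisor);
        }
        return divisors;
    }
}
```

With a hook like this, a small segment would keep divisor 1 while a segment with a large .tii file would get a larger one, without the application having to open the segments one at a time. Whether the hook should live on a properties object, on the reader ctor, or somewhere else is the same open question as in the bullets above.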
> Add an "termInfosIndexDivisor" to IndexReader
> ---------------------------------------------
>
>                 Key: LUCENE-1052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1052
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1052.patch, LUCENE-1052.patch, termInfosConfigurer.patch
>
>
> The termIndexInterval, set at indexing time, lets you trade off
> how much RAM is used by a reader to load the indexed terms vs the cost of
> seeking to the specific term you want to load.
> But the downside is you must set it at indexing time.
> This issue adds an indexDivisor to TermInfosReader so that on opening
> a reader you could further sub-sample the termIndexInterval to use
> less RAM. E.g., a setting of 2 means every (2 * termIndexInterval)-th term is
> loaded into RAM.
> This is particularly useful if your index has a great many terms (e.g.,
> you accidentally indexed binary terms).
> Spinoff from this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/54371

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org