lucene-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1052) Add an "termInfosIndexDivisor" to IndexReader
Date Wed, 21 Nov 2007 00:27:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544125
] 

Doug Cutting commented on LUCENE-1052:
--------------------------------------

What class would we put TermInfosReader-specific setters & getters on, since that class
is not public?  Do we make TermInfosReader public or leave it package-private?  My intuition
is to leave it package-private for now, in order to retain freedom to re-structure w/o breaking
applications, and because making it public would drag a lot of other stuff into the public API.
We could consider making SegmentReader public, so that there's a public class that corresponds
to the concrete index implementation, but that'd also drag more stuff public (like DirectoryIndexReader).

I'm also not yet convinced that it is critical to support arbitrary formulae for this feature.
Sure, it would be nice, but it has costs, like increasing the public APIs that must be supported.
Folks have done fine without this feature for many years.  Adding a simple integer divisor
is a sufficient initial step here.

So, even if we add a configuration system, I think the setter methods could still end up on
IndexReader.  The difference is primarily whether the methods are:

public void setTermIndexInterval(int interval);
public void setTermIndexDivisor(int divisor);

or

public static void setTermIndexInterval(LuceneProps props, int interval);
public static void setTermIndexDivisor(LuceneProps props, int divisor);

With the latter being just a façade that uses package-private stuff.  I think the latter style
will be handy as we start adding parameters to, e.g., Query classes.  In those cases we'll
probably want façades too, since a Query setter will probably really tweak something for
a private Scorer class.  In the case of indexes, however, we don't have a public, concrete
class.
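
To make the second style concrete, here is a minimal sketch of a static façade over a
package-private property bag.  Everything in it is an assumption for illustration: LuceneProps
is only named above and its shape is invented here, the key name and the getter are hypothetical,
and IndexReaderFacadeSketch merely stands in for IndexReader.  It is not Lucene's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical property bag. Only the name "LuceneProps" comes from the
// comment above; the fields and methods are assumed for this sketch.
final class LuceneProps {
    private final Map<String, Integer> ints = new HashMap<>();

    // Package-private accessors: applications cannot call these directly,
    // so the storage layout stays free to change.
    void setInt(String key, int value) {
        ints.put(key, value);
    }

    int getInt(String key, int defaultValue) {
        Integer v = ints.get(key);
        return v != null ? v : defaultValue;
    }
}

// Stand-in for IndexReader, showing only the static facade methods.
final class IndexReaderFacadeSketch {
    // Assumed key name, not taken from any patch.
    static final String DIVISOR_KEY = "termIndexDivisor";

    private IndexReaderFacadeSketch() {}

    // Public entry point: validates, then delegates to package-private state.
    public static void setTermIndexDivisor(LuceneProps props, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1: " + divisor);
        }
        props.setInt(DIVISOR_KEY, divisor);
    }

    // Hypothetical getter; a divisor of 1 means "load every index term".
    public static int getTermIndexDivisor(LuceneProps props) {
        return props.getInt(DIVISOR_KEY, 1);
    }
}
```

The only public surface is the pair of static methods; how the value is stored and which
non-public class eventually reads it can change without breaking applications.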

Another option is to make a public class whose sole purpose is to hold such parameters, something
like SegmentIndexParameters.  That'd be my first choice and was the direction I pointed in
my initial proposal, but with considerably less explanation.

> Add an "termInfosIndexDivisor" to IndexReader
> ---------------------------------------------
>
>                 Key: LUCENE-1052
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1052
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.2
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1052.patch, termInfosConfigurer.patch
>
>
> The termIndexInterval, set at indexing time, lets you trade off
> how much RAM is used by a reader to load the indexed terms vs the cost of
> seeking to the specific term you want to load.
> But the downside is you must set it at indexing time.
> This issue adds an indexDivisor to TermInfosReader so that on opening
> a reader you could further sub-sample the termIndexInterval to use
> less RAM.  E.g., a setting of 2 means every (2 * termIndexInterval)th term is
> loaded into RAM.
> This is particularly useful if your index has a great many terms (e.g.,
> you accidentally indexed binary terms).
> Spinoff from this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/54371

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

