lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2294) Create IndexWriterConfiguration and store all of IW configuration there
Date Thu, 04 Mar 2010 15:40:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841339#action_12841339 ]

Michael McCandless commented on LUCENE-2294:
--------------------------------------------

bq. Today there is no DEFAULT: IW forces you to pass MFL, so whoever moves to the new API can define whatever he wants. We'll default to UNLIMITED, but there won't be any back-compat issue.

Ahh, sorry, right. In the "olden days", 10000 was the default.

{quote}
bq. we could make a TokenFilter to do this?

I'm afraid that would mean changing all Analyzers to work properly? Or do you mean DW (DocumentsWriter, or somewhere else) will wrap whatever TokenStream an Analyzer returns with this filter? That could work, but as soon as that becomes a standalone filter, people may use it, and wrapping their TS with that filter will be unnecessary (and may slow them down?).
{quote}

Hmm, yeah, it's quite a hassle to fix all analyzers. Hmmm.

bq. I guess I'd like to keep it as it is now, rather than turning the issue into a bigger thing ... and a filter alone won't solve it: we'd still need to provide a way to configure it, or otherwise everyone will need to wrap their Analyzers with such a filter?

Maybe one solution is to wrap any other analyzer? I.e., create a StopAfterNTokensAnalyzer that takes another analyzer, delegates to it, and then sticks this StopAfterNTokensFilter onto each token stream.
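A minimal sketch of the wrapping idea, assuming simplified stand-ins: SimpleTokenStream, SimpleAnalyzer and WhitespaceAnalyzer below are hypothetical toy types, not Lucene's TokenStream/Analyzer API, and StopAfterNTokensFilter/StopAfterNTokensAnalyzer are only the names floated in this comment. The wrapper delegates to any analyzer and appends the limiting filter to each stream it returns, so no individual analyzer has to change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StopAfterNTokensSketch {

    /** Hypothetical toy token stream: advance with incrementToken(), read the term with term(). */
    interface SimpleTokenStream {
        boolean incrementToken();
        String term();
    }

    /** Hypothetical toy analyzer: turns a field's text into a token stream. */
    interface SimpleAnalyzer {
        SimpleTokenStream tokenStream(String field, String text);
    }

    /** Splits on whitespace; plays the role of "any other analyzer" being wrapped. */
    static class WhitespaceAnalyzer implements SimpleAnalyzer {
        public SimpleTokenStream tokenStream(String field, String text) {
            final Iterator<String> it = Arrays.asList(text.trim().split("\\s+")).iterator();
            return new SimpleTokenStream() {
                private String current;

                public boolean incrementToken() {
                    if (!it.hasNext()) return false;
                    current = it.next();
                    return true;
                }

                public String term() { return current; }
            };
        }
    }

    /** Passes through at most maxTokens tokens from its input, then reports end of stream. */
    static class StopAfterNTokensFilter implements SimpleTokenStream {
        private final SimpleTokenStream input;
        private final int maxTokens;
        private int count = 0;

        StopAfterNTokensFilter(SimpleTokenStream input, int maxTokens) {
            this.input = input;
            this.maxTokens = maxTokens;
        }

        public boolean incrementToken() {
            if (count >= maxTokens) return false; // limit reached: never pull another token from input
            if (!input.incrementToken()) return false;
            count++;
            return true;
        }

        public String term() { return input.term(); }
    }

    /** Wraps another analyzer and appends the limiting filter to every stream it produces. */
    static class StopAfterNTokensAnalyzer implements SimpleAnalyzer {
        private final SimpleAnalyzer delegate;
        private final int maxTokens;

        StopAfterNTokensAnalyzer(SimpleAnalyzer delegate, int maxTokens) {
            this.delegate = delegate;
            this.maxTokens = maxTokens;
        }

        public SimpleTokenStream tokenStream(String field, String text) {
            return new StopAfterNTokensFilter(delegate.tokenStream(field, text), maxTokens);
        }
    }

    /** Drains a stream into a list of terms. */
    static List<String> collect(SimpleTokenStream ts) {
        List<String> terms = new ArrayList<>();
        while (ts.incrementToken()) terms.add(ts.term());
        return terms;
    }

    public static void main(String[] args) {
        SimpleAnalyzer limited = new StopAfterNTokensAnalyzer(new WhitespaceAnalyzer(), 3);
        System.out.println(collect(limited.tokenStream("body", "a b c d e"))); // prints [a, b, c]
    }
}
```

The design choice here is composition: the limit lives in one wrapper rather than in every analyzer, which is what avoids the "fix all analyzers" hassle.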

But yeah, maybe break this out as a separate issue...

bq. Also, if I were to use such a filter myself, I wouldn't put it last in the chain, so that I can avoid doing any processing on a term that is not going to end up in the index. Although that's not too critical, because I'll be doing this for just one term ...

Actually, with the filter at the end, it ought to be 0 terms wasted: once this StopAfterNTokensFilter has emitted its limit, it immediately returns false without asking its input for the 10,001st token.
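To illustrate the "0 terms wasted" point, here is a toy pull-based chain (all class names are hypothetical, not Lucene classes) in which an upstream stage counts the tokens it actually processes. Because the limit stage checks its counter before pulling from its input, the upstream stage does work on exactly N tokens even though the limit sits last in the chain.

```java
public class NoWastedTokensSketch {

    /** Minimal pull-style stage: next() advances to the next token; false means "no more". */
    interface Stage {
        boolean next();
    }

    /** Hypothetical source that could keep producing tokens forever. */
    static class EndlessSource implements Stage {
        public boolean next() { return true; }
    }

    /** Hypothetical intermediate filter (think stemming) that counts the tokens it processed. */
    static class CountingStage implements Stage {
        private final Stage input;
        int processed = 0;

        CountingStage(Stage input) { this.input = input; }

        public boolean next() {
            if (!input.next()) return false;
            processed++; // "work" done on this token
            return true;
        }
    }

    /** Limit stage placed at the END of the chain. */
    static class StopAfterN implements Stage {
        private final Stage input;
        private final int max;
        private int emitted = 0;

        StopAfterN(Stage input, int max) {
            this.input = input;
            this.max = max;
        }

        public boolean next() {
            if (emitted >= max) return false; // checked first: input is never asked for token max+1
            if (!input.next()) return false;
            emitted++;
            return true;
        }
    }

    public static void main(String[] args) {
        CountingStage work = new CountingStage(new EndlessSource());
        Stage chain = new StopAfterN(work, 10000);
        int indexed = 0;
        while (chain.next()) indexed++;
        // prints "10000 tokens indexed, 10000 processed upstream"
        System.out.println(indexed + " tokens indexed, " + work.processed + " processed upstream");
    }
}
```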

bq. One thing I should add to IWC (I hope to post a patch today) is a Version parameter. For now it will be ignored, but it serves as a placeholder for changing settings in the future.

+1
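A sketch of the placeholder idea, assuming a hypothetical MatchVersion enum and WriterConfig class (Lucene has its own Version class; neither name here comes from the patch). The version is accepted and stored, but no setting depends on it yet, which is what makes it cheap to add now and useful later.

```java
public class VersionPlaceholderSketch {

    /** Hypothetical stand-in for a compatibility-version constant. */
    enum MatchVersion { V_3_0, V_3_1 }

    /** Hypothetical config: the version is recorded, but nothing reads it yet. */
    static class WriterConfig {
        private final MatchVersion matchVersion;
        // Same default regardless of version for now (UNLIMITED, as proposed in this issue).
        private final int maxFieldLength = Integer.MAX_VALUE;

        WriterConfig(MatchVersion matchVersion) {
            this.matchVersion = matchVersion;
            // Placeholder: a future release could choose version-dependent defaults here.
        }

        MatchVersion getMatchVersion() { return matchVersion; }

        int getMaxFieldLength() { return maxFieldLength; }
    }

    public static void main(String[] args) {
        WriterConfig older = new WriterConfig(MatchVersion.V_3_0);
        WriterConfig newer = new WriterConfig(MatchVersion.V_3_1);
        // Today both configs behave identically; only the recorded version differs.
        System.out.println(older.getMaxFieldLength() == newer.getMaxFieldLength()); // prints true
    }
}
```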

> Create IndexWriterConfiguration and store all of IW configuration there
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-2294
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2294
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>             Fix For: 3.1
>
>
> I would like to factor all IW configuration parameters out into a single configuration class, which I propose to name IndexWriterConfiguration (or IndexWriterConfig). I want to store almost everything except the Directory there, and to reduce all the ctors down to one: IndexWriter(Directory, IndexWriterConfiguration). What I was thinking of storing there are the following parameters:
> * All of the ctors' parameters, except for Directory.
> * The different setters, where it makes sense. For example, I still think infoStream should be set on IW directly.
> I'm thinking that IWC should expose everything through setter/getter methods, and default to whatever IW defaults to today, except for Analyzer, which will need to be defined in the IWC ctor and won't have a setter.
> I am not sure why MaxFieldLength is required in all IW ctors, yet IW declares a DEFAULT (which is an int and not a MaxFieldLength). Do we still think that 10000 should be the default? Why not default to UNLIMITED and otherwise let the application decide what LIMITED means for it? I would like to make MFL optional on IWC and default to something, and I hope that default will be UNLIMITED. We can document that on IWC, so that anyone who chooses to move to the new API is aware of it ...
> I plan to deprecate all the ctors and getters/setters and replace them with:
> * One ctor as described above
> * getIndexWriterConfiguration, or simply getConfig, which can then be queried for the setting of interest.
> * About the setters, I think maybe we can just introduce a setConfig method which will override everything that is overridable today, except for Analyzer. So someone could do iw.getConfig().setSomething(); iw.setConfig(newConfig);
> ** The setters on IWC can return an IWC to allow chaining set calls ... so the above will turn into iw.setConfig(iw.getConfig().setSomething1().setSomething2());
> BTW, this is needed for Parallel Indexing (see LUCENE-1879), but I think it will greatly simplify IW's API.
> I'll start to work on a patch.
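A minimal sketch of the API shape proposed in the issue description, assuming stub Analyzer/Directory types so it compiles on its own. IndexWriterConfig and IndexWriter here are toy classes named after the proposal, not Lucene's actual implementation; setRAMBufferSizeMB is a hypothetical stand-in for "setSomething" with an illustrative default, and rejecting a config that carries a different Analyzer in setConfig is one assumed way to honor "except for Analyzer".

```java
public class WriterConfigSketch {

    /** Stubs for the real Lucene types, only so this sketch compiles on its own. */
    interface Analyzer {}
    interface Directory {}
    static class StandardAnalyzerStub implements Analyzer {}
    static class RamDirectoryStub implements Directory {}

    /** Toy version of the proposed config: Analyzer fixed in the ctor, everything else via chainable setters. */
    static class IndexWriterConfig {
        static final int UNLIMITED = Integer.MAX_VALUE;

        private final Analyzer analyzer;        // required, no setter
        private int maxFieldLength = UNLIMITED; // proposed default (previously 10000)
        private double ramBufferSizeMB = 16.0;  // illustrative default only

        IndexWriterConfig(Analyzer analyzer) { this.analyzer = analyzer; }

        Analyzer getAnalyzer() { return analyzer; }
        int getMaxFieldLength() { return maxFieldLength; }
        double getRAMBufferSizeMB() { return ramBufferSizeMB; }

        // Each setter returns this, so calls can be chained.
        IndexWriterConfig setMaxFieldLength(int maxFieldLength) {
            this.maxFieldLength = maxFieldLength;
            return this;
        }

        IndexWriterConfig setRAMBufferSizeMB(double mb) {
            this.ramBufferSizeMB = mb;
            return this;
        }
    }

    /** Toy writer with the single proposed ctor plus getConfig/setConfig. */
    static class IndexWriter {
        private final Directory dir;
        private IndexWriterConfig config;

        IndexWriter(Directory dir, IndexWriterConfig config) {
            this.dir = dir;
            this.config = config;
        }

        Directory getDirectory() { return dir; }

        IndexWriterConfig getConfig() { return config; }

        /** Overrides all overridable settings; the Analyzer itself may not change (assumed policy). */
        void setConfig(IndexWriterConfig newConfig) {
            if (newConfig.getAnalyzer() != config.getAnalyzer()) {
                throw new IllegalArgumentException("Analyzer cannot be changed after construction");
            }
            this.config = newConfig;
        }
    }

    public static void main(String[] args) {
        IndexWriter iw = new IndexWriter(new RamDirectoryStub(),
                new IndexWriterConfig(new StandardAnalyzerStub()));
        // The chained form from the proposal: iw.setConfig(iw.getConfig().setSomething1().setSomething2());
        iw.setConfig(iw.getConfig().setMaxFieldLength(10000).setRAMBufferSizeMB(48.0));
        System.out.println(iw.getConfig().getMaxFieldLength()); // prints 10000
    }
}
```

Because getConfig() returns the live config object in this sketch, the chained setters already change the writer's settings before setConfig is called; a real implementation might hand out a copy instead, so that changes only take effect through setConfig.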

-- 
This message is automatically generated by JIRA.