lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2294) Create IndexWriterConfiguration and store all of IW configuration there
Date Sat, 06 Mar 2010 04:03:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842176#action_12842176 ]

Shai Erera commented on LUCENE-2294:
------------------------------------

bq. Hmm... I think we should still allow package private specification of the indexing chain?

Ok, I'll add it back.

bq. For IWC.setAnalyzer(null)

I preferred not to (this applies not just to setAnalyzer but also to MergeScheduler, Similarity and IndexDeletionPolicy).
I specifically documented on each setter what happens if one passes null. I think it's a better service:
you're not expected to pass null, so if you do, we revert to the default instead of throwing an
exception. At one point I thought of adding a restoreDefaults() method, but then realized
that calling new IWC() is not that expensive...
If someone uses IWC but receives some of those settings from the outside, then instead
of asking them to always check for null, we tell them "pass null and we revert to the default"...
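A minimal sketch of the "pass null, we revert to the default" convention described above. `Scheduler`, `DefaultScheduler`, `CustomScheduler` and `WriterConfig` are hypothetical stand-ins, not the actual Lucene classes; the setter also returns `this` so calls can be chained.

```java
public class NullRevertsToDefault {
    // Hypothetical stand-in for a pluggable component such as a merge scheduler.
    interface Scheduler {
        String name();
    }

    static final class DefaultScheduler implements Scheduler {
        public String name() { return "default"; }
    }

    static final class CustomScheduler implements Scheduler {
        public String name() { return "custom"; }
    }

    // Hypothetical config class, not Lucene's IndexWriterConfig.
    static final class WriterConfig {
        private Scheduler scheduler = new DefaultScheduler();

        // Passing null restores the default instead of throwing,
        // so callers forwarding an optional setting need no null check.
        WriterConfig setScheduler(Scheduler scheduler) {
            this.scheduler = (scheduler == null) ? new DefaultScheduler() : scheduler;
            return this;
        }

        Scheduler getScheduler() { return scheduler; }
    }

    public static void main(String[] args) {
        Scheduler fromOutside = null; // e.g. an optional setting the caller did not supply
        WriterConfig conf = new WriterConfig()
            .setScheduler(new CustomScheduler())
            .setScheduler(fromOutside);
        System.out.println(conf.getScheduler().name()); // prints "default"
    }
}
```

The caller never branches on null; the config owns the decision of what "default" means.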

bq. In IWC we call it "scheduler"

Whoops, fixed that!

bq. Why can't MergePolicy also live in IWC?

I've commented on that earlier in this issue: MergePolicy requires an IW instance to be passed to its ctor
(see my comment from 04/Mar/10 09:28 PM).
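The ordering problem can be sketched with hypothetical `Policy`, `Config` and `Writer` classes (not the Lucene API): because the policy takes its writer in the constructor, it cannot exist before the writer, so it cannot be part of the config that is handed to the writer's constructor. In this sketch it is set on the writer afterwards instead.

```java
public class PolicyNeedsWriter {
    // Hypothetical policy that, like the MergePolicy described above,
    // needs its writer at construction time.
    static final class Policy {
        final Writer writer;
        Policy(Writer writer) { this.writer = writer; }
    }

    // Hypothetical config: it is built before any Writer exists,
    // so it has no way to construct a Policy bound to that Writer.
    static final class Config {
        double ramBufferSizeMB = 16.0;
    }

    static final class Writer {
        final Config config;
        private Policy policy; // created lazily, once "this" is fully constructed

        Writer(Config config) { this.config = config; }

        Policy getPolicy() {
            if (policy == null) policy = new Policy(this);
            return policy;
        }

        // The policy is therefore set on the writer after construction.
        void setPolicy(Policy policy) {
            if (policy.writer != this) {
                throw new IllegalArgumentException("policy was created for another writer");
            }
            this.policy = policy;
        }
    }

    public static void main(String[] args) {
        Writer w = new Writer(new Config());
        w.setPolicy(new Policy(w)); // only possible once w exists
        System.out.println(w.getPolicy().writer == w); // prints "true"
    }
}
```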

bq. IW.messageState will have a space before each of its entries 

IWC.toString() puts '\n' between settings. I thought that would be more readable, because
otherwise it would be one long line. The output for me looks like this:
{code}
IW 0 [main]: dir=org.apache.lucene.store.RAMDirectory@2ca22ca2 mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@6ca06ca
index= version=3.1-dev config=matchVersion=LUCENE_31
analyzer=org.apache.lucene.analysis.WhitespaceAnalyzer
delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
commit=null
openMode=CREATE_OR_APPEND
maxFieldLength=2147483647
similarity=org.apache.lucene.search.DefaultSimilarity
termIndexInterval=128
mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
default WRITE_LOCK_TIMEOUT=1000
writeLockTimeout=1000
maxBufferedDeleteTerms=-1
ramBufferSizeMB=16.0
maxBufferedDocs=-1
{code}

Perhaps I'll print "config=\n" so that every config parameter starts on a new line, not
all but the first. Is that acceptable, or do you still prefer them all on one line, space-separated?
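A sketch of the proposed "config=\n" layout, using a hypothetical `WriterConfig` that holds three of the settings shown in the log output above and a hypothetical `messageState` helper; it is not IndexWriter's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigToString {
    // Hypothetical config holding a few settings from the log above, in insertion order.
    static final class WriterConfig {
        final Map<String, Object> settings = new LinkedHashMap<>();

        WriterConfig() {
            settings.put("openMode", "CREATE_OR_APPEND");
            settings.put("termIndexInterval", 128);
            settings.put("ramBufferSizeMB", 16.0);
        }

        // One "name=value" per line, each terminated by '\n'.
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, Object> e : settings.entrySet()) {
                sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
            return sb.toString();
        }
    }

    // Writing "config=\n" first puts every setting, including the first, on its own line.
    static String messageState(String dir, WriterConfig config) {
        return "dir=" + dir + " config=\n" + config;
    }

    public static void main(String[] args) {
        System.out.print(messageState("RAMDirectory", new WriterConfig()));
    }
}
```

With this layout the first setting no longer shares a line with `dir=`, so every parameter lines up at column zero.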

bq. Need small jdoc for the new IW ctor

I'll add that. That one slipped :).

bq. Should we disallow all setters on IWC after it's been consumed by an IW?

I've thought about all the options you raise and decided to keep things as-is for now,
so that others could comment on them too. So I'm glad you commented :).

* I've documented on IW.getConfig() that setting anything on the returned object has no effect
on the IW instance, and that to change a setting one should re-instantiate IW.
* I also thought about turning off the setters (having IW mark the IWC read-only), but then one
wouldn't be able to reuse an IWC.
* Removing the fields from IW is not good, because then someone could actually call getConfig().setMaxBufferedDocs()
and that would affect IW, contrary to what the documentation says. If we really want to keep everything
in IWC, we need to clone it both in the ctor and in getConfig(). That seems wasteful to me.

I think I'll just clone the incoming IWC in IW's ctor and leave the rest as-is, OK?
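A minimal sketch of "clone the incoming IWC in the ctor", assuming hypothetical `WriterConfig` and `Writer` classes rather than Lucene's. The writer keeps its own field (as IW does today) plus a private clone for getConfig(), so the caller can keep reusing and mutating the original config.

```java
public class CloneOnConstruct {
    // Hypothetical config; Cloneable so the writer can take a private copy.
    static final class WriterConfig implements Cloneable {
        private int maxBufferedDocs = -1; // same value as printed in the log above

        WriterConfig setMaxBufferedDocs(int n) { maxBufferedDocs = n; return this; }
        int getMaxBufferedDocs() { return maxBufferedDocs; }

        @Override
        public WriterConfig clone() {
            try {
                return (WriterConfig) super.clone(); // shallow copy is enough for primitive fields
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: we implement Cloneable
            }
        }
    }

    static final class Writer {
        private final WriterConfig config;  // private snapshot, returned by getConfig()
        private final int maxBufferedDocs;  // the value the writer actually uses

        Writer(WriterConfig config) {
            this.config = config.clone();
            this.maxBufferedDocs = this.config.getMaxBufferedDocs();
        }

        // Setting anything on the returned object has no effect on this writer.
        WriterConfig getConfig() { return config; }
        int maxBufferedDocs() { return maxBufferedDocs; }
    }

    public static void main(String[] args) {
        WriterConfig conf = new WriterConfig().setMaxBufferedDocs(100);
        Writer first = new Writer(conf);
        Writer second = new Writer(conf.setMaxBufferedDocs(500)); // reuse the same config
        System.out.println(first.getConfig().getMaxBufferedDocs()); // prints 100
        System.out.println(second.maxBufferedDocs());               // prints 500
    }
}
```

Only one clone happens per writer, at construction; getConfig() returns the snapshot without copying again. Object-valued settings (an analyzer, say) would still be shared by reference under a shallow clone.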

> Create IndexWriterConfiguration and store all of IW configuration there
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-2294
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2294
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: LUCENE-2294.patch
>
>
> I would like to factor out all of IW's configuration parameters into a single configuration class, which I propose to name IndexWriterConfiguration (or IndexWriterConfig). I want to store there almost everything besides the Directory, and to reduce all the ctors down to one: IndexWriter(Directory, IndexWriterConfiguration). The parameters I was thinking of storing there are:
> * All of ctors parameters, except for Directory.
> * The different setters, where it makes sense. For example, I still think infoStream should be set on IW directly.
> I'm thinking that IWC should expose everything through setter/getter methods, defaulting to whatever IW's defaults are today, except for the Analyzer, which will need to be passed to IWC's ctor and won't have a setter.
> I am not sure why MaxFieldLength is required in all IW ctors, yet IW declares a DEFAULT (which is an int and not a MaxFieldLength). Do we still think that 10000 should be the default? Why not default to UNLIMITED and otherwise let the application decide what LIMITED means for it? I would like to make MFL optional on IWC with some default, and I hope that default will be UNLIMITED. We can document that on IWC, so that anyone who chooses to move to the new API is aware of it...
> I plan to deprecate all the ctors and getters/setters and replace them with:
> * One ctor, as described above
> * getIndexWriterConfiguration, or simply getConfig, which can then be queried for the setting of interest.
> * As for the setters, I think we can just introduce a setConfig method which will override everything that is overridable today, except for the Analyzer. So someone could do iw.getConfig().setSomething(); iw.setConfig(newConfig);
> ** The setters on IWC can return the IWC to allow chaining set calls... so the above will turn into iw.setConfig(iw.getConfig().setSomething1().setSomething2());
> BTW, this is needed for Parallel Indexing (see LUCENE-1879), but I think it will also greatly simplify IW's API.
> I'll start to work on a patch.
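The API proposed in the quoted description (a single `IndexWriter(Directory, IndexWriterConfiguration)` ctor, an Analyzer required by the config's ctor with no setter, chainable setters, MaxFieldLength defaulting to UNLIMITED, and a `setConfig` that may not change the Analyzer) can be sketched as follows. Every class here is a hypothetical stand-in, not the committed Lucene API, and rejecting a different Analyzer in `setConfig` is just one possible reading of "except for Analyzer".

```java
public class SingleCtorSketch {
    // Hypothetical stand-ins for Lucene's Analyzer and Directory.
    static final class Analyzer { }
    static final class Directory { }

    static final class WriterConfig {
        static final int UNLIMITED = Integer.MAX_VALUE; // 2147483647, as in the log above

        private final Analyzer analyzer;        // required in the ctor, no setter
        private int maxFieldLength = UNLIMITED; // the default proposed above
        private double ramBufferSizeMB = 16.0;

        WriterConfig(Analyzer analyzer) { this.analyzer = analyzer; }

        // Setters return this so calls can be chained.
        WriterConfig setMaxFieldLength(int n) { maxFieldLength = n; return this; }
        WriterConfig setRAMBufferSizeMB(double mb) { ramBufferSizeMB = mb; return this; }

        Analyzer getAnalyzer() { return analyzer; }
        int getMaxFieldLength() { return maxFieldLength; }
        double getRAMBufferSizeMB() { return ramBufferSizeMB; }
    }

    static final class Writer {
        private final Directory dir;
        private WriterConfig config;

        // The single constructor: everything except the Directory comes from the config.
        Writer(Directory dir, WriterConfig config) {
            this.dir = dir;
            this.config = config;
        }

        Directory getDirectory() { return dir; }
        WriterConfig getConfig() { return config; }

        // Replaces every setting except the Analyzer; a different Analyzer is rejected.
        void setConfig(WriterConfig newConfig) {
            if (newConfig.getAnalyzer() != config.getAnalyzer()) {
                throw new IllegalArgumentException("the Analyzer cannot be changed");
            }
            config = newConfig;
        }
    }

    public static void main(String[] args) {
        Writer iw = new Writer(new Directory(), new WriterConfig(new Analyzer()));
        iw.setConfig(iw.getConfig().setMaxFieldLength(10000).setRAMBufferSizeMB(32.0));
        System.out.println(iw.getConfig().getMaxFieldLength()); // prints 10000
    }
}
```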

-- 
This message is automatically generated by JIRA.



