lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Grant Ingersoll" <gsing...@syr.edu>
Subject Re: incorrect OO in lucene source?
Date Tue, 20 Apr 2004 14:32:26 GMT
I agree with Robert, as I have had similar wishes about more interface capabilities, but also
agree with Eric in that Lucene works great in a lot of ways.    I have found the current design
causes you to have to hard code things that shouldn't need to be hard coded, especially in
the TokenStream area.  The idea of writing a new Analyzer every time you want to change a
Tokenizer or TokenFilter is very limiting.  In my application I need the flexibility to re-index
and evaluate fairly often.  The current Analyzer implementation would require me to write
a new Analyzer for every experiment and that is not manageable.  Do others have this issue?

I submitted a "broken" patch that converts the analyzers and token streams to interfaces,
but as Doug pointed out, it is not currently thread safe (I have another version that uses
reflection that is thread safe).  I intend to go back and make it thread-safe, but haven't
had the time.  Anyway, this patch contains an interface implementation of Analyzer and TokenStream
that we may find useful in the future and if someone else wants to take up the ball and make
it thread-safe, I don't think it would take too long.

>>> tdv@part.net 04/20/04 10:14AM >>>
Erik Hatcher wrote:

> Robert,
>
> I agree with your sentiment about interfaces.  I haven't looked into 
> the specifics of what you mention about IndexReader though.
>
> Out of curiosity - is there something about Lucene's design that is 
> preventing you from extending Lucene?  Pragmatically speaking, we 
> shouldn't change anything unless there is a compelling reason to do 
> so, and while a clean architecture is a nice ideal, if it ain't broke 
> don't fix it :)
>
>     Erik
>
>
> On Apr 19, 2004, at 8:12 PM, Robert Engels wrote:
>
>> Lucene is often cited as an excellent example of OO design.
>>
>> I see several areas where this does not seem to be the case, mainly 
>> because
>> of the Lucene code uses abstract base classes where it should be using
>> interfaces, and other cases, where an interface makes reference to an
>> abstract base class as a method/return parameter.
>>
>> It seems the abstract base classes were created so there was a place 
>> to put
>> the static 'helper' methods. Would it not be better to have separate
>> interfaces and abstract base classes that the implementations could 
>> extend
>> if necessary?
>>
>> Some examples are listed below. Am I missing something, or is the code
>> structured this way for a reason?
>>
>> -----
>>
>> The main problem seems to be the use of IndexReader in many 
>> interfaces, when
>> it should be using a Searchable. Otherwise, what is the point of 
>> Searchable.
>>
>> The following object use chain shows the 'pointlessness' of Searchable:
>>
>> Searchable->Query->Weight->IndexReader
>>
>> -----
>>
>> The Query object has the method:
>>
>> public Query rewrite(IndexReader reader) throws IOException...
>>
>> but the entire search package is based on 'Searchable's, should it 
>> not be
>>
>> public Query rewrite(Searchable searchable) throws IOException...
>>
>> If not, the entire idea of 'external' searchable implementations break,
>> because the end up having to 'know' an IndexReader, and IndexReader is a
>> baseclass, etc.
>>
>> The same goes for
>>
>> public Weight weight(Searcher searcher);
>>
>> should this not be a 'Searchable'?
>>
>> -----
>>
>> The Weight interface has the following:
>>
>> Scorer scorer(IndexReader reader) throws IOException;
>>
>> Should it not be
>>
>> Scorer scorer(Searchable searchable) throws IOException;
>>
>> The same goes for explain().
>>
>> -----
>>
>> It seems that Searchable was created because IndexReader was not an
>> interface, but the methods have not been changed to use a Searchable 
>> instead
>> of IndexReader.
>>
>> -----
>>
>> It seems many of the 'search' package classes use a 'Searcher', when 
>> they
>> should be using a 'Searchable'. If these core classes really need the
>> methods defined in 'Searcher', then these methods should be moved to
>> 'Searchable'.
>>
>> -----
>>
>> Shouldn't 'Filter' just be an interface, with the method
>>
>> boolean filter(int docnum);
>>
>> as to whether or not the document should be filtered?
>>
>> -----
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org 
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org 
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org 
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org 
>
I have had similar wishes recently for interfaces for TokenStream and 
Analyzer.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org 
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message