lucene-dev mailing list archives

From Earwin Burrfoot <ear...@gmail.com>
Subject Re: deprecating Versions
Date Mon, 29 Nov 2010 11:45:34 GMT
I agree that the current Analyzers are a heap of bad copy-paste.
But I'd rather have the ability to compose a number of CharFilters,
Tokenizers and TokenFilters programmatically (without writing a new
Analyzer) than use config files.

Something that roughly looks like:
Analyzer a = new AnalyzerBuilder().
  filterStreamWith(charFilterA, charFilterB).
  tokenizeWith(new MyFluffyTokenizer()).
  filter(new StopWordsFilter(..)).
  filter(whatever).
  build();

Configgy stuff can then appear as a layer over such an API.
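
To make the shape of the proposal concrete, here is a minimal, self-contained sketch of how such a builder could be put together. It is an illustration only and not Lucene API: `AnalyzerBuilderSketch` and `SimpleAnalyzer` are hypothetical names, and the stages are modelled as plain functions over strings and term lists (an assumption made so the example runs without Lucene on the classpath).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the proposed builder. None of these types are Lucene API.
final class AnalyzerBuilderSketch {

    // Toy analyzer: raw text in, list of terms out.
    interface SimpleAnalyzer {
        List<String> analyze(String text);
    }

    private final List<UnaryOperator<String>> charFilters = new ArrayList<>();
    private final List<UnaryOperator<List<String>>> tokenFilters = new ArrayList<>();
    private Function<String, List<String>> tokenizer;

    // Char filters run on the raw text, in the order given.
    @SafeVarargs
    final AnalyzerBuilderSketch filterStreamWith(UnaryOperator<String>... filters) {
        charFilters.addAll(Arrays.asList(filters));
        return this;
    }

    // Exactly one tokenizer; a later call replaces an earlier one.
    AnalyzerBuilderSketch tokenizeWith(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
        return this;
    }

    // Token filters run on the tokenizer output, in the order added.
    AnalyzerBuilderSketch filter(UnaryOperator<List<String>> tokenFilter) {
        tokenFilters.add(tokenFilter);
        return this;
    }

    SimpleAnalyzer build() {
        if (tokenizer == null) {
            throw new IllegalStateException("tokenizeWith(...) must be called before build()");
        }
        // Copy the stages so later builder calls cannot change a built analyzer.
        List<UnaryOperator<String>> cf = new ArrayList<>(charFilters);
        List<UnaryOperator<List<String>>> tf = new ArrayList<>(tokenFilters);
        Function<String, List<String>> tok = tokenizer;
        return text -> {
            String s = text;
            for (UnaryOperator<String> f : cf) s = f.apply(s);
            List<String> tokens = tok.apply(s);
            for (UnaryOperator<List<String>> f : tf) tokens = f.apply(tokens);
            return tokens;
        };
    }

    public static void main(String[] args) {
        SimpleAnalyzer a = new AnalyzerBuilderSketch()
            .filterStreamWith(s -> s.replaceAll("<[^>]*>", " "),    // crude tag stripping
                              s -> s.toLowerCase(Locale.ROOT))
            .tokenizeWith(s -> Arrays.asList(s.trim().split("\\s+")))
            .filter(ts -> {                                          // toy stop-word filter
                List<String> out = new ArrayList<>(ts);
                out.removeIf("the"::equals);
                return out;
            })
            .build();
        System.out.println(a.analyze("<b>The</b> Quick fox"));
    }
}
```

The sketch sidesteps the fact that real Lucene stages wrap a stream rather than transform a value, so an actual builder would probably need per-stage factories or some equivalent; that part is left open here.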

Building Analyzers programmatically has a number of benefits:
1. Easier tests. Everything being tested is in your test method, not
smeared across a bunch of config files (wink@Solr).
2. You can play around in a REPL.
3. You might have slightly different variations of the same Analyzer,
and you don't have to write a bunch of almost-identical config files
for that.
  - e.g. in my code I have an Index-mode analyzer, an Index-mode
analyzer + HTML handling, and a Search-mode analyzer, which differ only
in the parameters to a couple of filters.
4. Type safety, anyone?
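
Points 1 and 3 can be illustrated with a small sketch: one parameterized factory method yields all three variants, and each variant is fully visible at the call site. Everything here is hypothetical; `AnalyzerVariantsSketch`, the `stripHtml` flag and the stop-word set are assumptions standing in for "parameters to a couple of filters", and analysis is again modelled as a plain function from text to terms rather than with Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: three analyzer variants that differ only in parameters.
final class AnalyzerVariantsSketch {

    // Shared pipeline: optional tag stripping, lowercasing,
    // whitespace tokenization, stop-word removal.
    static Function<String, List<String>> analyzer(boolean stripHtml, Set<String> stopWords) {
        return text -> {
            String s = stripHtml ? text.replaceAll("<[^>]*>", " ") : text;
            List<String> terms = new ArrayList<>();
            for (String t : s.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!t.isEmpty() && !stopWords.contains(t)) {
                    terms.add(t);
                }
            }
            return terms;
        };
    }

    // Assumed stop-word list, for illustration only.
    static final Set<String> STOP_WORDS = Collections.singleton("the");

    // The three variants named in the list above, as one-liners.
    static final Function<String, List<String>> INDEX_MODE = analyzer(false, STOP_WORDS);
    static final Function<String, List<String>> INDEX_MODE_HTML = analyzer(true, STOP_WORDS);
    static final Function<String, List<String>> SEARCH_MODE = analyzer(false, Collections.emptySet());

    public static void main(String[] args) {
        System.out.println(INDEX_MODE.apply("The Cat"));
        System.out.println(INDEX_MODE_HTML.apply("<p>The Cat</p>"));
        System.out.println(SEARCH_MODE.apply("The Cat"));
    }
}
```

A test for any variant needs nothing beyond the call itself, and a mistyped parameter is a compile error instead of a runtime surprise from a config file.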

On Mon, Nov 29, 2010 at 13:59, Uwe Schindler <uwe@thetaphi.de> wrote:
> I think with declarative model, he means more something like a "generic" Analyzer class,
> where you pass in a config file that lists all CharFilters, Tokenizers, TokenFilters. You
> can put this xml file or whatever into a jar file and then you have the same like hardcoded
> analyzers. We have simply stupid code duplication. And using these config files you can even
> supply variants for backwards compatibility.
>
> For this to implement, the factories from solr need to be moved to Lucene. Which would
> be a good thing, as e.g. Hibernate Search only references Solr jars to have a declarative
> (annotation-based) analyzer configuration. And for that the factories are needed.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Earwin Burrfoot [mailto:earwin@gmail.com]
>> Sent: Monday, November 29, 2010 11:53 AM
>> To: dev@lucene.apache.org
>> Subject: Re: deprecating Versions
>>
>> On Mon, Nov 29, 2010 at 13:34, Robert Muir <rcmuir@gmail.com> wrote:
>> > On Mon, Nov 29, 2010 at 2:50 AM, Earwin Burrfoot <earwin@gmail.com>
>> wrote:
>> >> And for indexes:
>> >> * Index compatibility is guaranteed across two adjacent major
>> >> releases. eg 2.x -> 3.x, 3.x -> 4.x.
>> >>  That includes both binary compat - codecs, and semantic compat -
>> >> analyzers (if appropriate Version is used).
>> >> * Older releases are most probably unsupported.
>> >>  e.g. 4.x still supports shared docstores for reading, though never
>> >> writes them. 5.x won't read them either, so you'll have to at least
>> >> fully optimize your 3.x indexes when going through 4.x to 5.x.
>> >>
>> >
>> > Is it somehow possible i could convince everyone that all the
>> > analyzers we provide are simply examples?
>> > This way we could really make this a bit more reasonable and clean up
>> > a lot of stuff.
>> At the very least, you don't have to convince me. :)
>>
>> > Seems like we really want to move towards a more declarative model
>> > where these are just config files... so only then it will ok for us to
>> > change them because they suddenly aren't suffixed with .java?!
>> No freakin' declarative models! That's the domain of Solr.
>> Though others might disagree and then happily store these declarations
>> within index, and then per-segment, making the mess even more messy for
>> the glory of backasswards compatibility.
>>
>>
>> --
>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>> Phone: +7 (495) 683-567-4
>> ICQ: 104465785
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
>> commands, e-mail: dev-help@lucene.apache.org



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785


