lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Document aware analyzers was Re: deprecating Versions
Date Wed, 01 Dec 2010 13:01:50 GMT

On Nov 29, 2010, at 5:34 AM, Robert Muir wrote:

> On Mon, Nov 29, 2010 at 2:50 AM, Earwin Burrfoot <earwin@gmail.com> wrote:
>> And for indexes:
>> * Index compatibility is guaranteed across two adjacent major
>> releases. eg 2.x -> 3.x, 3.x -> 4.x.
>>  That includes both binary compat - codecs, and semantic compat -
>> analyzers (if appropriate Version is used).
>> * Older releases are most probably unsupported.
>>  e.g. 4.x still supports shared docstores for reading, though never
>> writes them. 5.x won't read them either, so you'll have to at least
>> fully optimize your 3.x indexes when going through 4.x to 5.x.
>> 
> 
> Is it somehow possible I could convince everyone that all the
> analyzers we provide are simply examples?
> This way we could really make this a bit more reasonable and clean up
> a lot of stuff.
> 
> Seems like we really want to move towards a more declarative model
> where these are just config files... so only then will it be OK for us to
> change them, because they suddenly aren't suffixed with .java?!

While we are at it, how about we make the analysis process document-aware instead of
field-aware?  The PerFieldAnalyzerWrapper, while doing exactly what it says it does, is just
silly.  If the analysis process could be aware of the document as a whole, whenever it chooses
to be, that would open up a lot more opportunity for interesting analysis while losing nothing
in the individual treatment of fields.  The TeeSink stuff is an attempt at this, but it is
not sufficient.
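
To make the contrast concrete, here is a minimal sketch of field-aware versus document-aware analysis. It is plain Java and does not use any Lucene API: the FieldAnalyzer and DocumentAnalyzer interfaces, the helper methods, and the "lang" and "body" field names are all hypothetical, chosen only for illustration. It assumes a document can be modelled as a map from field name to text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Lucene.
class DocumentAwareSketch {

    /** Field-aware: sees one field's text at a time. */
    interface FieldAnalyzer {
        List<String> analyze(String fieldName, String text);
    }

    /** Document-aware: sees every field of the document before emitting tokens. */
    interface DocumentAnalyzer {
        Map<String, List<String>> analyze(Map<String, String> document);
    }

    // Tiny illustrative stopword list, not a real Lucene stop set.
    private static final List<String> EN_STOPWORDS = Arrays.asList("the", "a", "of");

    /** Splits on whitespace and lowercases; stands in for a real tokenizer chain. */
    static List<String> simpleTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    /** Lifts any field-aware analyzer to a document-aware one, so per-field treatment is not lost. */
    static DocumentAnalyzer perField(FieldAnalyzer fieldAnalyzer) {
        return document -> {
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : document.entrySet()) {
                result.put(e.getKey(), fieldAnalyzer.analyze(e.getKey(), e.getValue()));
            }
            return result;
        };
    }

    /** Reads the "lang" field to decide how "body" is filtered; a per-field analyzer cannot see that. */
    static DocumentAnalyzer languageAware() {
        return document -> {
            boolean english = "en".equals(document.get("lang"));
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : document.entrySet()) {
                List<String> tokens = simpleTokens(e.getValue());
                if (english && "body".equals(e.getKey())) {
                    tokens.removeAll(EN_STOPWORDS); // drop stopwords only for English documents
                }
                result.put(e.getKey(), tokens);
            }
            return result;
        };
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("lang", "en");
        doc.put("body", "The state of the index");
        System.out.println(perField((name, text) -> simpleTokens(text)).analyze(doc));
        System.out.println(languageAware().analyze(doc));
    }
}
```

The perField adapter is the point of the "losing nothing" remark: every field-aware analyzer is trivially a document-aware one, while the reverse does not hold.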

Just a thought,
Grant