lucene-dev mailing list archives

From DM Smith <dmsmith...@gmail.com>
Subject Re: Having a default constructor in Analyzers
Date Sun, 07 Feb 2010 23:09:54 GMT

On Feb 7, 2010, at 5:32 PM, Sanne Grinovero wrote:

> Does it make sense to use different values across the same
> application? Obviously in the unlikely case you want to treat
> different indexes in a different way, but does it make sense when
> working all on the same index?

I think it depends entirely on the use case. In my use case, my app indexes one book per
index, with each sentence or paragraph (depending on the book) as a document. The app lives on
a user's desktop, and they can download books on an as-needed basis and then index them in
that app.

I don't have this yet, but I need it: imagine that each index maintains a manifest of the toolchain
used to build it, including the version of each part of the chain. Since the index is created
all at once, this is probably the same as the version of Lucene. When the user searches the
index, the manifest is consulted to recreate the toolchain.
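
A minimal sketch of what such a manifest could look like, assuming it is stored as simple
"component=version" lines beside the index. The class name, the component names, and the
storage format are all hypothetical; nothing here is an existing Lucene API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical per-index manifest: records which version each part of the
// analysis toolchain had when the index was built. Not a Lucene class.
final class ToolchainManifest {
    private final Map<String, String> versions = new TreeMap<>();

    // Parses "component=version" lines (assumed format) into a manifest.
    static ToolchainManifest parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ToolchainManifest manifest = new ToolchainManifest();
        for (String name : props.stringPropertyNames()) {
            manifest.versions.put(name, props.getProperty(name).trim());
        }
        return manifest;
    }

    // Returns the version recorded for a component, failing loudly if the
    // manifest has no entry for it.
    String versionOf(String component) {
        String version = versions.get(component);
        if (version == null) {
            throw new IllegalArgumentException("No version recorded for " + component);
        }
        return version;
    }

    public static void main(String[] args) {
        // Component names below are made up for illustration.
        ToolchainManifest manifest =
                parse("tokenizer=3.0\nlowercase.filter=3.0\nstop.filter=3.0\n");
        System.out.println(manifest.versionOf("stop.filter"));
    }
}
```

At search time the app would read this back and hand each recorded version to whatever
builds the matching tokenizer or filter, rather than relying on a JVM-wide setting.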

Suppose the user has updated the application a couple of times and now is sitting at Lucene
4.7. Any index at VERSION 1.9.x (not that we go back that far) has been obsoleted, but all
the 2.x and 3.x are still in play, based upon the backward compatibility policy. (2.x is in
play from an index compatibility perspective, but not an API perspective.)

But what does Version 3.2 mean at 4.7? A given filter may not have changed from 3.2
to 3.6, so those versions and everything in between are equivalent for that filter, but another filter in
the same toolchain may have changed at 3.4.
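
To make the equivalence concrete, here is a rough sketch, assuming each filter keeps a list
of the versions at which its behavior changed. The class, the integer encoding of versions,
and the change points are all invented for illustration and do not describe any real filter.

```java
import java.util.TreeSet;

// Hypothetical record of the versions at which one filter changed behavior.
final class FilterHistory {
    // Change points encoded as major * 100 + minor, e.g. 3.4 -> 304.
    private final TreeSet<Integer> changePoints = new TreeSet<>();

    FilterHistory(int... encodedVersions) {
        for (int v : encodedVersions) {
            changePoints.add(v);
        }
    }

    static int encode(int major, int minor) {
        return major * 100 + minor;
    }

    // The behavior in effect at a version is the one introduced at the most
    // recent change point at or before that version.
    int effectiveVersion(int encodedVersion) {
        Integer floor = changePoints.floor(encodedVersion);
        if (floor == null) {
            throw new IllegalArgumentException("Version predates this filter");
        }
        return floor;
    }

    // Two versions are equivalent for this filter if no change point lies
    // between them.
    boolean equivalent(int a, int b) {
        return effectiveVersion(a) == effectiveVersion(b);
    }

    public static void main(String[] args) {
        // Made-up histories: one filter unchanged since 3.0, one changed at 3.4.
        FilterHistory unchanged = new FilterHistory(encode(3, 0));
        FilterHistory changedAt34 = new FilterHistory(encode(3, 0), encode(3, 4));
        System.out.println(unchanged.equivalent(encode(3, 2), encode(3, 6)));
        System.out.println(changedAt34.equivalent(encode(3, 2), encode(3, 6)));
    }
}
```

With something like this, an index built at 3.2 only needs special handling at 4.7 for the
filters whose history has a change point after 3.2.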

> If not, why not introduce a value like "Version.BY_ENVIRONMENT" which
> is statically initialized to be one of the other values, reading from
> an environment parameter?

Environment parameters are not per index, but per JVM.

> So you get the latest at first deploy, and can then keep compatibility
> as long as you need, even when updating Lucene.
> This way I could still have the safety of pinning down a specific
> version and yet avoid rebuilding the app when changing it.
> Of course the default would be LUCENE_CURRENT, so that people trying
> out Lucene get all features out of the box, and warn about setting it
> (maybe log a warning when not set).
> 
> Also, wouldn't it make sense to be able to read the recommended
> version from the Index?

Absolutely!

> I'd like to have the hypothetical AnalyzerFactory to find out what it
> needs to build getting information from the relevant IndexReader; so
> in the case I have two indexes using different versions I won't get
> mistakes. (For a query on index A I'm creating a QueryParser, so let's
> ask the index which kind of QueryParser I should use...)

IIRC, this is something that Marvin has implemented in Lucy, and it is what I was describing
above.

> 
> just some ideas, forgive me if I misunderstood this usage (should
> avoid writing late in the night..)
> Regards,
> Sanne
> 
> 
> 
> 2010/2/7 Simon Willnauer <simon.willnauer@googlemail.com>:
>> On Sun, Feb 7, 2010 at 8:38 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>> Simon, can you explain how removing CURRENT makes it harder for users to
>>> upgrade? If you mean for the case of people that always re-index all
>>> documents when upgrading lucene jar, then this makes sense to me.
>> That is what I was alluding to!
>> Not much of a deal though most IDEs let you upgrade via refactoring
>> easily and we can document this too. Yet we won't have a drop in
>> upgrade anymore though.
>> 
>>> 
>>> I guess as a step we can at least deprecate this thing and strongly
>>> discourage its use, please see the patch at LUCENE-2080.
>>> 
>>> Not to pick on Sanne, but his wording about: "Of course more advanced use
>>> cases would need to pass parameters but please make the advanced usage
>>> optional", this really caused me to rethink CURRENT, because CURRENT itself
>>> should be the advanced use case!!!
>>> 
>>> On Sun, Feb 7, 2010 at 2:34 PM, Simon Willnauer
>>> <simon.willnauer@googlemail.com> wrote:
>>>> 
>>>> Sanne, I would recommend building a Factory pattern around your
>>>> Analyzers / TokenStreams, similar to what Solr does. That way you can
>>>> load your own "default ctor" interface via reflection and obtain your
>>>> analyzers from those factories. That makes more sense anyway, as you
>>>> only load the factory via reflection and not the analyzers.
>>>> 
>>>> @Robert: I don't know if removing LUCENE_CURRENT is the way to go. On
>>>> the one hand it would make our lives easier over time but would make it
>>>> harder for our users to upgrade. I would totally agree that for
>>>> upgrade safety it would be much better to enforce an explicit version
>>>> number so upgrading can be done step by step. Yet, if we deprecate
>>>> LUCENE_CURRENT people will use it for at least the next 3 to 5 years
>>>> (until 4.0) anyway :)
>>>> 
>>>> simon
>>>> 
>>>> On Sun, Feb 7, 2010 at 8:17 PM, Sanne Grinovero
>>>> <sanne.grinovero@gmail.com> wrote:
>>>>> Thanks for all the quick answers;
>>>>> 
>>>>> finding the ctor having only a Version parameter is fine for me, I had
>>>>> noticed this "frequent pattern" but didn't understand that was a
>>>>> general rule.
>>>>> So can I assume this is an implicit contract for all Analyzers, to
>>>>> have either an empty ctor or a single-parameter of type Version?
>>>>> 
>>>>> I know about the dangers of using LUCENE_CURRENT, but rebuilding the
>>>>> index is not always something you need to avoid.
>>>>> Having LUCENE_CURRENT is for example useful for me to test Hibernate
>>>>> Search towards the current Lucene on classpath, without having to
>>>>> rebuild the code.
>>>>> 
>>>>> thanks for all help,
>>>>> Sanne
>>>>> 
>>>>> 
>>>>> 2010/2/7 Robert Muir <rcmuir@gmail.com>:
>>>>>> I propose we remove LUCENE_CURRENT completely, as soon as TEST_VERSION
>>>>>> is done.
>>>>>> 
>>>>>> On Sun, Feb 7, 2010 at 12:53 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>>>>> 
>>>>>>> Hi Sanne,
>>>>>>> 
>>>>>>> That is exactly the usage we want to prevent. Using Version.LUCENE_CURRENT
>>>>>>> is the worst thing you can do if you want to later update your Lucene
>>>>>>> version and do not want to reindex all your indexes (see javadocs).
>>>>>>> 
>>>>>>> It is easy to modify your application to create analyzers even from
>>>>>>> config files using the reflection way. Just find a constructor taking
>>>>>>> Version and call newInstance() on it, not directly on the Class. It's
>>>>>>> just one line of code more.
>>>>>>> 
>>>>>>> Uwe
>>>>>>> 
>>>>>>> -----
>>>>>>> Uwe Schindler
>>>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>>>> http://www.thetaphi.de
>>>>>>> eMail: uwe@thetaphi.de
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Sanne Grinovero [mailto:sanne.grinovero@gmail.com]
>>>>>>>> Sent: Sunday, February 07, 2010 6:33 PM
>>>>>>>> To: java-dev@lucene.apache.org
>>>>>>>> Subject: Having a default constructor in Analyzers
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> I've seen that some core Analyzers are now missing a default
>>>>>>>> constructor; this is preventing many applications from
>>>>>>>> configuring/loading Analyzers by reflection, which is a common use
>>>>>>>> case for having Analyzers chosen in configuration files.
>>>>>>>> 
>>>>>>>> Would it be possible to add, for example, a constructor like
>>>>>>>> 
>>>>>>>> public StandardAnalyzer() {
>>>>>>>>    this(Version.LUCENE_CURRENT);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> ?
>>>>>>>> 
>>>>>>>> Of course more advanced use cases would need to pass parameters but
>>>>>>>> please make the advanced usage optional; I have now seen more than a
>>>>>>>> single project break because of this (and revert to older Lucene).
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Sanne
>>>>>>>> 
>>>>>>>> 
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Robert Muir
>>>>>> rcmuir@gmail.com
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com
>>> 
>> 
>> 
>> 
> 
> 



