lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Payloads, Tokenizers, and Filters. Oh My!
Date Tue, 20 Nov 2007 18:38:46 GMT

: I apologize for cross-posting but  I believe both Solr and Lucene users and
: developers should be concerned with this.  I am not aware of a better way to
: reach both communities.

some of these questions strike me as being largely unrelated.  if
anyone wishes to follow up on them further, let's do it in (new) separate
threads for each topic, on the specific list appropriate to the topic...

:    * Do TokenFilters belong in the Solr code base at all?

Yes, insofar as any Java code belongs in the Solr code base (or the
Nutch code base for that matter).  They are separate projects with
separate communities and separate needs -- that doesn't mean that there
isn't code in Solr which could be useful to the broader community of
lucene-java; in that case the appropriate course of action is to open a
LUCENE issue to "promote" the code up into lucene-java, and a dependent
issue in SOLR to deprecate the current code and use the newer code.

as some people may be aware, there was a discussion about this sort of
thing at ApacheCon during the Lucene BOF -- some reasons this doesn't
happen as often as it seems like it should are:
  * the code may have subtle dependency tendrils that make it hard to
    refactor from one code base to the other.
  * the tests are frequently harder to "promote" than the code (in the
    case of most Solr tests that use the TestHarness, it's probably easier
    to write new tests from scratch)
  * when promoting the code, it's the best time to consider whether the
    existing API is really the "best" API before a lot of new people start
    using it (compare Solr's FunctionQuery and Lucene's CustomScoreQuery
    for example)
  * someone needs to care enough to follow through on the promotion.

...further discussion is best suited for java-dev since the topic is not
Solr-specific (there's a lot of Nutch code out there that people have asked
about promoting as well)

:    * How to deal with TokenFilters that add new Tokens to the stream?

This is specifically regarding Payloads, yes?  also a pretty clear-cut
java-dev discussion (and one possibly already being discussed in the
monolithic Payload API thread i haven't started reading yet).
lucene-java sets the API and the semantics ... Solr code will follow them.

:    * How to patch TokenFilters and Tokenizers using the model of
:      LUCENE-969 in the Solr code base and in Lucene contrib?

open SOLR issues containing patches for any Solr code that needs to be
changed, and LUCENE issues containing patches for contrib code that needs
to be changed.

: I thought it might be useful to figure out which existing TokenFilters need to
: know about Payloads.  To this end I have taken an inventory of the
: TokenFilters out there.  I think it is fair to categorize them by Add (A),
: Delete (D), Modify (M), Observe (O):

again: this is a straightforward lucene-java question ... once the
semantics have been worked out, then there can be a Solr-specific
discussion about following them.

(which is not to say that the Solr classes/use-cases shouldn't be 
considered in the discussion, just that java-dev is the right place to 
have the conversation)

