lucene-dev mailing list archives

From Chris Hostetter
Subject Re: [jira] Created: (LUCENE-1377) Add HTMLStripReader and WordDelimiterFilter from SOLR
Date Fri, 05 Sep 2008 05:15:08 GMT

: SOLR has two classes HTMLStripReader and WordDelimiterFilter which are 
: very useful for a wide variety of use cases.  It would be good to place 
: them into core Lucene.

FWIW: Just about every concrete TokenFilter and Tokenizer in Solr's code 
base could and probably should be promoted up into Lucene-Java -- at the 
very least into a contrib, if not into the "core".

A big reason why there hasn't been any movement to do this in many cases 
is the work of refactoring the test cases -- most Solr tests use the Solr 
TestHarness to test things at a very high level, black box style. 
Essentially all new test cases would be needed.

(in other cases there are no test cases, but they were committed to Solr 
anyway to scratch an itch)

The best approach for dealing with things like this is probably to track 
each individual piece that people want to promote in separate Jira issues 
with separate patches ... that way if someone does write good generalized 
unit tests for WordDelimiterFilter but not HTMLStripReader (for example) 
the issues remain detangled and one can be committed before the other.

(smaller more self contained patches are a lot easier to review and 

