From "Robert Muir (JIRA)" <>
Subject [jira] Updated: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project
Date Mon, 06 Jul 2009 02:01:15 GMT


Robert Muir updated LUCENE-1728:

    Attachment: LUCENE-1728.txt

Simon, below is the method I used to do the refactoring with this patch.
I know I am pushing the limits of what counts as a "refactoring", but in my opinion this minor cleanup
was necessary to prevent internal structures from being exposed:
* Using two Tokenizers in the same analyzer was confusing; WordTokenizer is now a TokenFilter.
* The analyzer uses the standard WordListLoader rather than custom loading code.
* Rather than forcing SmartChineseAnalyzer to keep track of heavyweight internal structures,
it implements reusableTokenStream, etc.

I added a few tests to ensure I didn't break anything in the SmartChineseAnalyzer.

## 1. clean svn checkout
## 2. run the following commands to refactor the files.

mkdir -p contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn
svn add contrib/analysis
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/*.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart
svn move contrib/analyzers/src/test/org/apache/lucene/analysis/cn/
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/stopwords.txt contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/
svn move contrib/analyzers contrib/analysis
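
Step 1 assumes a clean checkout. A minimal sketch of one way to confirm that before running the commands above; the helper name wc_is_clean is hypothetical and not part of the patch. It relies only on `svn status` printing nothing when there is nothing to report, so unversioned files also count as "not clean".

```shell
# Hypothetical helper (not part of the patch): succeeds only when
# `svn status` prints nothing for the given path (default: current dir).
wc_is_clean() {
    [ -z "$(svn status "${1:-.}")" ]
}

# Example use, run from the root of the checkout:
#   wc_is_clean . || echo "working copy has local changes; start from a fresh checkout"
```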

## 3. eclipse "refresh" at project level.
## 4. set text-file encoding at project level to UTF-8
## 5. manually force text-file encoding as UTF-8 for contrib/analysis/analyzers/src/java/org/apache/lucene/analysis/cn/package.html
##   this is an existing encoding issue that is corrected by this patch.
## 6. apply patch from clipboard (you may now remove the above hack and you will notice this
##   file is now detected properly as UTF-8)
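
For anyone not using Eclipse, the encoding fix from steps 5 and 6 can probably be checked from the command line as well. This is only a sketch: the helper name is_valid_utf8 is hypothetical, and it assumes an iconv that exits non-zero on an illegal input sequence (the usual behavior).

```shell
# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv rejects invalid bytes.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example use after applying the patch, run from the root of the checkout:
#   f=contrib/analysis/analyzers/src/java/org/apache/lucene/analysis/cn/package.html
#   is_valid_utf8 "$f" && echo "UTF-8 ok" || echo "not valid UTF-8"
```

Note that this only shows the bytes are valid UTF-8; it cannot tell whether the characters are the ones intended.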


