lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Landon Cox" <l...@interactive-media.com>
Subject RE: QueryParser question - case-sensitivity
Date Thu, 09 May 2002 21:23:04 GMT

Ok, this is the solution and it seems to have worked like a charm.  I took
Doug's fragment as a starting point, but enhanced it to be general purpose.
Instead of the keyword field name being hardwired into the tokenStream
method, the derived Analyzer class, in this case DCRAnalyzer, accepts a
hashtable of keyword fieldnames.  As long as the keyword's value in the hash
is != null, this code below will work, so you can initialize the keyword's
value with any object you care about.

If you create another class derived from Hashtable that wires your app's
keyword fieldnames into it, an instance of that class can be passed into
DCRAnalyzer so all that all the application specific keyword knowledge
remains contained in one app class, but this code can remain general.

For my app, the XML 'id' attribute of all tags fall into this category of
keyword fields to pass through unscathed.  I'm sure I'll add others over
time which is why hash seemed convenient and fast.  Anyway, this all tested
out as expected and now the analyzer has the 'smarts' needed for
case-sensitivity on different fieldnames.

/*
 * DCRAnalyzer.java
 *
 * Created on May 9, 2002, 1:14 PM
 */

package <<<yourpackagenamegoeshere>>>;

import java.io.*;
import java.util.*;

import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;


public class DCRAnalyzer extends Analyzer
{

    /** Creates a new instance of DCRAnalyzer */
    public DCRAnalyzer( Hashtable keywordFieldNames )
    {
        m_keywordNames = keywordFieldNames;

    }

    public TokenStream tokenStream( String field, final Reader reader )
    {
        // see if field is a designated keyword name, if so, don't run it
through standard
        if ( m_keywordNames.get(field) != null )
        {
            return new WhitespaceTokenizer(reader);

        } else {
            return m_standard.tokenStream(field, reader);
        }

    }

    private Hashtable m_keywordNames = new Hashtable();
    private Analyzer m_standard = new StandardAnalyzer();
}

Not much code, but it nicely did the trick and you could easily extend it to
support numerous analyzers mapped to fieldnames, not just these two.  Thanks
for the various bits of advice from Doug, DaveP, and Otis.

Landon Cox


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message