lucene-java-user mailing list archives

From "Martin O'Shea" <>
Subject Combining analyzers in Lucene
Date Sat, 05 Mar 2011 19:06:16 GMT
I have a Java class with two methods that use Lucene's StandardAnalyzer to
index text strings and return their word frequencies, as follows:

    public void indexText(String suffix, boolean includeStopWords)  {

        StandardAnalyzer analyzer = null;

        if (includeStopWords) {
            analyzer = new StandardAnalyzer(Version.LUCENE_30);
        } else {

            // Get Stop_Words to exclude them.
            Set<String> stopWords = (Set<String>)
            analyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords);
        }

        try {

            // Index text.
            Directory index = new RAMDirectory();
            IndexWriter w = new IndexWriter(index, analyzer, true,
            this.addTextToIndex(w, this.getTextToIndex());
            // Read index.
            IndexReader ir =;
            Text_TermVectorMapper ttvm = new Text_TermVectorMapper();

            int docId = 0;

            ir.getTermFreqVector(docId, "text", ttvm);

            // Set output.
        } catch (Exception ex) {
            logger.error("Error indexing elements of RSS_Feed for object "
                    + suffix + "\n", ex);
        }
    }

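    private void addTextToIndex(IndexWriter w, String value)
            throws IOException {
        // Store the text in an analyzed field with term vectors enabled,
        // so that term frequencies can be read back later.
        Document doc = new Document();
        doc.add(new Field("text", value, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.YES));
        w.addDocument(doc);
    }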

This works perfectly well, but I would also like to combine it with stemming
using a SnowballAnalyzer.

The class also has two instance variables, set in the constructor below:

    public Text_Indexer(String textToIndex) {
        this.textToIndex = textToIndex;
        this.wordFrequencies = new HashMap<String, Integer>();
    }

Can anyone tell me how best to achieve this with the code above? Should I
re-index the text after the above code returns it, or can the stemming be
introduced into the above code directly?
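
To make the second option concrete, here is a minimal, library-free sketch of
what "stemming inside the same pass" would mean for the word frequencies. It
is not Lucene code: the class name StemmedFrequencies, the STOP_WORDS list and
the stem method are all hypothetical, and stem is a crude suffix stripper
standing in for Snowball, not the Snowball algorithm itself. It only shows the
order of the stages (tokenize, lower-case, drop stop words, stem, count), so
that no second indexing pass is needed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an analysis chain:
// tokenize -> lower-case -> drop stop words -> stem -> count.
class StemmedFrequencies {

    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("the", "a", "and", "is", "of"));

    // Crude suffix stripper (assumption: NOT the Snowball algorithm).
    // Strips the first matching suffix if at least 3 characters remain.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (token.endsWith(suffix)
                    && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Counts stemmed terms in a single pass over the text.
    static Map<String, Integer> count(String text, boolean includeStopWords) {
        Map<String, Integer> frequencies = new LinkedHashMap<String, Integer>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            if (!includeStopWords && STOP_WORDS.contains(token)) {
                continue; // stop words are removed before stemming
            }
            String term = stem(token);
            Integer seen = frequencies.get(term);
            frequencies.put(term, seen == null ? 1 : seen + 1);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(
                count("The indexer indexed the indexes and is indexing", false));
    }
}
```

In this sketch "indexed", "indexes" and "indexing" collapse into a single
"index" entry with a count of 3, which is the behaviour I am hoping to get
from the real analyzer.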


Mr Morgan.
