lucene-java-user mailing list archives

From Benzion G <benzi...@yahoo.com>
Subject Re: parsing Java log file with Lucene 3.0.3
Date Tue, 04 Jan 2011 17:48:01 GMT

OK, I managed to write the Analyzer I needed. I can't say I understood all of
Lucene's Analyzer/Tokenizer/Filter logic, but here is the attached MyAnalyzer.
I hope it helps somebody else.


import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;

/**
 * Analyzer for Java log files (Lucene 3.0.3): splits the text on every
 * character that is not a letter or digit, lowercases the tokens and
 * removes English stop words.
 */
public class MyAnalyzer extends Analyzer
{
	@Override
	public TokenStream tokenStream(String field, final Reader reader)
	{
		TokenStream result = new MyCharTokenizer(reader);
		// StandardFilter only rewrites token types produced by StandardTokenizer
		// (possessives, acronyms), so after MyCharTokenizer it changes nothing.
		result = new StandardFilter(result);
		result = new LowerCaseFilter(result);
		// true = enable position increments (leave gaps where stop words were removed)
		result = new StopFilter(true, result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

		return result;
	}

	static class MyCharTokenizer extends CharTokenizer
	{
		// Separators for the alternative rule in isTokenChar() below.
		// Note: tab and line-break characters are not in this list.
		public static final char[] BAD_CHARACTERS =
		{ '.', ',', ':', '(', ')', ' ', '[', ']', ';', '\'', '"', '|', '-', '_',
		  '*', '<', '>', '=', '+', '%', '#', '~', '`', '^' };

		public MyCharTokenizer(Reader input)
		{
			super(input);
		}

		// Returns true if the character belongs to a token; any other
		// character ends the current token.
		@Override
		protected boolean isTokenChar(char paramChar)
		{
			return Character.isLetterOrDigit(paramChar);

			// If you need to split only on specific characters, rather than on
			// everything that is not a letter or digit, use this instead:
			//for (int i = 0; i < BAD_CHARACTERS.length; i++)
			//{
			//	if (BAD_CHARACTERS[i] == paramChar)
			//	{
			//		return false;
			//	}
			//}
			//return true;
		}
	}
}
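To see which tokens this chain produces for a log line without setting up Lucene, here is a plain-Java sketch that approximates it: split the text with the tokenizer rule, lowercase each token, then drop stop words. It is not Lucene code and not the author's implementation. Assumptions: the class name `TokenChainSketch` and the sample log line are hypothetical; StandardFilter is left out because it has no effect after a CharTokenizer; `STOP_WORDS` is only a small subset of `StopAnalyzer.ENGLISH_STOP_WORDS_SET`; and the position gaps that StopFilter leaves are not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, Lucene-free approximation of MyAnalyzer's filter chain.
public class TokenChainSketch {
    // Assumed subset of StopAnalyzer.ENGLISH_STOP_WORDS_SET, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<String>(Arrays.asList(
            "a", "an", "and", "at", "in", "is", "of", "the", "to"));

    // Same characters as MyCharTokenizer.BAD_CHARACTERS.
    static final char[] BAD_CHARACTERS =
        { '.', ',', ':', '(', ')', ' ', '[', ']', ';', '\'', '"', '|', '-', '_',
          '*', '<', '>', '=', '+', '%', '#', '~', '`', '^' };

    // Active rule: only letters and digits belong to a token.
    static boolean letterOrDigitRule(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Commented-out alternative: everything except BAD_CHARACTERS belongs to a token.
    static boolean blacklistRule(char c) {
        for (char bad : BAD_CHARACTERS) {
            if (bad == c) {
                return false;
            }
        }
        return true;
    }

    static List<String> analyze(String text, boolean useBlacklist) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        // Run one position past the end so the last token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean tokenChar = i < text.length()
                    && (useBlacklist ? blacklistRule(text.charAt(i))
                                     : letterOrDigitRule(text.charAt(i)));
            if (tokenChar) {
                current.append(text.charAt(i));
            } else if (current.length() > 0) {
                // LowerCaseFilter step, then StopFilter step.
                String token = current.toString().toLowerCase(Locale.ROOT);
                if (!STOP_WORDS.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "2011-01-04 17:48:01 ERROR [main] com.example.Foo$Inner - Failed to open the file";
        System.out.println(analyze(line, false));
        System.out.println(analyze(line, true));
    }
}
```

For the sample line, the letter-or-digit rule gives `[2011, 01, 04, 17, 48, 01, error, main, com, example, foo, inner, failed, open, file]`. The BAD_CHARACTERS rule gives the same list except that `foo` and `inner` stay together as `foo$inner`, because `$` is not in the list. Since tab and line-break characters are not in the list either, the alternative rule would glue words across tabs and newlines; add them to the list if you switch rules for multi-line log input.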

-- 
View this message in context: http://lucene.472066.n3.nabble.com/parsing-Java-log-file-with-Lucene-3-0-3-tp2173046p2193022.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
