lucene-java-user mailing list archives

From arun r <arun....@gmail.com>
Subject Re: get wordno, lineno, pageno for term/phrase
Date Fri, 06 Aug 2010 17:37:27 GMT
I am trying to create a custom analyzer that detects page breaks and
line breaks and attaches the current line and page numbers to each term
as payload data. In the custom filter I have this code:

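public boolean incrementToken() throws IOException {
	if (!input.incrementToken()) {
		// Upstream token stream is exhausted.
		return false;
	}

	String term = termAtt.term();
	if (term.equals(pageBreak)) {
		// Page-break token: advance the page counter.
		System.out.println("pageBreak");
		pageCount++;
	} else if (term.equals(lineBreak)) {
		// Line-break token: advance the line counter.
		System.out.println("lineBreak");
		lineCount++;
	} else {
		// Ordinary term: record the current line and page as its payload.
		addPayload(lineCount, pageCount);
	}
	return true;
}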

where pageBreak and lineBreak are defined as:
int pageBreakAscii = 12; // ASCII 12 is the form feed character
String pageBreak = String.valueOf((char) pageBreakAscii);
String lineBreak = System.getProperty("line.separator");

I am using the WhitespaceAnalyzer token stream, which discards the
page-break and line-break characters as whitespace, so they never reach
the filter. Is there a way to create an analyzer that ignores the
page-break and line-break characters during search, but still gives
access to them in incrementToken() in the filter?
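
To make the desired behavior concrete, here is a minimal plain-Java sketch
with no Lucene classes involved. It assumes the tokenizer splits on
Character.isWhitespace, which treats both form feed and newline as
whitespace. The names PositionScanner, Positioned and scan are hypothetical
and used only for illustration. The counters start at 0 and the line counter
is not reset on a page break, mirroring lineCount and pageCount in the
filter above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: whitespace tokenization that still observes
// page breaks (form feed) and line breaks while dropping them from the output.
final class PositionScanner {

    /** A term plus the 0-based line and page counters at the time it was read. */
    static final class Positioned {
        final String term;
        final int line;
        final int page;

        Positioned(String term, int line, int page) {
            this.term = term;
            this.line = line;
            this.page = page;
        }
    }

    static List<Positioned> scan(String text) {
        List<Positioned> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int line = 0;
        int page = 0;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current term, if there is one.
                if (current.length() > 0) {
                    out.add(new Positioned(current.toString(), line, page));
                    current.setLength(0);
                }
                // The separator itself is never emitted, but it is inspected
                // here before being discarded.
                if (c == '\f') {
                    page++;
                } else if (c == '\n') {
                    line++;
                }
            } else {
                current.append(c);
            }
        }
        // Flush a trailing term that is not followed by whitespace.
        if (current.length() > 0) {
            out.add(new Positioned(current.toString(), line, page));
        }
        return out;
    }
}
```

Because the counters are updated inside the tokenizing loop, the separators
never become terms, yet every emitted term carries the line and page it was
seen on. In a Lucene setting the same idea would presumably live in a custom
tokenizer rather than in a filter placed after WhitespaceAnalyzer.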
	

-- 
Where there is a will, there is a way !
