lucene-java-user mailing list archives

From arun r <arun....@gmail.com>
Subject get wordno, lineno, pageno for term/phrase
Date Tue, 03 Aug 2010 14:58:00 GMT
Hi all,

I am new to Lucene and am trying to use it to generate data for a
document classifier. For each term/phrase I need the word number, line
number, and page number (wordno, lineno, pageno). I was able to use
SpanQuery/SpanNearQuery to get the wordno (span.start()) for the
term/phrase. To get pageno and lineno, does a custom Analyzer need to
be written? Can an Analyzer be made to recognize newline and form feed
(page break) characters and keep track of lineno and pageno for the
tokens?

Is this possible with an existing Lucene Analyzer?
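
To make the bookkeeping I have in mind concrete, here is a minimal sketch
in plain Java. It does not use any Lucene classes, and how it would be
wired into an Analyzer is exactly the open question. The names
PositionTracker, TokenPosition, and tokenize are made up for
illustration. The sketch assumes that tokens are runs of letters and
digits, that wordno is 0-based (to line up with span.start()), that
lineno and pageno are 1-based, and that lineno restarts at 1 after each
form feed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: tracks word, line, and page
// numbers while splitting text into lowercase alphanumeric tokens.
public class PositionTracker {

    /** One token with its positions (wordNo 0-based; lineNo and pageNo 1-based). */
    public static final class TokenPosition {
        public final String term;
        public final int wordNo;
        public final int lineNo;
        public final int pageNo;

        TokenPosition(String term, int wordNo, int lineNo, int pageNo) {
            this.term = term;
            this.wordNo = wordNo;
            this.lineNo = lineNo;
            this.pageNo = pageNo;
        }

        @Override
        public String toString() {
            return term + " word=" + wordNo + " line=" + lineNo + " page=" + pageNo;
        }
    }

    public static List<TokenPosition> tokenize(String text) {
        List<TokenPosition> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int wordNo = 0;
        int lineNo = 1;
        int pageNo = 1;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
                continue;
            }
            // Any non-alphanumeric character ends the current token.
            if (current.length() > 0) {
                out.add(new TokenPosition(current.toString(), wordNo++, lineNo, pageNo));
                current.setLength(0);
            }
            if (c == '\n') {
                lineNo++;            // newline: next line on the same page
            } else if (c == '\f') {
                pageNo++;            // form feed: next page
                lineNo = 1;          // assumption: line numbers restart per page
            }
        }
        // Flush a token that runs up to the end of the text.
        if (current.length() > 0) {
            out.add(new TokenPosition(current.toString(), wordNo, lineNo, pageNo));
        }
        return out;
    }

    public static void main(String[] args) {
        for (TokenPosition t : tokenize("first line\nsecond line\fnew page")) {
            System.out.println(t);
        }
    }
}
```

With this input, "second" would be reported as word 2 on line 2 of page
1, and "new" as word 4 on line 1 of page 2. What I am unsure about is
whether something equivalent can be attached to tokens inside Lucene.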

Thanks,
Arun

-- 
Where there is a will, there is a way !
