lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: get wordno, lineno, pageno for term/phrase
Date Tue, 03 Aug 2010 20:54:34 GMT
No, you can't do this with any existing analyzers I know of. Part
of the problem here is that there's no good generic way to KNOW
what a page and line are.
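
If the input happens to be plain text in which a newline ends a line and a form feed ends a page (as the original question assumes), the bookkeeping itself is simple. The following is only a rough sketch under that assumption. It does not use any Lucene API; `LinePageTracker` and `TokenPos` are hypothetical names invented for illustration, and the choice to restart line numbers on each page is also an assumption. The same counters could in principle be kept inside a custom tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits text on whitespace and
// records the word, line, and page number of every token.
final class LinePageTracker {

    /** A token with its 0-based word number and 1-based line and page numbers. */
    static final class TokenPos {
        final String term;
        final int wordNo;
        final int lineNo;
        final int pageNo;

        TokenPos(String term, int wordNo, int lineNo, int pageNo) {
            this.term = term;
            this.wordNo = wordNo;
            this.lineNo = lineNo;
            this.pageNo = pageNo;
        }
    }

    /**
     * Assumes '\n' ends a line and '\f' (form feed) ends a page.
     * Line numbers restart at 1 on every new page.
     */
    static List<TokenPos> tokenize(String text) {
        List<TokenPos> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int wordNo = 0;
        int lineNo = 1;
        int pageNo = 1;
        for (int i = 0; i <= text.length(); i++) {
            // A trailing space acts as a sentinel so the last token is flushed.
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    // The token belongs to the line and page it started on.
                    out.add(new TokenPos(current.toString(), wordNo++, lineNo, pageNo));
                    current.setLength(0);
                }
                if (c == '\n') {
                    lineNo++;
                } else if (c == '\f') {
                    pageNo++;
                    lineNo = 1;
                }
            } else {
                current.append(c);
            }
        }
        return out;
    }
}
```

Calling `LinePageTracker.tokenize("alpha beta\ngamma\fdelta")` yields four tokens; `gamma` is word 2 on line 2 of page 1, and `delta` is word 3 on line 1 of page 2. Documents that come from PDF, HTML, or word processors usually carry no such markers, which is the generic problem described above.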

Have you investigated payloads? I'm not sure that's a good fit for
this particular problem, but it might be worth investigating.
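
A payload is just a small byte array stored alongside a token position, so one possible approach is to pack the line and page numbers into a few bytes at index time and unpack them when a match is found. The sketch below shows only that packing step in plain Java, with no Lucene calls. `LinePagePayload` is a hypothetical name, and the 8-byte layout (two big-endian ints) is an arbitrary choice made for illustration, not anything Lucene prescribes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: converts a (line, page) pair to
// and from the kind of byte array a per-token payload could carry.
final class LinePagePayload {

    /** Packs lineNo and pageNo as two 4-byte big-endian ints (8 bytes total). */
    static byte[] encode(int lineNo, int pageNo) {
        return ByteBuffer.allocate(8).putInt(lineNo).putInt(pageNo).array();
    }

    /** Returns {lineNo, pageNo} read back from an 8-byte payload. */
    static int[] decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int lineNo = buf.getInt();
        int pageNo = buf.getInt();
        return new int[] { lineNo, pageNo };
    }
}
```

For example, `LinePagePayload.decode(LinePagePayload.encode(12, 3))` gives back line 12 and page 3. Eight bytes per token position adds up on a large index, so a more compact encoding may be preferable if line and page counts are small.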

Best
Erick

On Tue, Aug 3, 2010 at 10:58 AM, arun r <arun.raj@gmail.com> wrote:

> Hi all,
>            I am new to Lucene. I am trying to use Lucene to generate
> data for a document classifier. I need to generate wordno, lineno, and
> pageno for each term/phrase. I was able to use SpanQuery/SpanNearQuery
> to get the wordno (span.start()) for the term/phrase. To get pageno
> and lineno, does a custom Analyzer need to be written? Can the Analyzer
> be made to recognize newline and page feed characters and keep
> track of lineno and pageno for the tokens?
>
> Is this possible with an existing Lucene Analyzer?
>
> Thanks,
> Arun
>
> --
> Where there is a will, there is a way !
