lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Get match exact location
Date Mon, 02 Nov 2009 12:54:51 GMT
Well, you have to do some extra work for page matching. If you search the
user list
you'll find significant discussions of paging. The short form is that you
have to
either index the entire document and record the start and/or end offset for
each page
(I'd put the results in a separate field in the document). There are various
forms of queries that give you match positions and you can look at the
stored
field with page offsets to find the page.

The other possibility is to index each page as a separate document, but this
solution suffers from the problem of matching, say, phrases that start on
one
page and end on the next.

HTH
Erick

On Mon, Nov 2, 2009 at 6:03 AM, Vicente David Guardiola Buitrago <
vicente.guardiola@firma-e.com> wrote:

> Hello, everyone,
>
>
>
> I´m using lucene to index some PDF documents an it’s working great. But I’m
> wondering if it’s possible to know some extra information of returned
> matches by lucene.
>
>
>
> I need to know the exact page where lucene has found every match, and it
> will be get if I could get also the paragraph or some information to go
> directly to the word(s) mathing.
>
>
>
> I really appreciate any advice you could give me.
>
>
>
> Best regards
>
>
>
> Vicente Guardiola
>
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message