lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andreas Guther" <Andreas.Gut...@markettools.com>
Subject RE: How to filter fields with hits from result set
Date Wed, 23 May 2007 20:29:37 GMT
Eric,

Thank you very much for your response.  That sounds very interesting.
Let me do some experimenting to see if I fully understood your solution.
Otherwise I have to come back to you with more questions.

Andreas


 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, May 23, 2007 12:00 PM
To: java-user@lucene.apache.org
Subject: Re: How to filter fields with hits from result set

As luck would have it, I've done something very similar. What I had
to do is index a special token at the end of each page. Then I could
get the term offsets for each page....

Then I used one of the SpanQuery.getSpans to get all of the
offsets of the hits throughout all of the pages.

now I have a list of all the offsets of the *last* term on each
page and a list of the offsets of the hits. From these two
lists I can know which pages have hits.


Best
Erick

On 5/23/07, Andreas Guther <Andreas.Guther@markettools.com> wrote:
>
> Hi,
>
> If a search returns a document that has multiple fields with the same
> name, is there a way to filter only those fields that contain hits?
>
>
> Background:
>
> I am indexing documents and we store all content in our index for
> display reasons.  We want to show only those pages containing hits.
My
> first implementation was saving each page in a Lucene document.  For
> performance reasons why are now looking into indexing the complete
> indexed document as a single Lucene document.
>
> Every page is added to a field in the Lucene document named
> page-content.  That means I am ending with as many fields named
> page-content as the document has pages.
>
> My search now returns me a single Lucene document in contrary to my
> first approach with page per Lucene document.  My problem right now
is:
> how can I limit the returned page-contents fields for pages to those
> field entries that contain hits.  If I have hits on pages five pages
> from a document with 10 pages I would like to have only the pages with
> the hits, not all.
>
> Is there anything in Lucene that limits the returned fields to fields
> with hits only?
>
> Thanks in advance,
>
> Andreas
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message