lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: How to filter fields with hits from result set
Date Wed, 23 May 2007 21:04:51 GMT
Two things to watch...

1> Think about indexing the special page-end token with a
position increment of 0 (see SynonymAnalyzer in Lucene in
Action). The marker then shares a position with the last term
on the page, which preserves the sense of phrases across
page breaks.

2> Assembling the span query is tricky. Search the mail archive
for SpanQuery to see an exchange I had with the originator of
this concept. Suffice it to say that converting an ad-hoc query
into a set of SpanQueries is not trivial, but it certainly is doable.
But you'd have a much easier time of it if you were able to
control the queries and disallow ad-hoc queries. It all depends
upon the requirements of the application. Any time you can
avoid supporting arbitrary boolean logic for the user input, your
job is easier <G>....
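
To illustrate point 1, here is a minimal sketch in plain Java (no Lucene
classes) of how token positions accumulate from position increments. The
marker string, the class name and the method name are made up for
illustration; in a real analyzer the increment would be set on the marker
token itself. With an increment of 0 the marker lands on the same position
as the last word of the page, so the words on either side of the page break
stay adjacent.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementDemo {
    /** Hypothetical page-end marker; any string that cannot occur in real text would do. */
    static final String PAGE_END = "$$pageend$$";

    /**
     * Assigns a position to each token by summing position increments:
     * an ordinary token advances the position by 1, while the page-end
     * marker advances it by markerIncrement.
     */
    static List<Integer> positions(List<String> tokens, int markerIncrement) {
        List<Integer> result = new ArrayList<>();
        int position = -1; // so that the first ordinary token lands on position 0
        for (String token : tokens) {
            position += token.equals(PAGE_END) ? markerIncrement : 1;
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        // "brown" ends page 1, "fox" starts page 2.
        List<String> tokens = List.of("quick", "brown", PAGE_END, "fox", "jumps");
        System.out.println(positions(tokens, 0)); // prints [0, 1, 1, 2, 3]
        System.out.println(positions(tokens, 1)); // prints [0, 1, 2, 3, 4]
    }
}
```

In the first line of output "brown" (1) and "fox" (2) are still neighbours, so
a phrase search for "brown fox" can match. In the second, the marker takes up
position 2 and pushes "fox" to 3, which would break an exact phrase match.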

But you should be able to run up a demo with simple queries that
you control to prove out the methodology in any case.....
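
As a starting point for such a demo, here is a rough sketch of the last step
only: turning the two lists (page-end positions and hit positions) into page
numbers. It is plain Java and assumes you have already pulled both lists out
of the index; the class and method names are hypothetical, and it assumes the
page-end positions are strictly ascending (no two pages ending on the same
position).

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class PageHits {
    /**
     * pageEnds[i] is the position of the end-of-page marker for page i + 1,
     * in strictly ascending order. A hit belongs to the first page whose
     * end position is greater than or equal to the hit position.
     * Returns the 1-based numbers of the pages that contain at least one hit.
     */
    static SortedSet<Integer> pagesWithHits(int[] pageEnds, int[] hitPositions) {
        SortedSet<Integer> pages = new TreeSet<>();
        for (int hit : hitPositions) {
            int idx = Arrays.binarySearch(pageEnds, hit);
            // When the key is absent, binarySearch returns -(insertionPoint) - 1.
            int page = idx >= 0 ? idx : -idx - 1;
            if (page < pageEnds.length) { // ignore positions past the last marker
                pages.add(page + 1);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] pageEnds = {99, 199, 299}; // three pages of 100 positions each
        int[] hits = {5, 99, 250};       // positions reported for the hits
        System.out.println(pagesWithHits(pageEnds, hits)); // prints [1, 3]
    }
}
```

The binary search keeps the lookup cheap even for long documents, since each
hit costs only a logarithmic number of comparisons against the page ends.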

Best
Erick


On 5/23/07, Andreas Guther <Andreas.Guther@markettools.com> wrote:
>
> Erick,
>
> Thank you very much for your response.  That sounds very interesting.
> Let me do some experimenting to see if I fully understood your solution.
> Otherwise I have to come back to you with more questions.
>
> Andreas
>
>
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Wednesday, May 23, 2007 12:00 PM
> To: java-user@lucene.apache.org
> Subject: Re: How to filter fields with hits from result set
>
> As luck would have it, I've done something very similar. What I had
> to do is index a special token at the end of each page. Then I could
> get the term offsets for each page....
>
> Then I used getSpans on one of the SpanQuery classes to get all of
> the offsets of the hits throughout all of the pages.
>
> Now I have a list of all the offsets of the *last* term on each
> page and a list of the offsets of the hits. From these two
> lists I can tell which pages have hits.
>
>
> Best
> Erick
>
> On 5/23/07, Andreas Guther <Andreas.Guther@markettools.com> wrote:
> >
> > Hi,
> >
> > If a search returns a document that has multiple fields with the same
> > name, is there a way to filter only those fields that contain hits?
> >
> >
> > Background:
> >
> > I am indexing documents and we store all content in our index for
> > display reasons.  We want to show only those pages containing hits.
> > My first implementation was saving each page in a Lucene document.
> > For performance reasons we are now looking into indexing the complete
> > document as a single Lucene document.
> >
> > Every page is added to a field in the Lucene document named
> > page-content.  That means I end up with as many fields named
> > page-content as the document has pages.
> >
> > My search now returns a single Lucene document, in contrast to my
> > first approach with one page per Lucene document.  My problem right
> > now is: how can I limit the returned page-content fields to those
> > field entries that contain hits?  If I have hits on five pages of a
> > document with 10 pages, I would like to get only the pages with
> > hits, not all of them.
> >
> > Is there anything in Lucene that limits the returned fields to fields
> > with hits only?
> >
> > Thanks in advance,
> >
> > Andreas
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
