lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?
Date Mon, 25 May 2009 09:56:56 GMT
Can't you use contrib/highlighter to achieve this?

It can do both excerpting (grabbing chunk of text around each hit) and
highlighting (highlighting the specific tokens that matched, within
that excerpt).

Mike

On Mon, May 25, 2009 at 5:20 AM, KK <dioxide.software@gmail.com> wrote:
> Thanks for your response @Seid.
> Can any Lucene user give me directions on this regard? I'm stuck.
>  Really appreciate your help.
>
> Thanks,
> KK
>
> On Mon, May 25, 2009 at 2:43 PM, Seid Muhie <seidymam@gmail.com> wrote:
>
>> actually I used the normal java standard libraries for this work. I
>> used lucene only to retrieve the relevant document.
>> what you will do is, thought it is to manuall, as i don't know the way
>> it can be done by the Lucene API, you just record the location of the
>> query terms in the document (it is as easy as indexOf(query terms)).
>> But you have to be very aware of the speed of the system. then you can
>> go ahead or back, as you want.
>>
>> Once again, I think there will be also a LUcene workaround, that I am
>> not aware of it at all.
>>
>> Seid M
>>
>> On 5/25/09, KK <dioxide.software@gmail.com> wrote:
>> > Thanks for your quick response, Seid.
>> >
>> > There is one more mail I found in the archive[3/4 days old] where someone
>> > asked about extracting 3 neighbors words around the match. I think once
>> you
>> > have the position of matching term/phrase then extracting 3 or 30
>> neighbors
>> > wont be different, right? because you just have to move back/forward and
>> get
>> > the words, this sounds logically simple but I dont know how simple is
>> this
>> > implementation-wise.
>> > Also people are talking about someting called spanQueries/termvectors etc
>> to
>> > use for this purpose. I'm still to get the exact idea of how to do this.
>> > As per your mail, you used Java to extract the neighbors, Is that using
>> the
>> > standard techniques i.e using those spanqueries/termvectors or something
>> > else.
>> > If you can elaborate all this a bit It'd be very helpful.
>> >
>> > Thank you.
>> > KK>
>> >
>> > On Mon, May 25, 2009 at 10:51 AM, Seid Muhie <seidymam@gmail.com> wrote:
>> >
>> >> for my thesis work (Question Answering) I used to retrieve first the
>> >> document and then play with java to extract the needed answer.
>> >> for your case what you will do is first locate the positions of the
>> >> query terms in the document (in this case it might be distributed
>> >> throughout the document - hence difficult to get the 15/20 words) then
>> >> count something10 words forward and backward and extract the match.
>> >> this is the way I handle my problem. Hope there might be different I
>> >> dea too
>> >>
>> >> Seid M.
>> >>
>> >> On 5/25/09, KK <dioxide.software@gmail.com> wrote:
>> >> > Hi All,
>> >> > I'm trying to index some non-english web pages and I'm keeping all
the
>> >> > content of the page in a single field and the searches are working
>> fine
>> >> as
>> >> > well. Now when I search for some query it gives the complete page,
>> which
>> >> is
>> >> > expected. Now I want to restrict the showing of results to say 20
>> words
>> >> > around the match, something like google does, otherwise we cann't make
>> >> users
>> >> > to look for a match in the whole page content[I'll use highlighter
>> after
>> >> > this is done]. So getting positions of the matched word/phrase might
>> >> > help
>> >> so
>> >> > that I can extract some words before and some words after that and
>> will
>> >> show
>> >> > that to end user. Any idea on doing the same will be very helpful.
>> Thank
>> >> > you.
>> >> >
>> >> > KK.
>> >> >
>> >>
>> >>
>> >> --
>> >> "RABI ZIDNI ILMA"
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>>
>> --
>> "RABI ZIDNI ILMA"
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message