lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?
Date Mon, 25 May 2009 14:17:48 GMT
I would do some googling to find examples, or read the javadocs for
the highlighter package?

Or pick up a copy of the early-release of Lucene in Action 2nd edition
from http://manning.com [disclaimer: I'm one of the authors on that
book!].  We've revamped the highlighter coverage (in chapter 8)...

Mike

On Mon, May 25, 2009 at 6:19 AM, KK <dioxide.software@gmail.com> wrote:
> Thanks @Michael.
> I have no experience with this contrib module, though I'm looking into the
> highlighter now. Can you shed some light on it, and on the steps needed to
> achieve this? I'm completely new to this. Can you point me to some examples?
> Thank you.
>
> KK.
>
> On Mon, May 25, 2009 at 3:26 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Can't you use contrib/highlighter to achieve this?
>>
>> It can do both excerpting (grabbing a chunk of text around each hit) and
>> highlighting (highlighting the specific tokens that matched, within
>> that excerpt).
>>
>> Mike
>>
>> On Mon, May 25, 2009 at 5:20 AM, KK <dioxide.software@gmail.com> wrote:
>> > Thanks for your response @Seid.
>> > Can any Lucene user give me directions in this regard? I'm stuck.
>> > I'd really appreciate your help.
>> >
>> > Thanks,
>> > KK
>> >
>> > On Mon, May 25, 2009 at 2:43 PM, Seid Muhie <seidymam@gmail.com> wrote:
>> >
>> >> Actually, I used the standard Java libraries for this work; I used
>> >> Lucene only to retrieve the relevant document.
>> >> What you would do, though it is rather manual (I don't know how it
>> >> can be done through the Lucene API), is record the location of the
>> >> query terms in the document (it is as easy as indexOf(query terms)).
>> >> But you have to be very aware of the speed of the system. Then you can
>> >> move forward or backward as far as you want.
>> >>
>> >> Once again, I think there is probably also a Lucene way to do this
>> >> that I am not aware of.
>> >>
>> >> Seid M
>> >>
>> >> On 5/25/09, KK <dioxide.software@gmail.com> wrote:
>> >> > Thanks for your quick response, Seid.
>> >> >
>> >> > There is one more mail I found in the archive [3/4 days old] where
>> >> > someone asked about extracting 3 neighboring words around the match.
>> >> > I think once you have the position of the matching term/phrase,
>> >> > extracting 3 or 30 neighbors won't be different, right? You just have
>> >> > to move back/forward and get the words. This sounds logically simple,
>> >> > but I don't know how simple it is implementation-wise.
>> >> > Also, people are talking about something called span queries/term
>> >> > vectors etc. to use for this purpose. I have yet to get an exact idea
>> >> > of how to do this.
>> >> > As per your mail, you used Java to extract the neighbors. Is that
>> >> > using the standard techniques, i.e. those span queries/term vectors,
>> >> > or something else?
>> >> > If you can elaborate on all this a bit, it'd be very helpful.
>> >> >
>> >> > Thank you.
>> >> > KK
>> >> >
>> >> > On Mon, May 25, 2009 at 10:51 AM, Seid Muhie <seidymam@gmail.com> wrote:
>> >> >
>> >> >> For my thesis work (Question Answering), I first retrieved the
>> >> >> document and then used plain Java to extract the needed answer.
>> >> >> In your case, first locate the positions of the query terms in the
>> >> >> document (they might be distributed throughout the document, which
>> >> >> makes it difficult to get the 15/20 words), then count about 10
>> >> >> words forward and backward and extract the match.
>> >> >> This is how I handled my problem. I hope there are other ideas
>> >> >> too.
>> >> >> Seid M.
>> >> >>
>> >> >> On 5/25/09, KK <dioxide.software@gmail.com> wrote:
>> >> >> > Hi All,
>> >> >> > I'm trying to index some non-English web pages. I'm keeping all
>> >> >> > the content of each page in a single field, and the searches are
>> >> >> > working fine. Now when I search for a query, it returns the
>> >> >> > complete page, which is expected. I want to restrict the displayed
>> >> >> > results to, say, 20 words around the match, something like Google
>> >> >> > does; otherwise we can't expect users to look for a match in the
>> >> >> > whole page content [I'll use the highlighter after this is done].
>> >> >> > Getting the positions of the matched word/phrase might help, so
>> >> >> > that I can extract some words before and some words after it and
>> >> >> > show that to the end user. Any ideas on how to do this would be
>> >> >> > very helpful. Thank you.
>> >> >> >
>> >> >> > KK.
>> >> >> >
>> >> >>
>> >> >>
>> >> >> --
>> >> >> "RABI ZIDNI ILMA"
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >> --
>> >> "RABI ZIDNI ILMA"
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>
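
For reference, the manual approach Seid describes in the quoted thread (locate the query term in the text, then step a fixed number of words backward and forward) might look roughly like the sketch below. This is plain Java, not the Lucene API and not contrib/highlighter. The class name SnippetExtractor and the method excerpt are hypothetical, and the sketch assumes a single query term and text whose words are separated by whitespace, which may not hold for every language.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
class SnippetExtractor {

    /**
     * Returns up to `window` words on each side of the first word that
     * contains `term` (case-insensitive), or "" if no word matches.
     * Assumption: words are separated by whitespace.
     */
    static String excerpt(String text, String term, int window) {
        String trimmed = text.trim();
        if (trimmed.isEmpty() || term.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        String needle = term.toLowerCase(Locale.ROOT);

        // Find the index of the first word containing the query term.
        int hit = -1;
        for (int i = 0; i < words.length; i++) {
            if (words[i].toLowerCase(Locale.ROOT).contains(needle)) {
                hit = i;
                break;
            }
        }
        if (hit < 0) {
            return "";
        }

        // Clamp the window to the bounds of the document.
        int from = Math.max(0, hit - window);
        int to = Math.min(words.length, hit + window + 1);
        return String.join(" ", Arrays.copyOfRange(words, from, to));
    }
}
```

Calling SnippetExtractor.excerpt(pageText, "lucene", 10) would return the first matching word with up to 10 words on either side. It only looks at the first match and ignores analysis (stemming, stop words), so it can disagree with what the index actually matched; the highlighter Mike points to works from the query and analyzer instead.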


