lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: Obtaining the contexts of hits
Date Wed, 09 Mar 2005 21:42:17 GMT
Since you own Lucene in Action, look at this:

See section 8.7, Highlighting query terms.


--- Andy Roberts <> wrote:
> Hi,
> I've been using Lucene for a few months now, although not in a
> typical 
> "building a search engine" kind of way*. Basically, I have some large
> documents. I would like a system whereby I search for a term, and
> then I 
> receive a hit for each match, with its context, e.g., ten words
> either side 
> of the match.
> I've been looking through the API, and there's a SpanNearQuery (or
> something 
> similar) which looked at first glance to be what I want, but after
> second 
> thoughts, it appears it's searching for a set of words within a given
> proximity. 
> I know Lucene stores TermPositions: I know how to get the position of
> a match. 
> Is there a way of doing a reverse lookup, i.e., given a position,
> return the 
> term. Because if that were the case, I could easily build a context
> up by 
> finding pos, then looping to get all terms +- a specified window.
> Either way, I can't see a way forward. Simply being able to find
> which 
> documents the terms are in isn't very helpful, because let's say I
> find the 
> docs that match, I then have to open up each one, tokenise and
> re-search for 
> my term,etc, etc. All this info is there, somewhere in the index, I
> just want 
> to get to it so that I can benefit from the many speed benefits of
> Lucene.
> I don't expect sample code or anything, just a pointer to the right
> direction. 
> I do own the LIA book, but haven't read it all yet - so if there's
> anything 
> in there which could be relevant, please let me know :)
> Much help appreciated,
> Andrew Roberts
> * for those who are interested, I'm a computer scientist doing
> research which 
> is basically a cross between computational linguistics and machine
> learning. 
> I work with large text corpora, gather information about how words
> behave 
> relative to each and try to infer word class and grammatical
> structure from 
> it. I'm using Lucene to try and speed up text processing overheads,
> by having 
> my corpora indexed.
> -- 
> Computer Vision and Language Reseearch Group
> School of Computing
> Univeristy of Leeds
> Leeds, UK
> LS2 9JT
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message