lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: Can I use Lucene to solve this problem?
Date Wed, 17 Aug 2011 16:39:35 GMT
This certainly sounds doable in Lucene.  Is it basically working apart
from the false positives?  Can you give some examples of them?

I'd be tempted to look at span queries, which take term order and
proximity into account.  They let you say that "Yesterday I put on my
green plaid shirt" is a better match against "Green plaid shirt with
stripes" than "a plaid shirt that is green" would be, if that is what
you want. See for good info on span queries.
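
As a rough illustration of what ordered proximity matching buys you,
here is a plain-Java sketch.  It does not use Lucene at all: the class
and method names (OrderedSpanSketch, tokenize, matchesInOrder) are made
up for this example, and the tokenizer is only a crude stand-in for a
real analyzer.  In Lucene itself the corresponding building blocks are
SpanTermQuery and SpanNearQuery, the latter taking a slop and an
in-order flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: shows the idea behind an
// ordered "near" match with a slop allowance.
final class OrderedSpanSketch {

    // Lowercase and split on anything that is not a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if all terms occur in the given order, with at most `slop`
    // other tokens in total between the first and last matched term.
    static boolean matchesInOrder(List<String> tokens, List<String> terms, int slop) {
        if (terms.isEmpty()) {
            return false;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (!tokens.get(start).equals(terms.get(0))) {
                continue;
            }
            int last = start;   // position of the most recently matched term
            int matched = 1;    // number of terms matched so far
            for (int i = start + 1; i < tokens.size() && matched < terms.size(); i++) {
                if (tokens.get(i).equals(terms.get(matched))) {
                    last = i;
                    matched++;
                }
            }
            int gaps = (last - start + 1) - terms.size();
            if (matched == terms.size() && gaps <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("green", "plaid", "shirt");
        String[] sentences = {
            "Yesterday I put on my green plaid shirt.",
            "a plaid shirt that is green"
        };
        for (String s : sentences) {
            System.out.println(s + " -> " + matchesInOrder(tokenize(s), terms, 0));
        }
    }
}
```

Raising the slop loosens the match (a slop of 1 would also accept
"green and plaid shirt"), while the in-order requirement is what
separates the first sentence from the second.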

Misspellings are a separate issue.  Search for "Lucene spellcheck", or
look at synonyms if you've got a list of known alternatives.
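
To make the misspelling idea concrete, here is a small plain-Java
sketch of edit-distance matching, which is the general idea behind
fuzzy term matching.  It is not Lucene code: EditDistanceSketch,
levenshtein and closest are names invented for this example, and the
vocabulary is just the words from the noun phrases in your mail.  In
Lucene you would more likely reach for FuzzyQuery or the SpellChecker
class than roll your own.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: maps a possibly misspelled
// word to the closest known term by Levenshtein edit distance.
final class EditDistanceSketch {

    // Classic dynamic-programming edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary term nearest to `word`, or null if nothing
    // is within maxDistance edits.
    static String closest(String word, List<String> vocabulary, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String term : vocabulary) {
            int d = levenshtein(word, term);
            if (d < bestDistance) {
                bestDistance = d;
                best = term;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> vocabulary = Arrays.asList("book", "balloon", "shirt");
        System.out.println(closest("baloon", vocabulary, 1));
    }
}
```

Keeping the allowed distance small (1 or 2) matters here, since short
words are otherwise within a couple of edits of many unrelated terms.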


On Wed, Aug 17, 2011 at 4:03 AM, Josh Rehman <> wrote:
> My organization is looking to solve a difficult problem, and I believe that
> Lucene is a close fit (although perhaps it is not). However I'm not sure
> exactly how to approach this problem.
> The problem is this: given a small set of fixed noun phrases and a much
> larger set of human generated short sentences, determine whether the
> sentences refer to those noun phrases. For example, perhaps I have these
> noun phrases:
>   1. Bright yellow book
>   2. Large bulbous balloon
>   3. Green plaid shirt with stripes
>   4. Dark yellow book
> And these sentences:
>   1. Yesterday I put on my green plaid shirt.
>   2. Next week I'll sell my balloon.
>   3. Just finished my bright book.
>   4. Wondering at how lovely my baloon is [Note the misspelling]
> Given that list of sentences, I will generate (sentence, noun phrase)
> ordered pairs like this:
> 1,3
> 2,2
> 3,1
> 4,2
> Or even an ordered pair of (sentence, [noun phrases]). E.g. 3,[1,4] (because
> there might be an ambiguous reference to "Book")
> The "shape" of this problem looks a lot like what Lucene does, but frankly I
> don't have a lot of experience with textual indexing and search. I've
> installed Lucene and managed to index and search my data structures, however
> with the StandardIndexer I'm getting a lot of false positives.
> Here is the code I have so far (I've elided the parsing code which is not
> very interesting):
> Really appreciate any and all guidance. Thanks.
