lucene-java-user mailing list archives

From Josh Rehman <j...@joshrehman.com>
Subject Can I use Lucene to solve this problem?
Date Wed, 17 Aug 2011 03:03:24 GMT
My organization is trying to solve a difficult problem. I believe that
Lucene is a close fit (although perhaps it is not), but I'm not sure
exactly how to approach it.

The problem is this: given a small set of fixed noun phrases and a much
larger set of human-generated short sentences, determine which noun
phrases, if any, each sentence refers to. For example, perhaps I have these
noun phrases:

   1. Bright yellow book
   2. Large bulbous balloon
   3. Green plaid shirt with stripes
   4. Dark yellow book

And these sentences:

   1. Yesterday I put on my green plaid shirt.
   2. Next week I'll sell my balloon.
   3. Just finished my bright book.
   4. Wondering at how lovely my baloon is [Note the misspelling]

Given that list of sentences, I want to generate (sentence, noun phrase)
ordered pairs like this:
1,3
2,2
3,1
4,2

Or even ordered pairs of (sentence, [noun phrases]), e.g. 3,[1,4], because
"book" in sentence 3 could refer to either noun phrase 1 or noun phrase 4.
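
To make the desired output concrete, here is a minimal plain-Java sketch of
the matching described above. It does not use Lucene and is not the code in
my gist. Everything in it is an assumption made for illustration: the class
name PhraseMatcher, the tiny stop-word list, the rule that words of five or
more letters may differ by one edit (so that "baloon" still matches
"balloon"), and the choice to return only the top-scoring noun phrases.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PhraseMatcher {

    // Hypothetical stop words that are ignored inside noun phrases.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "with", "of");

    // Lower-case the text and split it into distinct alphabetic tokens.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Classic Levenshtein edit distance, computed row by row.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Exact match, or at most one edit apart when both words have 5+ letters.
    static boolean similar(String a, String b) {
        if (a.equals(b)) {
            return true;
        }
        return Math.min(a.length(), b.length()) >= 5 && editDistance(a, b) <= 1;
    }

    // Count the non-stop-word phrase tokens that appear (fuzzily) in the sentence.
    static int score(String phrase, Set<String> sentenceTokens) {
        int hits = 0;
        for (String p : tokenize(phrase)) {
            if (STOP_WORDS.contains(p)) {
                continue;
            }
            for (String s : sentenceTokens) {
                if (similar(p, s)) {
                    hits++;
                    break;
                }
            }
        }
        return hits;
    }

    // Return the 1-based indexes of the top-scoring phrases (empty if nothing matches).
    static List<Integer> bestPhrases(String sentence, List<String> phrases) {
        Set<String> sentenceTokens = tokenize(sentence);
        List<Integer> best = new ArrayList<>();
        int bestScore = 0;
        for (int i = 0; i < phrases.size(); i++) {
            int s = score(phrases.get(i), sentenceTokens);
            if (s > bestScore) {
                bestScore = s;
                best.clear();
            }
            if (s > 0 && s == bestScore) {
                best.add(i + 1);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> phrases = List.of("Bright yellow book", "Large bulbous balloon",
                "Green plaid shirt with stripes", "Dark yellow book");
        List<String> sentences = List.of("Yesterday I put on my green plaid shirt.",
                "Next week I'll sell my balloon.", "Just finished my bright book.",
                "Wondering at how lovely my baloon is");
        for (int i = 0; i < sentences.size(); i++) {
            System.out.println((i + 1) + "," + bestPhrases(sentences.get(i), phrases));
        }
    }
}
```

Because only the top-scoring phrases are returned, sentence 3 maps to noun
phrase 1 alone ("bright" and "book" both match), while a sentence that
mentions only "book" would map to both 1 and 4.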

The "shape" of this problem looks a lot like what Lucene does, but frankly I
don't have a lot of experience with textual indexing and search. I've
installed Lucene and managed to index and search my data structures, but
with the StandardAnalyzer I'm getting a lot of false positives.

Here is the code I have so far (I've elided the parsing code, which is not
very interesting):
  https://gist.github.com/1150723

Really appreciate any and all guidance. Thanks.
