lucene-java-user mailing list archives

From Alexander Aristov <alexander.aris...@gmail.com>
Subject Re: Can I use Lucene to solve this problem?
Date Wed, 17 Aug 2011 18:32:32 GMT
Hi

Look at the Apache Mahout project (based on Hadoop). It seems you need
machine learning algorithms.

Best Regards
Alexander Aristov


On 17 August 2011 20:39, Ian Lea <ian.lea@gmail.com> wrote:

> Certainly sounds doable in Lucene.  Is it basically working apart from
> false positives?  Can you give some examples of the false positives?
>
> I'd be tempted to look at span queries which will let you say that
> "Yesterday I put on my green plaid shirt" is a better match against
> "Green plaid shirt with stripes" than "a plaid shirt that is green"
> would.  If that is what you want. See
> http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ for
> good info on span queries.
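
To make the ordering point above concrete, here is a minimal plain-Java sketch of the idea behind an ordered span match. It is not Lucene code: Lucene's SpanNearQuery takes a slop and an inOrder flag, and this only imitates that behaviour on token lists. The class name, the method names, the tokenizer, and the way slop is counted (extra tokens inside the window) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: a simplified stand-in for an ordered span match.
final class OrderedSpanSketch {

    /** Lowercases the text and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /**
     * Returns true if every term occurs in doc in the given order, with at
     * most `slop` other tokens inside the window from first to last term.
     * (Assumed definition of slop; Lucene's own accounting may differ.)
     */
    static boolean matchesInOrder(List<String> doc, List<String> terms, int slop) {
        if (terms.isEmpty()) {
            return false;
        }
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(terms.get(0))) {
                continue;
            }
            // Greedily take the earliest later position of each following term,
            // which gives the tightest window for this start position.
            int pos = start;
            boolean found = true;
            for (int i = 1; i < terms.size() && found; i++) {
                int next = -1;
                for (int j = pos + 1; j < doc.size(); j++) {
                    if (doc.get(j).equals(terms.get(i))) {
                        next = j;
                        break;
                    }
                }
                if (next < 0) {
                    found = false;
                } else {
                    pos = next;
                }
            }
            int extraTokens = pos - start + 1 - terms.size();
            if (found && extraTokens <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Content words taken from "Yesterday I put on my green plaid shirt".
        List<String> terms = Arrays.asList("green", "plaid", "shirt");
        System.out.println(matchesInOrder(tokenize("Green plaid shirt with stripes"), terms, 0));
        System.out.println(matchesInOrder(tokenize("a plaid shirt that is green"), terms, 0));
    }
}
```

With slop 0 the first phrase matches and the second does not, because "green" comes after "plaid shirt" there. Loosening the slop alone would not change that; you would have to drop the in-order requirement.
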
>
> As for misspellings, that is a separate issue.  Search for "lucene
> spellcheck", or look at synonyms if you've got a list of alternatives.
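
On the misspelling point, the usual building block is edit distance. Below is a small plain-Java sketch of Levenshtein distance, offered only to show why "baloon" can still be tied to "balloon". It is not Lucene's spellchecker or FuzzyQuery, and the class name, method names, and the maxEdits threshold are assumptions for illustration.

```java
// Illustration only: classic Levenshtein distance with two rolling rows.
final class EditDistanceSketch {

    /** Number of single-character inserts, deletes, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Treats two tokens as the same word if they are within maxEdits edits. */
    static boolean fuzzyEquals(String a, String b, int maxEdits) {
        return levenshtein(a, b) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("baloon", "balloon")); // prints 1
        System.out.println(fuzzyEquals("baloon", "balloon", 1)); // prints true
    }
}
```

A threshold of one edit is enough for the "baloon" example, but on short words it will also pair up unrelated terms, so the threshold probably needs to depend on token length.
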
>
>
> --
> Ian.
>
>
> On Wed, Aug 17, 2011 at 4:03 AM, Josh Rehman <josh@joshrehman.com> wrote:
> > My organization is looking to solve a difficult problem, and I believe that
> > Lucene is a close fit (although perhaps it is not). However I'm not sure
> > exactly how to approach this problem.
> >
> > The problem is this: given a small set of fixed noun phrases and a much
> > larger set of human generated short sentences, determine whether the
> > sentences refer to those noun phrases. For example, perhaps I have these
> > noun phrases:
> >
> >   1. Bright yellow book
> >   2. Large bulbous balloon
> >   3. Green plaid shirt with stripes
> >   4. Dark yellow book
> >
> > And these sentences:
> >
> >   1. Yesterday I put on my green plaid shirt.
> >   2. Next week I'll sell my balloon.
> >   3. Just finished my bright book.
> >   4. Wondering at how lovely my baloon is [Note the misspelling]
> >
> > Given that list of sentences, I will generate (sentence, noun phrase)
> > ordered pairs like this:
> > 1,3
> > 2,2
> > 3,1
> > 4,2
> >
> > Or even an ordered pair of (sentence, [noun phrases]). E.g. 3,[1,4] (because
> > there might be an ambiguous reference to "Book")
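
The (sentence, [noun phrases]) output described above can be sketched with a naive token-overlap baseline in plain Java. This is not what Lucene does internally and it is not the code from the gist. The class name, method names, tokenizer, and the rule "any shared token makes a candidate, best overlap first" are assumptions chosen to reproduce the 3,[1,4] example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: rank noun phrases by how many tokens they share with a sentence.
final class PhraseCandidatesSketch {

    /** Lowercased set of letter/digit tokens in the text. */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /**
     * Returns the 1-based numbers of all phrases that share at least one
     * token with the sentence, ordered by overlap size, largest first.
     */
    static List<Integer> candidates(String sentence, List<String> phrases) {
        Set<String> sentenceTokens = tokens(sentence);
        List<int[]> scored = new ArrayList<>(); // each entry: {phraseNumber, overlap}
        for (int i = 0; i < phrases.size(); i++) {
            Set<String> shared = tokens(phrases.get(i));
            shared.retainAll(sentenceTokens);
            if (!shared.isEmpty()) {
                scored.add(new int[] {i + 1, shared.size()});
            }
        }
        // The sort is stable, so ties keep their original phrase order.
        scored.sort((x, y) -> Integer.compare(y[1], x[1]));
        List<Integer> result = new ArrayList<>();
        for (int[] entry : scored) {
            result.add(entry[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> phrases = List.of(
                "Bright yellow book",
                "Large bulbous balloon",
                "Green plaid shirt with stripes",
                "Dark yellow book");
        System.out.println(candidates("Just finished my bright book.", phrases)); // prints [1, 4]
    }
}
```

A baseline like this has no stop-word handling, so a word such as "with" in a sentence would pull in phrase 3, which is one plausible source of false positives. It also finds nothing for the "baloon" sentence unless tokens are compared fuzzily.
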
> >
> > The "shape" of this problem looks a lot like what Lucene does, but frankly I
> > don't have a lot of experience with textual indexing and search. I've
> > installed Lucene and managed to index and search my data structures, however
> > with the StandardIndexer I'm getting a lot of false positives.
> >
> > Here is the code I have so far (I've elided the parsing code which is not
> > very interesting):
> >  https://gist.github.com/1150723
> >
> > Really appreciate any and all guidance. Thanks.
> >
