lucene-java-user mailing list archives

From "Philippe Laflamme" <plafla...@konova.com>
Subject RE: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Fri, 14 Nov 2003 20:14:00 GMT
> >> Rules of linguistics? Is there such a thing? :)
> >
> > Actually, yes there is. Natural Language Processing (NLP) is a very
> > broad
> > research subject but a lot has come out of it.
>
> A lot of what? "If" statements? :)

Yes... just like all software boils down to branches and loops for
the processor... ;o)

> I would agree with that. But it's easier said than done.

Yes, of course this is very complex. That's why NLP is a very popular field
of research: it's challenging!

> And the result are never, er, clear cut.

You're correct, results are not 100% perfect. But getting 95% is pretty
impressive when you're dealing with computer software. Don't forget, even
with many years (decades even) of experience with our own language, we
humans still manage to misunderstand certain sentences... can you really
expect software to be 100% correct all the time?

> Sure. But my take on this, is that pigs will fly before NLP turns into
> a predictable "science" :)

Maybe you're right, technologies derived from NLP may never be perfect. But
it doesn't make them useless. Quite the contrary I think.

I'm not a Lucene expert, but I'm sure it could benefit from NLP-derived
methods for text analysis. Maybe someone out there has some experience
they would like to share with us?
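
To make the noun-phrase idea from the quoted message below a bit more
concrete, here is a minimal sketch in Java. It assumes the tokens have
already been tagged by some part-of-speech tagger (Brill-style or
otherwise); the tag names "ADJ" and "NOUN", the TaggedToken class and the
extractNounPhrases method are all hypothetical names made up for this
illustration, not part of Lucene or of any tagger. It only matches the
simple pattern "zero or more adjectives followed by one or more nouns"
and keeps multi-word matches, so treat it as a toy, not a real chunker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NounPhraseSketch {

    /** A token plus its part-of-speech tag (tag names are assumptions). */
    public static final class TaggedToken {
        public final String text;
        public final String tag;

        public TaggedToken(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    /** Collects runs matching ADJ* NOUN+ that span at least two tokens. */
    public static List<String> extractNounPhrases(List<TaggedToken> tokens) {
        List<String> phrases = new ArrayList<>();
        List<String> current = new ArrayList<>();
        boolean sawNoun = false;

        for (TaggedToken t : tokens) {
            boolean isNoun = "NOUN".equals(t.tag);
            boolean isAdj = "ADJ".equals(t.tag);

            if (isNoun) {
                // Nouns always extend the current candidate phrase.
                current.add(t.text);
                sawNoun = true;
            } else if (isAdj && !sawNoun) {
                // Adjectives may only appear before the first noun.
                current.add(t.text);
            } else {
                // Anything else ends the candidate; keep it if it qualifies.
                flush(current, sawNoun, phrases);
                current.clear();
                sawNoun = false;
                if (isAdj) {
                    // An adjective after a noun starts a new candidate.
                    current.add(t.text);
                }
            }
        }
        flush(current, sawNoun, phrases);
        return phrases;
    }

    private static void flush(List<String> current, boolean sawNoun, List<String> out) {
        if (sawNoun && current.size() >= 2) {
            out.add(String.join(" ", current));
        }
    }

    public static void main(String[] args) {
        List<TaggedToken> sentence = Arrays.asList(
            new TaggedToken("patients", "NOUN"),
            new TaggedToken("with", "PREP"),
            new TaggedToken("cardiovascular", "ADJ"),
            new TaggedToken("disease", "NOUN"),
            new TaggedToken("need", "VERB"),
            new TaggedToken("magnetic", "ADJ"),
            new TaggedToken("resonance", "NOUN"),
            new TaggedToken("imaging", "NOUN"));
        // prints [cardiovascular disease, magnetic resonance imaging]
        System.out.println(extractNounPhrases(sentence));
    }
}
```

The extracted phrases could then be indexed as single terms next to the
individual words, but how best to wire that into a Lucene analyzer is
exactly the kind of experience I'm hoping someone can share.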

Thanks,
Phil

> -----Original Message-----
> From: petite_abeille [mailto:petite_abeille@mac.com]
> Sent: November 14, 2003 14:36
> To: Lucene Users List
> Subject: Re: inter-term correlation [was Re: Vector Space Model in
> Lucene?]
>
>
>
> On Nov 14, 2003, at 20:29, Philippe Laflamme wrote:
>
> >> Rules of linguistics? Is there such a thing? :)
> >
> > Actually, yes there is. Natural Language Processing (NLP) is a very
> > broad
> > research subject but a lot has come out of it.
>
> A lot of what? "If" statements? :)
>
> > More specifically, Rule-based taggers have become very popular since
> > Eric
> > Brill published his works on trainable rule-based tagging.
> >
> > Essentially, it comes down to analysing sentences to determine the role
> > (noun, verb, etc.) of each word. It's very helpful to extract
> > noun-phrases
> > such as "cardiovascular disease" or "magnetic resonance imaging" from
> > documents.
>
> I would agree with that. But it's easier said than done. And the result
> are never, er, clear cut.
>
> > So, yep... you can definitely derive rules to analyse natural
> > language...
>
> Well... beyond the jargon and the impressive math... this all boils
> down to fuzzy heuristics and judgment calls... but perhaps this is just
> me :)
>
> > I'm sure you already know about all of this...
>
> Not really. I'm more of a dilettante than a "NLP expert".
>
> > just thought it might be
> > interesting for some...
>
> Sure. But my take on this, is that pigs will fly before NLP turns into
> a predictable "science" :)
>
> PA.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org



