lucene-java-user mailing list archives

From "Philippe Laflamme" <plafla...@konova.com>
Subject RE: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Fri, 14 Nov 2003 19:29:53 GMT
> Rules of linguistics? Is there such a thing? :)

Actually, yes, there is. Natural Language Processing (NLP) is a very broad
research subject, but a lot of practical results have come out of it.

More specifically, rule-based taggers have become very popular since Eric
Brill published his work on trainable rule-based tagging.

Essentially, it comes down to analysing sentences to determine the role
(noun, verb, etc.) of each word. It's very helpful for extracting noun phrases
such as "cardiovascular disease" or "magnetic resonance imaging" from
documents.
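
To make that a bit more concrete, here is a minimal sketch of the noun-phrase
step only. It is not Brill's tagger and not anything that ships with Lucene:
it assumes the words have already been tagged (the Penn Treebank-style tags
below are assigned by hand for illustration), and the class and method names
are made up for this example. The single rule it applies is "take a run of
adjectives and nouns and cut it off at the last noun".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or any tagger library.
class NounPhraseSketch {

    /** True for Penn Treebank-style noun tags (NN, NNS, NNP, NNPS). */
    static boolean isNoun(String tag) {
        return tag.startsWith("NN");
    }

    /** True for Penn Treebank-style adjective tags (JJ, JJR, JJS). */
    static boolean isAdjective(String tag) {
        return tag.startsWith("JJ");
    }

    /**
     * Scans {word, tag} pairs and returns each maximal run of adjectives
     * and nouns, truncated at its last noun. Runs without a noun are dropped.
     */
    static List<String> extract(String[][] tagged) {
        List<String> phrases = new ArrayList<>();
        int i = 0;
        while (i < tagged.length) {
            if (!isNoun(tagged[i][1]) && !isAdjective(tagged[i][1])) {
                i++; // not part of a candidate phrase
                continue;
            }
            int start = i;
            int lastNoun = -1;
            // Consume the whole adjective/noun run, remembering the last noun seen.
            while (i < tagged.length
                    && (isNoun(tagged[i][1]) || isAdjective(tagged[i][1]))) {
                if (isNoun(tagged[i][1])) {
                    lastNoun = i;
                }
                i++;
            }
            if (lastNoun >= 0) {
                StringBuilder phrase = new StringBuilder();
                for (int k = start; k <= lastNoun; k++) {
                    if (k > start) {
                        phrase.append(' ');
                    }
                    phrase.append(tagged[k][0]);
                }
                phrases.add(phrase.toString());
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Tags are hand-assigned here; a real tagger would produce them.
        String[][] sentence = {
            {"Magnetic", "JJ"}, {"resonance", "NN"}, {"imaging", "NN"},
            {"detects", "VBZ"},
            {"cardiovascular", "JJ"}, {"disease", "NN"}, {".", "."}
        };
        System.out.println(extract(sentence));
        // prints [Magnetic resonance imaging, cardiovascular disease]
    }
}
```

Note that this rule also returns single nouns, and real rule sets are a good
deal richer than this; the tagging step itself is where the trainable rules
come in.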

So, yep... you can definitely derive rules to analyse natural language...

I'm sure you already know about all of this... just thought it might be
interesting for some...

Phil

> -----Original Message-----
> From: petite_abeille [mailto:petite_abeille@mac.com]
> Sent: November 14, 2003 14:04
> To: Lucene Users List
> Subject: Re: inter-term correlation [was Re: Vector Space Model in
> Lucene?]
>
>
>
> On Nov 14, 2003, at 19:50, Chong, Herb wrote:
>
> > if you are handling inter correlation properly, then terms can't cross
> > sentence boundaries.
>
> Could you not break down your document along sentence boundaries? If you
> manage to figure out what a sentence is, that is.
>
> > if you are not paying attention to sentence boundaries, then you are
> > not following rules of linguistics.
>
> Rules of linguistics? Is there such a thing? :)
>
> PA.
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

