lucene-java-user mailing list archives

From petite_abeille <petite_abei...@mac.com>
Subject Re: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Fri, 14 Nov 2003 20:44:41 GMT

On Nov 14, 2003, at 21:14, Philippe Laflamme wrote:

>>>> Rules of linguistics? Is there such a thing? :)
>>>
>>> Actually, yes there is. Natural Language Processing (NLP) is a very
>>> broad research subject but a lot has come out of it.
>>
>> A lot of what? "If" statements? :)
>
> Yes... just like every software boils down to branching and while
> loops for the processor... ;o)

Hehe... ;) But NLP seems to suffer more from heuristics disguised in 
fancy jargon than other fields...

>
>> I would agree with that. But it's easier said than done.
>
> Yes, of course this is very complex. That's why NLP is a very popular
> field of research: it's challenging!

Indeed.

>
>> And the results are never, er, clear cut.
>
> You're correct, results are not 100% perfect. But getting 95% is pretty
> impressive when you're dealing with a computer software. Don't forget,
> even with many years (decades even) of experience with our own language,
> we humans still manage to misunderstand certain sentences... can you
> really expect a software to be 100% correct all the time?

Nope. Therefore my "tongue in cheek" comments...

>
>> Sure. But my take on this, is that pigs will fly before NLP turns into
>> a predictable "science" :)
>
> Maybe you're right, technologies derived from NLP may never be perfect.
> But it doesn't make them useless. Quite the contrary I think.

Perhaps. I'm not saying it's utterly useless as a whole. But... NLP has
a noted tendency to overpromise and underdeliver. Plus, it's marred
by too much jargon... which is suspicious in and of itself :)

> I'm not a Lucene expert, but I'm sure it could benefit from using
> derived NLP methods for text analysis.

For "hardcore" text analysis, perhaps. But Lucene is a low-level
indexing library. You can build something much more, er, esoteric on
top of it. But I don't think that the core library would benefit from
any "bizarre" additions. Plus, the core elements of the library already
provide more than enough room to play with whatever scheme you may have
in mind.

> Maybe someone out there has some experience
> they might want to share with us?

Perhaps. But one way or another, and as far as Lucene is concerned, you 
will be better off building something exotic on top of Lucene than 
messing around with its internals.

PA.




