lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "N. Hira" <nh...@cognocys.com>
Subject Re: Similarity percentage between two Strings
Date Wed, 03 Sep 2008 23:30:46 GMT
More details may change my opinion (not quite sure how others feel  
yet), but with the way you've described it so far, it seems like all  
you need is a basic string matcher:

For every message:
	- if <blah>message.subject<blah> is found in the pool, then this  
message is "similar to" the message in the pool
	- if no such message could be found, then this message add it to the  
pool and wait to find more like it

Please describe the problem in greater detail if this seems too  
simplistic.

-h

On 03-Sep-2008, at 5:58 PM, Thiago Moreira wrote:

>
>     Well, the similar definition that I'm looking for is the number  
> 2, maybe the number 3, but to start the number 2 is enough. If you  
> guys think that is not a Lucene problem what else tool can I use to  
> implement this requirement??
>
>     Thanks
> Thiago Moreira
> Software Engineer
> tmoreira@liferay.com
> Liferay, Inc.
> Enterprise. Open Source. For Life.
>
>
> N. Hira wrote:
>>
>> I don't know how much of this is a Lucene problem, but -- as I'm  
>> sure you will inevitably hear from others on the list -- it  
>> depends on what your definition of "similar" is.
>>
>> By similar, do you mean:
>> 1.  Identical, except for variations in case (upper/lower)
>> 2.  Allow 1., but also allow prefixes/suffixes (e.g., "FW:  " or  
>> "... (summary")
>> 3.  Allow 1., 2. and permit some new terms ... how many?
>> 4.  Allow all of the above and allow some changes to terms using  
>> stemming (E.g., "Google releases Chrome" is similar to "Google  
>> announces the release of its new Chrome web browser")
>> ....
>>
>> I'm sure you see where this is going.  So ... how do you define  
>> similar?
>>
>> Good luck!
>>
>> -h
>> --------------------------------------------------------------------- 
>> -
>> Hira, N.R.
>> Cognocys, Inc.
>>
>> On 03-Sep-2008, at 2:52 PM, Thiago Moreira wrote:
>>
>>>
>>>     Hey all,
>>>
>>>     I want to know how much two Strings are similar! The thing  
>>> is: I'm processing an email box and I want to group all messages  
>>> that have the subject similar, makes sense?? I looked on the  
>>> documentation but I didn't find how to accomplish this. It's not  
>>> necessary add the messages or the subjects on some kind of index.  
>>> I'm using 2.3.2 version of Lucene.
>>>
>>>     Anyone has some idea?
>>>
>>>     Thanks in advance.
>>> -- 
>>> Thiago Moreira
>>> Software Engineer
>>> tmoreira@liferay.com
>>> Liferay, Inc.
>>> Enterprise. Open Source. For Life.
>>>





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message