lucene-java-user mailing list archives

Subject: Re[4]: md5 keyword field issue
Date: Mon, 20 Jun 2005 14:54:05 GMT
Monday, June 20, 2005, 5:48:30 PM, Erik Hatcher wrote:
> Now you've just said the same conflicting thing a different way.  You
> want to cluster but only return one.  :)

I think I misunderstood the term "cluster" here.
So yes, I just want one image returned.

> If you only want one image returned, then it seems that only indexing
> the same image once is the way to go.  When you find a duplicate MD5,
> don't index that as a second document.  You will, instead, update the
> document by adding additional ALT text and perhaps the additional URL.

This sounds pretty good!
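
A minimal sketch of the approach described above (one document per unique MD5, with later duplicates contributing only their extra URL and ALT text). It is plain Python rather than Lucene code, and the `ImageDoc` class, the `merge_by_md5` helper, the URLs, and the ALT strings are all hypothetical names and values made up for illustration; only the two MD5 values come from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ImageDoc:
    """Hypothetical stand-in for the single indexed document of one image."""
    md5: str
    urls: list = field(default_factory=list)
    alts: list = field(default_factory=list)


def merge_by_md5(records):
    """Collapse (md5, url, alt) records into one ImageDoc per MD5."""
    docs = {}
    for md5, url, alt in records:
        # First sighting creates the document; later ones reuse it.
        doc = docs.setdefault(md5, ImageDoc(md5))
        if url not in doc.urls:
            doc.urls.append(url)
        if alt and alt not in doc.alts:
            doc.alts.append(alt)
    return docs


# Illustrative input: the first and third records are the same image.
records = [
    ("e127d0e91af5d8b2522138fb46c2e1bc", "http://example.com/a.jpg", "red car"),
    ("7a18b029925d8357599878a85fd6b02f", "http://example.com/b.jpg", "blue boat"),
    ("e127d0e91af5d8b2522138fb46c2e1bc", "http://example.org/copy.jpg", "sports car"),
]
docs = merge_by_md5(records)
```

Each merged entry would then be indexed once, so a search can only ever return one hit per image, and the combined ALT text stays searchable.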

> Is there a reason why indexing each unique image (by MD5) is not a  
> good way to go in your case?

>> in sql this would be:
>> select distinct md5, url, alt from table group by md5 order by  
>> score asc;

> This would give you multiple records for the same MD5.  You said  
> above you only want one per MD5.

Here I'm afraid you are not correct: the query has a GROUP BY md5
clause, so it returns only one row per MD5.

Tested on MySQL, the query above returns:
170 rows in set (0.13 sec)

select distinct md5 from image;
| e127d0e91af5d8b2522138fb46c2e1bc |
| 7a18b029925d8357599878a85fd6b02f |
170 rows in set (0.00 sec)

Same number of rows :D
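
The row-count comparison can be reproduced in a small sketch, here using Python's built-in `sqlite3` module instead of MySQL. The table layout, the `score` column, and the sample rows are assumptions made for illustration. SQLite, like MySQL without the `ONLY_FULL_GROUP_BY` mode, accepts non-aggregated columns next to GROUP BY; stricter SQL engines reject such a query.

```python
import sqlite3

# Hypothetical schema and rows; only the two MD5 values come from the thread.
rows = [
    ("e127d0e91af5d8b2522138fb46c2e1bc", "http://example.com/a.jpg", "red car", 0.9),
    ("e127d0e91af5d8b2522138fb46c2e1bc", "http://example.org/copy.jpg", "sports car", 0.4),
    ("7a18b029925d8357599878a85fd6b02f", "http://example.com/b.jpg", "blue boat", 0.7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (md5 TEXT, url TEXT, alt TEXT, score REAL)")
conn.executemany("INSERT INTO image VALUES (?, ?, ?, ?)", rows)

# The query from the message: GROUP BY md5 yields one row per MD5.
grouped = conn.execute(
    "SELECT DISTINCT md5, url, alt FROM image GROUP BY md5 ORDER BY score ASC"
).fetchall()

# The control query: one row per distinct MD5.
distinct = conn.execute("SELECT DISTINCT md5 FROM image").fetchall()

print(len(grouped), len(distinct))
```

Both queries return the same number of rows. Note that the query does not say which `url` and `alt` represent each group, so the row picked for a duplicated MD5 should be treated as arbitrary.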

Catalin Constantin
