lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject Re[2]: md5 keyword field issue
Date Mon, 20 Jun 2005 13:38:27 GMT
Monday, June 20, 2005, 3:55:36 PM, Erik Hatcher wrote:
> Filters reduce the search space to a subset of the documents in the
> index.  Which document would you want returned when there are  
> multiple documents in the index with the same MD5?  Or do you want to
> cluster them by MD5?

i think cluster by md5 is more appropriate.

> Do you want to cluster them by MD5 perhaps, but still return multiple
> documents back from a search?

i want to return just the 1st image (the more relevant one). no use to
show duplicates in an image search app.

> I'm not sure if a Filter is the appropriate technique for this  
> scenario or not.

well, i am not sure either.
one solution would be when i iterate through the hits collection and
send them to the webapp, to group them by md5 or some.

is this a good way to do it ?
(the bad thing is i would have to do lots of hits.doc(index) in
advance, to make this group by md5 thing, and if the results are
paginated << which is the case >>, on the 2nd page i would need to
keep in session the last "index" or to recalculate it again.. - oh
nein !:)

in sql this would be:
select distinct md5, url, alt from table group by md5 order by score asc;

if i had the score in the DB (which is not the case).

Catalin Constantin

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message