lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: md5 keyword field issue
Date Mon, 20 Jun 2005 12:55:36 GMT

On Jun 19, 2005, at 5:17 AM, catalin-lucene@dazoot.ro wrote:

> Hi there,
>
> I have an index with the following fields in it:
> url - keyword - Field("url", this.url, Field.Store.YES, Field.Index.UN_TOKENIZED);
> md5 - keyword - Field("md5", this.md5, Field.Store.YES, Field.Index.UN_TOKENIZED);
> alt - Field("alt", this.alt, Field.Store.YES, Field.Index.TOKENIZED);
>
> I use it to index my images.
> Now it happens that the same image (i.e. the same md5) is used in
> different locations (i.e. different URLs). For example, mylogo.gif
> is used at both
> http://site.com/project1/mylogo.gif and
> http://site.com/project2/some_other_bubu/mylogo.gif
>
> The ALT text is different in each location.
>
> Now, when I search for "mylogo" in my image search app, I get several
> results showing the same image.
>
> I would like to reduce the number of results so that each md5 appears
> only once.
> Note: I can't delete the second image from the index, because its ALT
> may be different; in general, all the properties taken together (md5,
> url, alt) make up a distinct "entity".


It seems you have conflicting goals here.  You want (md5, url, alt)  
to be unique in one sense, yet you want md5 itself to be unique in  
another sense.

> I bought the "Lucene in Action" book, which is a GREAT book.

Thank you!  :)

> I was looking into filters.
>
> To quote: "If all the information needed to perform filtering is in the
> index, there is no need to write your own filter because QueryFilter
> can handle it."
>
> I can't figure out how QueryFilter can help me.
>
> I also tried to write my own filter, but there isn't much information
> in that direction either.

Filters reduce the search space to a subset of the documents in the  
index.  Which document would you want returned when there are  
multiple documents in the index with the same MD5?  Or do you want to  
cluster them by MD5?
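To make "reducing the search space" concrete: in Lucene 1.x a Filter's bits(IndexReader) method returns a java.util.BitSet with one bit per document number, and only documents whose bit is set can match. The plain-Java sketch below uses no Lucene classes; it only computes the kind of bit set a de-duplicating filter would return, keeping one document per MD5. It assumes the md5 values have already been read into an array indexed by document number; md5ByDoc and Md5DedupFilterSketch are hypothetical names.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch (no Lucene dependency) of the bit set a de-duplicating
// filter would compute. md5ByDoc is a hypothetical array holding each
// document's md5 value, indexed by document number.
final class Md5DedupFilterSketch {

    // Returns a BitSet in which bit d is set when document d may match:
    // the lowest-numbered document for each md5 value, plus any document
    // that has no md5 value at all.
    static BitSet firstDocPerMd5(String[] md5ByDoc) {
        BitSet allowed = new BitSet(md5ByDoc.length);
        Set<String> seen = new HashSet<>();
        for (int doc = 0; doc < md5ByDoc.length; doc++) {
            String md5 = md5ByDoc[doc];
            // Set.add returns false if an earlier document already claimed this md5.
            if (md5 == null || seen.add(md5)) {
                allowed.set(doc);
            }
        }
        return allowed;
    }
}
```

For md5ByDoc = {"a1", "b2", "a1", "c3", "b2"}, only documents 0, 1 and 3 are allowed. The representative is picked without looking at the query: if a search for "mylogo" matches only the ALT text of document 2, the filter has already excluded it, and that image disappears from the results entirely. That is why it matters which document you would want returned.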

Perhaps you want to cluster them by MD5, but still return multiple
documents from a search?
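If clustering is the goal, one alternative to a Filter is to group the ranked hits after the search, using each hit's md5 value (available because the field is stored with Field.Store.YES). Below is a minimal plain-Java sketch, not a Lucene API: the Hit class and the method names are hypothetical stand-ins for whatever your application builds from each hit's stored fields. Collapsing to one result per MD5 is the special case of keeping each group's top-ranked hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of post-search clustering by md5. Hits are assumed to be
// supplied in ranked (best-first) order.
final class Md5Clusterer {

    // Hypothetical minimal view of a search hit; in a real application these
    // values would come from the stored "url" and "md5" fields.
    static final class Hit {
        final String url;
        final String md5;

        Hit(String url, String md5) {
            this.url = url;
            this.md5 = md5;
        }
    }

    // Buckets hits by md5. LinkedHashMap keeps the groups in the order in
    // which their first (best-ranked) hit appeared, so group order follows
    // relevance, and each group keeps its hits in ranked order.
    static Map<String, List<Hit>> clusterByMd5(List<Hit> rankedHits) {
        Map<String, List<Hit>> groups = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            groups.computeIfAbsent(hit.md5, k -> new ArrayList<>()).add(hit);
        }
        return groups;
    }

    // Collapsing: keep only the top-ranked hit of each md5 group.
    static List<Hit> collapseByMd5(List<Hit> rankedHits) {
        List<Hit> result = new ArrayList<>();
        for (List<Hit> group : clusterByMd5(rankedHits).values()) {
            result.add(group.get(0));
        }
        return result;
    }
}
```

Because this runs after scoring, the copy shown for each MD5 is the best-matching one for that particular query, and the other URLs and ALT texts remain available in the group. The trade-off is that filling a page of N unique images may require examining more than N hits.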

I'm not sure whether a Filter is the appropriate technique for this
scenario.

     Erik



