lucene-solr-user mailing list archives

From Grant Ingersoll <>
Subject Re: Color search
Date Fri, 28 Sep 2007 13:31:58 GMT
Another option would be to extend Solr (and donate the change back) to
incorporate Lucene's payload functionality. You could then store each
color's percentage as a payload on the color term and use
BoostingTermQuery to factor it into the score... :-)  If you're
interested in this, a discussion on solr-dev is probably warranted to
figure out the best way to do it.


On Sep 28, 2007, at 9:23 AM, Yonik Seeley wrote:

> If it were just a couple of colors, you could have a separate field
> for each color and then index the percent in that field.
> black:70
> grey:20
> and then you could use a function query to influence the score (or you
> could sort by the color percent).
> However, this doesn't scale well to a large index with a large number
> of colors.
> Each field used like that will take up 4 bytes per document in the index,
> so if you have 1M documents, that's 1M docs * 100 colors * 4 bytes = 400MB.
> Doable depending on your index size (use the "int" or "float" type and not
> "sint" or "sfloat" for this... it will be easier on memory).
> If you need to use less memory, you could encode all of the
> colors into a single value (perhaps into a compact string... one
> percentage per byte or something) and then have a custom function that
> extracts the value for a particular color.  (This involves some Java
> development.)
> -Yonik
> On 9/28/07, Guangwei Yuan <> wrote:
>> Hi,
>> We're running an e-commerce site that provides product search. We've been
>> able to extract colors from product images, and we think it'd be cool and
>> useful to search products by color. A product image can have up to 5 colors
>> (from a color space of about 100 colors), so we can implement it easily with
>> Solr's facet search (thanks to all who've developed Solr).
>> The problem arises when we try to sort the results by color relevancy.
>> What's different from a normal facet search is that colors are weighted. For
>> example, a black dress can be 70% black, 20% gray, and 10% brown. A
>> search query "color:black" should return results in which the black dress
>> ranks higher than other products with a lower percentage of black.
>> My question is: how do we configure and index the color field so that
>> products with a higher percentage of color X rank higher for the query
>> "color:X"?
>> Thanks for your help!
>> - Guangwei

Grant Ingersoll
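
The single-value encoding Yonik describes in the quoted message (one
percentage per byte, plus a function that extracts the value for one
color) could look roughly like the sketch below. It is plain Java and
does not touch any Solr or Lucene API. The class name, the method names,
the color numbering (0 = black, 1 = grey, 2 = brown) and the fixed
palette size of 100 are all assumptions made for illustration. Turning
the byte array into a stored field value, and wiring the lookup into a
custom function, are left out.

```java
// Hypothetical helper, not part of Solr or Lucene.
class ColorPercentCodec {
    // Assumption: the palette has about 100 colors, numbered 0..99.
    static final int NUM_COLORS = 100;

    // Pack per-color percentages into one byte per palette color.
    // Colors that do not appear in the product keep the default 0.
    static byte[] encode(int[] colorIds, int[] percents) {
        if (colorIds.length != percents.length) {
            throw new IllegalArgumentException("colorIds and percents differ in length");
        }
        byte[] packed = new byte[NUM_COLORS];
        for (int i = 0; i < colorIds.length; i++) {
            if (percents[i] < 0 || percents[i] > 100) {
                throw new IllegalArgumentException("percent out of range: " + percents[i]);
            }
            packed[colorIds[i]] = (byte) percents[i];
        }
        return packed;
    }

    // Extract the percentage stored for a single color.
    static int percentOf(byte[] packed, int colorId) {
        return packed[colorId] & 0xFF;
    }

    public static void main(String[] args) {
        // The black dress from the original question: 70% black, 20% grey, 10% brown.
        byte[] dress = encode(new int[] {0, 1, 2}, new int[] {70, 20, 10});
        System.out.println("black=" + percentOf(dress, 0)
                + " grey=" + percentOf(dress, 1)
                + " other=" + percentOf(dress, 42));
    }
}
```

With this layout each document carries 100 bytes of color data instead
of the 100 * 4 = 400 bytes in the one-field-per-color estimate, so 1M
documents come to about 100MB rather than 400MB. Since a product has at
most 5 colors, storing only (color id, percentage) pairs would be
smaller still, at the cost of a short scan on lookup.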
