lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: Custom Search algorithm integration
Date Wed, 10 Oct 2007 16:18:13 GMT

On Oct 10, 2007, at 9:51 AM, pp1907@yahoo.com wrote:

> Yes, image field is binary. However, image size is not more than  
> 1KB. We would use a combination of text search to narrow down the  
> results. But once we get the results, we would want to search all  
> images (within the above resultset) with the user given image file.
>
>   Ideally, your 2nd approach - of scoring/matching within Lucene is  
> what we were looking for. Rather, we already have the capability to  
> match & score if 2 images are provided. It would be best if we  
> could integrate our Java API within Lucene search/score.
>
>   Lucene allows to store binary value as a field. However, am not  
> sure (as we are new to Lucene) how we can tell Lucene to use our  
> API to search/match for the binary/image field.
>
>   would be good to hear your thoughts.thank you

Not sure what more I can add.  I would try, as a proof of concept,  
using the payload mechanism (approach 1) and implement your algorithm  
in the scorePayload() method of Similarity.  Otherwise, if you want  
the 2nd approach, you will need to get into the Lucene code and  
figure out how to implement a Query/Scorer/Weight mechanism that  
works with binary fields.  This isn't well thought out at this point  
by me, it just seems like it should be possible.  See the Scoring  
section of the website for more info on how to create Querys, etc.   
The payload stuff may or may not serve as an example here.  If you  
have specific questions on the API, feel free to come back with more  
questions.  You may also try searching the java-user and java-dev  
archives for items related to binary fields.


>
>   cheers
>   Prakash
>
> Grant Ingersoll <gsingers@apache.org> wrote:
>     I presume that the image field is a binary field, right?
>
> I can think of a couple of things that _may_ work for you:
>
> 1. You create a dummy token on the "image" field, and then store the
> image data as a payload on that token. Then you can use the payload
> mechanism to score the field by overriding the scorePayload (I think
> that's the name) method in the Similarity class. You may have to
> implement a new Query for this similar to BoostingTermQuery (but
> maybe not)
>
> 2. You could implement scoring of binary fields in Lucene. I've had
> in my mind that a binary field is the same as a payload, more or
> less, so it seems like it would be relatively easy to create a query
> mechanism for it, just not sure of the loading part of it at this
> time (but that should be easy enough to figure out)
>
> I don't know how the performance of either of these approaches would
> be. If you have large images, it could be a significant overhead.
> The other option, of course, is to do postprocessing on the results
> by first narrowing down on the text fields and then scoring the  
> images.
>
> -Grant
>
> On Oct 6, 2007, at 1:03 AM, pp1907@yahoo.com wrote:
>
>> Thank you Grant but FuntionQuery doesnt solve the
>> problem. Looks like I didnt explain the problem
>> properly. Let me try it again:
>>
>> 1. We have a document in which there are 4 text fields
>> and a image field.
>> 2. We have a special algorithm to perform match based
>> on the image. In the sense, if 2 images are provided,
>> the algorithm can determine with a match result and a
>> score (which indicates likelihood of match)
>>
>> We want to know how we can integrate the "image
>> matching" algorithm into Lucene/Solr.
>>
>> The search will happen through API only where the
>> image is passed and matched images from the index
>> needs to be given back as result.
>>
>> Wanted to understand if we could achieve the
>> integration of this with Lucene/Solr so that both
>> search on text fields and images can happen.
>>
>> Would be nice to hear your thoughts.
>>
>> cheers
>> Prakash
>>
>> Grant Ingersoll wrote:
>> You may want to look at the FunctionQuery capability,
>> either in
>> Lucene, or the expanded capabilities (recently added)
>> in Solr.
>>
>> -Grant
>>
>> On Oct 4, 2007, at 2:39 PM, pp1907@yahoo.com wrote:
>>
>>> Hi,
>>>
>>> Were planning to use Lucene or Solr within our
>>> application and wanted to know if it can support the
>>> following:
>>>
>>> Scenario:
>>> We have (say) 5 fields in a document which need to
>> be
>>> indexed. 4 fields are indexed by Lucene. The 5th
>> field
>>> is not indexed as it has data that cant be indexed
>> or
>>> searched by Lucene.
>>>
>>> We have a special algorithm/API to search/match on
>> the
>>> 5th field. Can this algorithm/API be integrated
>> within
>>> Lucene/Solr so that if we pass the 5th field data as
>> a
>>> query, the search engine will use our algorithm/API
>> to
>>> search and return/display the results. The reason
>> why
>>> we want this to be integrated with Lucene/Solr are
>>> obvious - rely on Lucene/Solr's basic scalability /
>>> performance as well as integrate both traditional
>>> searching capabilities and specialized searching for
>>> our end-user
>>>
>>> Is this possible to achieve in Lucene/Solr and if
>> so,
>>> how. Would be nice to understand your thoughts as
>> this
>>> is very critical for us.
>>>
>>> Appreciate your help. thank you.
>>>
>>> cheers
>>> Prakash
>>>
>>>
>>>
>>>
>>>
>> _____________________________________________________________________ 
>> _
>>
>>> ______________
>>> Yahoo! oneSearch: Finally, mobile search
>>> that gives answers, not web links.
>>>
>> http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC
>>>
>>>
>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>> java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail:
>> java-user-help@lucene.apache.org
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://lucene.grantingersoll.com
>>
>> Lucene Boot Camp Training:
>> ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://
>>
>> www.apachecon.com
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail:
>> java-user-help@lucene.apache.org
>>
>>
>>
>>
>>
>>
>> _____________________________________________________________________ 
>> _
>> ______________
>> Be a better Heartthrob. Get better relationship answers from
>> someone who knows. Yahoo! Answers - Check it out.
>> http://answers.yahoo.com/dir/?link=list&sid=396545433
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Boot Camp Training:
> ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://
> www.apachecon.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------
>  Check out  the hottest 2008 models today at Yahoo! Autos.

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message