lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Using Lucene to match document sets to each other
Date Fri, 16 Dec 2011 20:04:53 GMT
Have you looked at Lucene's "MoreLikeThis"? I confess I haven't
worked with this enough to recommend *how* to use it, but it seems
like it's in the general area you're talking about.

http://lucene.apache.org/java/3_5_0/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html
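For what it's worth, here is a rough sketch of the general idea in plain Java, not the Lucene API: index the short catalog fields, then treat each incoming item's long description as the query, keeping only its most distinctive terms (roughly what MoreLikeThis does when it builds a query from a document). The class name CatalogMatcher, the tokenizer, and the tf*idf weighting are all assumptions made up for illustration; a real setup would rely on Lucene's analyzers and scoring instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: a tiny in-memory inverted index
// over catalog products, queried with the text of each incoming item.
class CatalogMatcher {
    // term -> ids of the catalog products whose short fields contain it
    private final Map<String, Set<String>> postings = new HashMap<>();
    private int productCount = 0;

    // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Index one product; 'fields' is its name, manufacturer, code, etc. joined.
    public void addProduct(String id, String fields) {
        productCount++;
        for (String term : tokenize(fields)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    // Rank catalog products against one item's text, best match first.
    public List<Map.Entry<String, Double>> match(String itemText, int maxQueryTerms) {
        // Count only the item terms that occur somewhere in the catalog.
        Map<String, Integer> tf = new HashMap<>();
        for (String term : tokenize(itemText)) {
            if (postings.containsKey(term)) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        // Weight each term by tf * idf so rare catalog terms count for more.
        List<Map.Entry<String, Double>> weighted = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = postings.get(e.getKey()).size();
            double idf = Math.log(1.0 + (double) productCount / df);
            weighted.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue() * idf));
        }
        weighted.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        // Use only the top terms as the query and sum their weights per product.
        Map<String, Double> scores = new HashMap<>();
        int limit = Math.min(maxQueryTerms, weighted.size());
        for (Map.Entry<String, Double> w : weighted.subList(0, limit)) {
            for (String id : postings.get(w.getKey())) {
                scores.merge(id, w.getValue(), Double::sum);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked;
    }
}
```

Usage would be one addProduct call per catalog entry up front, then one match call per item pulled from the web. An item that shares no terms with the catalog yields an empty list, and you would probably want a score cutoff before calling anything a match.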

Best
Erick

On Fri, Dec 16, 2011 at 12:53 PM, Josh Stone <pacesysjosh@gmail.com> wrote:
> Thanks for the response Donna. That would make more sense, but the items
> I'm pulling in from the web contain large bodies of text (descriptions)
> whereas the products in my catalog consist of shorter fields such as
> product name, manufacturer, product code, etc. So using the smaller fields
> from my catalog to build queries against the larger fields in the items I
> pull in seems to be the only way to do things (that I can think of).
>
> And this brings up my exact problem. I have a document (set of fields) that
> I want to use as search criteria for a search against another set of
> documents. Can something like this be done?
>
> Cheers,
> Josh
>
> On Fri, Dec 16, 2011 at 5:02 AM, Donna L Gresh <gresh@us.ibm.com> wrote:
>
>> Maybe I'm misunderstanding what you're trying to do, but why not do it
>> the other way around; that is, index the items in your catalog, and use
>> the items on the web as the query into the catalog. I have an analogous
>> process (though completely different application area) and I index the
>> stuff that doesn't change much, and use the things that are constantly
>> changing as the query.
>>
>> Donna L. Gresh
>> Business Analytics and Mathematical Sciences
>> IBM T.J. Watson Research Center
>> (914) 945-2472
>> https://researcher.ibm.com/researcher/view.php?person=us-gresh
>> gresh@us.ibm.com
>>
>>
>>
>>
>> From: Josh Stone <pacesysjosh@gmail.com>
>> To: java-user@lucene.apache.org
>> Date: 12/15/2011 04:57 PM
>> Subject: Using Lucene to match document sets to each other
>>
>>
>>
>> I have a use case for which I'm trying to figure out the best way to use
>> Lucene and could use some guidance.
>>
>> I have a set of documents representing products in a catalog (name,
>> description, etc.). I then pull down data from different sources such as
>> Ebay and Amazon and need to determine if the items retrieved from those
>> sources match any of the products in the catalog. So I'm essentially
>> attempting to take many items and many products and determine where I have
>> matches.
>>
>> I'm not sure the best way to go about this, but one questionable approach
>> is to index the items as I pull them in (to RAM) and do one search for
>> every product in my catalog, looking for matching names or descriptions.
>> This means one query per product on every pull, though. Is there a
>> better approach? Any help is appreciated.
>>
>> Thanks,
>> Josh
>>
>>
>>


