lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: Using Lucene to match document sets to each other
Date Fri, 16 Dec 2011 20:04:53 GMT
Have you looked at Lucene's "MoreLikeThis"? I confess I haven't
worked with this enough to recommend *how* to use it, but it seems
like it's in the general area you're talking about.
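
For anyone following along: as far as I understand it, the idea behind MoreLikeThis is to pick the most distinctive terms out of a source document and search with those. The sketch below is NOT the Lucene class and does not use its API. It is a plain-Java illustration of that idea, assuming a simple tf * idf weighting; the class name DistinctiveTerms, the tokenizer, and the smoothing in the idf formula are all made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks the terms that best characterize one document. */
class DistinctiveTerms {

    /** Lowercases and splits on non-alphanumerics; a crude stand-in for an analyzer. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns up to k terms of doc, ranked by term frequency times a smoothed idf. */
    static List<String> topTerms(String doc, List<String> corpus, int k) {
        // Document frequency: in how many corpus documents does each term occur?
        Map<String, Integer> df = new HashMap<>();
        for (String d : corpus) {
            for (String t : new HashSet<>(tokenize(d))) {
                df.merge(t, 1, Integer::sum);
            }
        }
        // Term frequency inside the source document.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(doc)) {
            tf.merge(t, 1, Integer::sum);
        }
        // Weight = tf * idf; the +1 smoothing is an assumption of this sketch.
        Map<String, Double> weight = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            double idf = Math.log((corpus.size() + 1.0) / (docFreq + 1.0)) + 1.0;
            weight.put(e.getKey(), e.getValue() * idf);
        }
        // Highest weight first; ties broken alphabetically so the order is stable.
        List<String> terms = new ArrayList<>(weight.keySet());
        terms.sort((a, b) -> {
            int byWeight = Double.compare(weight.get(b), weight.get(a));
            return byWeight != 0 ? byWeight : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }
}
```

The terms that come back would then be OR'ed together into a query against the other document set. Common words end up with low weights and drop out, which matters when the source document is a long description.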


On Fri, Dec 16, 2011 at 12:53 PM, Josh Stone <> wrote:
> Thanks for the response, Donna. That would make more sense, but the items
> I'm pulling in from the web contain large bodies of text (descriptions)
> whereas the products in my catalog consist of shorter fields such as
> product name, manufacturer, product code, etc. So using the smaller fields
> from my catalog to build queries against the larger fields in the items I
> pull in seems to be the only way to do things (that I can think of).
> And this brings up my exact problem. I have a document (set of fields) that
> I want to use as search criteria for a search against another set of
> documents. Can something like this be done?
> Cheers,
> Josh
> On Fri, Dec 16, 2011 at 5:02 AM, Donna L Gresh <> wrote:
>> Maybe I'm misunderstanding what you're trying to do, but why not do it the
>> other way around; that is, index the items in your catalog, and use the
>> items on the web as the query into the catalog? I have an analogous process
>> (though completely different application area) and I index the stuff that
>> doesn't change much, and use the things that are constantly changing as the
>> query.
>> Donna L. Gresh
>> Business Analytics and Mathematical Sciences
>> IBM T.J. Watson Research Center
>> (914) 945-2472
>> From: Josh Stone <>
>> Date: 12/15/2011 04:57 PM
>> Subject: Using Lucene to match document sets to each other
>> I have a use case for which I'm trying to figure out the best way to use
>> Lucene and could use some guidance.
>> I have a set of documents representing products in a catalog (name,
>> description, etc.). I then pull down data from different sources such as
>> Ebay and Amazon and need to determine if the items retrieved from those
>> sources match any of the products in the catalog. So I'm essentially
>> attempting to take many items and many products and determine where I have
>> matches.
>> I'm not sure the best way to go about this, but one questionable approach
>> is to index the items as I pull them in (to RAM) and do one search for
>> every product in my catalog, looking for matching names or descriptions.
>> This means an almost exponential number of queries though. Is there a
>> better approach? Any help is appreciated.
>> Thanks,
>> Josh
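
To make the "index the catalog, query with the incoming items" suggestion from this thread concrete, here is a minimal sketch. It deliberately does not use Lucene: a plain in-memory inverted index stands in for the real index, and the score is just the count of distinct shared terms rather than Lucene's scoring. The class name CatalogMatcher, its methods, and the product data in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a search index: maps each term to the catalog products containing it. */
class CatalogMatcher {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Lowercases and splits on non-alphanumerics; a crude stand-in for an analyzer. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Index one catalog product (name, manufacturer, code, ...) once, up front. */
    void addProduct(String productId, String fields) {
        for (String term : tokenize(fields)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(productId);
        }
    }

    /** Treat an incoming item's text as the query; score = number of distinct shared terms. */
    Map<String, Integer> match(String itemText) {
        Map<String, Integer> scores = new HashMap<>();
        for (String term : new HashSet<>(tokenize(itemText))) {
            for (String productId : postings.getOrDefault(term, Collections.emptySet())) {
                scores.merge(productId, 1, Integer::sum);
            }
        }
        return scores;
    }

    /** Best-scoring product id, or null when the item shares no term with any product. */
    String bestMatch(String itemText) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> e : match(itemText).entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }
}
```

Usage would look something like: call addProduct("p1", "Acme TurboBlend 3000 blender AB-3000") for each catalog entry, then call bestMatch(description) once per item pulled from the web. The catalog is indexed a single time, and each incoming item costs one lookup pass over its own terms, so nothing has to be re-indexed per batch of items.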
