lucene-java-user mailing list archives

From Donna L Gresh <gr...@us.ibm.com>
Subject Re: Using Lucene to match document sets to each other
Date Fri, 16 Dec 2011 13:02:57 GMT
Maybe I'm misunderstanding what you're trying to do, but why not do it the
other way around? That is, index the items in your catalog, and use the
items from the web as the queries into the catalog. I have an analogous
process (though in a completely different application area): I index the
stuff that doesn't change much, and use the things that are constantly
changing as the queries.
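
A minimal sketch of that arrangement, for illustration only. It uses a toy
in-memory inverted index written with plain java.util collections as a
stand-in for a real Lucene index; it does not use Lucene's API. The class
name CatalogMatcher, the methods addProduct and bestMatch, the
shared-term-count score, and the sample product text are all hypothetical
choices made for this example.

```java
import java.util.*;

/** Toy inverted index standing in for a Lucene index of the catalog (not Lucene's API). */
class CatalogMatcher {
    // term -> ids of the catalog products whose text contains that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Lowercase and split on non-alphanumerics; a crude stand-in for an analyzer. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Index one catalog product. This happens once per product, not once per incoming item. */
    public void addProduct(String productId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(productId);
        }
    }

    /**
     * Use an incoming item's text as the query. The score is the number of
     * distinct terms the item shares with a product (an assumed, simplistic
     * scoring rule); products below minSharedTerms are not reported.
     */
    public Optional<String> bestMatch(String itemText, int minSharedTerms) {
        Map<String, Integer> scores = new HashMap<>();
        for (String term : new HashSet<>(tokenize(itemText))) {
            for (String id : postings.getOrDefault(term, Collections.emptySet())) {
                scores.merge(id, 1, Integer::sum);
            }
        }
        return scores.entrySet().stream()
                .filter(e -> e.getValue() >= minSharedTerms)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        CatalogMatcher catalog = new CatalogMatcher();
        // The slowly changing side (the catalog) is indexed up front.
        catalog.addProduct("p1", "Canon EOS 60D digital SLR camera body");
        catalog.addProduct("p2", "Nikon D7000 digital SLR camera body");

        // Each item pulled from the web becomes one query against the catalog.
        System.out.println(catalog.bestMatch(
                "NEW Canon EOS 60D 18MP Digital SLR Camera (Body Only)", 3));
        System.out.println(catalog.bestMatch("Garden hose 50 ft", 1));
    }
}
```

With this layout the catalog index is built once and each incoming item
costs a single lookup, so nothing has to be re-indexed when a new batch of
items arrives. In a real setup the index and the scoring would come from
Lucene itself rather than from the shared-term count assumed here.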

Donna L. Gresh
Business Analytics and Mathematical Sciences 
IBM T.J. Watson Research Center
(914) 945-2472
https://researcher.ibm.com/researcher/view.php?person=us-gresh
gresh@us.ibm.com




From: Josh Stone <pacesysjosh@gmail.com>
To: java-user@lucene.apache.org
Date: 12/15/2011 04:57 PM
Subject: Using Lucene to match document sets to each other



I have a use case for which I'm trying to figure out the best way to use
Lucene and could use some guidance.

I have a set of documents representing products in a catalog (name,
description, etc.). I then pull down data from different sources such as
eBay and Amazon and need to determine if the items retrieved from those
sources match any of the products in the catalog. So I'm essentially
attempting to take many items and many products and determine where I have
matches.

I'm not sure of the best way to go about this, but one questionable approach
is to index the items in RAM as I pull them in and do one search for
every product in my catalog, looking for matching names or descriptions.
This means a very large number of queries, though: one per catalog product
for every batch of items. Is there a better approach? Any help is appreciated.

Thanks,
Josh


