Date: Tue, 12 Feb 2002 21:27:02 -0500
From: Andrew Libby
To: Lucene Users List
Subject: Re: search similar docs?
Message-ID: <20020212212702.F29178@commnav.com>

On Tue, Feb 12, 2002 at 05:24:45PM -0300, Daniel Calvo wrote:
> Hi,
>
> I was thinking of implementing a search for similar documents (like some commercial search engines do) and wondering if anyone has
> already done something like that with Lucene. I thought of collecting all terms of the selected document (or maybe some subset of
> them) and then creating a MultiTermQuery containing those terms. Does it make sense? Is there a better way to achieve this?

I'd think it would be hard to gather, from the current hit, a list of terms that are actually meaningful to the user.
It would seem that an alias expansion of the original search expression could work, or possibly even a collection of the most common terms in the document we want similar documents for, taken after running it through a stop-word analyzer or something.

I've not implemented anything like this. Just a few thoughts.

Andy

-- 
--------------------------------------------------
Andrew Libby
CommNav, Inc
alibby@commnav.com
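
For illustration, the "most common terms after stop-word filtering" idea above could be sketched roughly as follows. This is plain Java with no Lucene calls. The class name SimilarTerms, the tiny stop-word list, the tokenizing rule, and the OR-joined query string are all assumptions made for the example, not anything proposed in this thread, and a real setup would more likely reuse the analyzer the index was built with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: picks the most frequent non-stop-word terms of a document.
final class SimilarTerms {

    // Assumed, deliberately tiny stop-word list; a real analyzer would supply its own.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "it", "for"));

    // Returns up to n terms, most frequent first; ties are broken alphabetically.
    public static List<String> topTerms(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Lowercase, then split on anything that is not an ASCII letter or digit.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Joins the chosen terms into a simple "any of these" query expression.
    public static String toQuery(List<String> terms) {
        return String.join(" OR ", terms);
    }

    public static void main(String[] args) {
        String doc = "Lucene is a search library. Lucene indexes documents, "
                + "and search finds documents in the index.";
        System.out.println(toQuery(topTerms(doc, 3)));
    }
}
```

Raw frequency is only one way to choose terms. It tends to favor words that are common across the whole collection, so weighting by how rare a term is in the index would probably give more useful matches.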