Subject: RE: Vector Space Model in Lucene?
Date: Fri, 14 Nov 2003 14:54:28 -0500
From: "Chong, Herb"
To: "Lucene Users List"

It solves one part of the problem, but there are a lot of sentences in a typical document. You'll then need to composite a document's rank from its constituent sentences. There are less drastic ways to solve the problem. The other problem is that Lucene doesn't consider term order in the query unless the query is formulated as a phrase. A simple bag-of-words query doesn't make use of the term ordering that applies in a given language.

Herb....

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, November 14, 2003 2:49 PM
To: Lucene Users List
Subject: Re: Vector Space Model in Lucene?

In the Lucene sense of things, it sounds like you're after one Document per sentence. You then get your boundaries automatically as well as the "distance weighting" through the coord() Similarity function. At least that seems like a close approximation of what Lucene offers.

Thoughts?

Erik
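
To make the two points in this thread concrete, here is a minimal plain-Java sketch that does not use Lucene itself. It scores each sentence by the fraction of query terms it contains, which has the same overlap / maxOverlap shape as the coord() factor Erik mentions, and then composites a document score from its sentences. The class name, the helper names, and the "take the best sentence" compositing rule are all assumptions made for illustration; the thread does not say how the compositing should be done.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: hypothetical names, no Lucene classes involved.
final class SentenceRankSketch {

    // Split text into a set of lowercase word tokens, dropping empty tokens.
    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    // Fraction of distinct query terms present in the sentence
    // (overlap / maxOverlap, the shape of a coord-style factor).
    static double coord(Set<String> queryTerms, String sentence) {
        if (queryTerms.isEmpty()) {
            return 0.0;
        }
        Set<String> words = terms(sentence);
        int overlap = 0;
        for (String term : queryTerms) {
            if (words.contains(term)) {
                overlap++;
            }
        }
        return (double) overlap / queryTerms.size();
    }

    // Assumed compositing rule: a document scores as well as its best sentence.
    static double documentScore(Set<String> queryTerms, List<String> sentences) {
        double best = 0.0;
        for (String sentence : sentences) {
            best = Math.max(best, coord(queryTerms, sentence));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList(
            "Lucene ranks documents with a vector space model.",
            "Term order is ignored unless the query is a phrase.");
        // A bag-of-words query is a set of terms, so reordering the query
        // cannot change the score.
        System.out.println(documentScore(terms("vector space"), doc));
        System.out.println(documentScore(terms("space vector"), doc));
    }
}
```

Both calls in main produce the same score because the query is reduced to a set of terms, which is the order-insensitivity Herb describes for non-phrase queries. Summing or averaging the sentence scores would be equally plausible compositing rules; the maximum is used here only because it is the simplest to read.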