Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: RE: Vector Space Model in Lucene?
Date: Fri, 14 Nov 2003 14:54:28 -0500
Message-Id: 
 <33D5BBBB077CAD47AA4F225359F4A5E401241190@ny2528.corp.bloomberg.com>
From: "Chong, Herb" <HChong3@bloomberg.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>

It solves one part of the problem, but a typical document contains many
sentences, so you would then need to composite a document's rank from
the ranks of its constituent sentences. There are less drastic ways to
solve the problem. The other problem is that Lucene doesn't consider
term order in the query unless the query is formulated as a phrase. A
simple bag-of-words query makes no use of the term ordering that applies
in a given language.
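
To make the ordering point concrete, here is a minimal sketch in plain
Java. It does not use the Lucene API: the class TermOrderDemo and its
methods are hypothetical names made up for illustration, and the
whitespace tokenizer is only a stand-in for a real analyzer. It shows
that a bag-of-words overlap cannot tell two sentences apart when they
contain the same terms in a different order, while a phrase-style match
can.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Lucene code.
final class TermOrderDemo {

    // Lowercase and split on whitespace (assumed stand-in for an analyzer).
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Bag-of-words overlap: number of distinct query terms found in the
    // text, with term order ignored entirely.
    static int bagOfWordsOverlap(String query, String text) {
        Set<String> queryTerms = new HashSet<>(tokenize(query));
        queryTerms.retainAll(new HashSet<>(tokenize(text)));
        return queryTerms.size();
    }

    // Phrase-style match: the query tokens must appear consecutively
    // and in the same order somewhere in the text.
    static boolean containsPhrase(String query, String text) {
        return Collections.indexOfSubList(tokenize(text), tokenize(query)) >= 0;
    }

    public static void main(String[] args) {
        String query = "dog bites man";
        String first = "the dog bites man";
        String second = "the man bites dog";

        // Both texts contain all three query terms, so the overlap is equal.
        System.out.println(bagOfWordsOverlap(query, first));
        System.out.println(bagOfWordsOverlap(query, second));

        // Only the first text contains the terms in query order.
        System.out.println(containsPhrase(query, first));
        System.out.println(containsPhrase(query, second));
    }
}
```

The two overlap counts are identical, so any score built only on term
presence ranks both texts the same; only the phrase check separates
them.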

Herb....

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, November 14, 2003 2:49 PM
To: Lucene Users List
Subject: Re: Vector Space Model in Lucene?


In the Lucene-sense of things, sounds like you're after one Document
per sentence.  You then get your boundaries automatically as well as
the "distance weighting" through the coord() Similarity function.  At
least that seems like a close approximation of what Lucene offers.
Thoughts?
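
A rough sketch of the one-Document-per-sentence idea, in plain Java
rather than against the Lucene API. The coord() method below uses the
overlap / maxOverlap formula that, as far as I know, matches the
default Similarity implementation. Everything else (the class
SentenceCoordDemo, the naive sentence splitter, the tokenizer) is a
hypothetical stand-in, and a real Lucene score also includes tf, idf
and norm factors that are left out here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not Lucene code.
final class SentenceCoordDemo {

    // Fraction of query terms matched: overlap / maxOverlap.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Naive splitter (assumption): a sentence ends at '.', '!' or '?'
    // followed by whitespace.
    static String[] splitSentences(String text) {
        return text.trim().split("(?<=[.!?])\\s+");
    }

    // Distinct lowercase alphanumeric terms in the text.
    static Set<String> terms(String text) {
        Set<String> result =
            new HashSet<>(Arrays.asList(text.toLowerCase().split("[^a-z0-9]+")));
        result.remove(""); // drop the empty token left by leading punctuation
        return result;
    }

    // coord factor for one sentence treated as its own "document".
    static float sentenceCoord(String query, String sentence) {
        Set<String> queryTerms = terms(query);
        Set<String> matched = new HashSet<>(queryTerms);
        matched.retainAll(terms(sentence));
        return coord(matched.size(), queryTerms.size());
    }

    public static void main(String[] args) {
        String query = "vector space model";
        String text = "Lucene uses a vector space model. "
                    + "The vector was long. Space is limited.";

        // A sentence holding all query terms gets the full factor; a
        // sentence holding only one of them gets a third of it.
        for (String sentence : splitSentences(text)) {
            System.out.println(sentenceCoord(query, sentence) + "  " + sentence);
        }
    }
}
```

Indexed as a single Document, the whole text would match all three
terms no matter how far apart they sit. Split per sentence, only the
sentence that holds the terms together receives the full coord factor,
which is the approximate "distance weighting" effect.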

	Erik
