Subject: RE: Context specific summary with the search term
From: Lee Mallabone
To: Doug Cutting
Cc: lucene-user@jakarta.apache.org
Date: 23 Oct 2001 12:03:58 +0100
Message-Id: <1003835038.3561.96.camel@murphy.granta.internal>
In-Reply-To: <4BC270C6AB8AD411AD0B00B0D0493DF0EE7C9B@mail.grandcentral.com>

On Mon, 2001-10-22 at 17:43, Doug Cutting wrote:
> > I'm trying to implement this and should be able to contribute any
> > successful results, but I need to produce context on a per-field basis.
>
> How did the title ever get indexed as the title? Presumably you split the
> document into fields when it was indexed. Similarly, if you re-tokenize
> things a field at a time then you should always know which field you are in,
> no?

I'm indexing HTML documents marked up with comments to indicate field
boundaries. So I'd typically have:

blurb more blurb

and so on. The documents were indexed by looking for each field marker and
then adding the subsequent lines to the relevant field.
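
As an illustration of the marker-based splitting described above, here is a minimal sketch in plain Java. The marker syntax `<!-- field: NAME -->` is purely an assumption (the actual comment markers are not shown in this message), and `FieldSplitter` and `split` are hypothetical names, not part of Lucene. Each resulting field could then be indexed, or re-tokenized for context, on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FieldSplitter {
    // ASSUMED marker syntax: a line holding only <!-- field: NAME -->.
    // The real markers used by the author are not known.
    private static final Pattern MARKER =
        Pattern.compile("^\\s*<!--\\s*field:\\s*(\\w+)\\s*-->\\s*$");

    /** Returns field name -> text, in order of first appearance. */
    static Map<String, String> split(Reader in) {
        Map<String, StringBuilder> buffers = new LinkedHashMap<>();
        StringBuilder current = null; // lines before the first marker are ignored
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = MARKER.matcher(line);
                if (m.matches()) {
                    // Start (or resume) the field named by the marker.
                    current = buffers.computeIfAbsent(m.group(1), k -> new StringBuilder());
                } else if (current != null) {
                    // Append the line to the field currently being collected.
                    if (current.length() > 0) current.append('\n');
                    current.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : buffers.entrySet()) {
            fields.put(e.getKey(), e.getValue().toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<!-- field: title -->\nblurb\n"
                    + "<!-- field: body -->\nmore blurb\nand so on\n";
        System.out.println(split(new StringReader(html)));
    }
}
```

Because the splitter keeps the field name alongside its text, any later per-field processing always knows which field it is in, which is the point raised in the quoted reply.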

In order to obtain a generic solution for context generation, are you
suggesting I write a method that takes plain text (e.g. the text form of the
document) and a query, and assumes the plain text is in the query's default
field? This doesn't seem quite as useful as

getContext(HashSet queryTerms, Reader originalDocument);

which is what I was originally aiming towards.

Regards,
--
Lee Mallabone
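
For readers following the thread, the `getContext(HashSet queryTerms, Reader originalDocument)` signature proposed above could be sketched roughly as follows. This is only an illustration under several assumptions: the class name `ContextSketch`, the fixed five-word window, and the whitespace-based word splitting are all hypothetical choices, and the simple `normalize` step stands in for whatever analyzer was used at index time. It uses only the Java standard library, not Lucene, and takes a generic `Set<String>` of lower-cased terms.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ContextSketch {
    // Words kept on each side of the hit; an arbitrary choice for this sketch.
    private static final int WINDOW = 5;

    /**
     * Returns a snippet around the first query term found in the document,
     * or "" if no term occurs. Query terms are expected in lower case.
     */
    static String getContext(Set<String> queryTerms, Reader originalDocument) {
        List<String> words = readWords(originalDocument);
        for (int i = 0; i < words.size(); i++) {
            if (queryTerms.contains(normalize(words.get(i)))) {
                int from = Math.max(0, i - WINDOW);
                int to = Math.min(words.size(), i + WINDOW + 1);
                StringBuilder sb = new StringBuilder();
                if (from > 0) sb.append("... ");
                sb.append(String.join(" ", words.subList(from, to)));
                if (to < words.size()) sb.append(" ...");
                return sb.toString();
            }
        }
        return "";
    }

    // Lower-cases and strips surrounding punctuation; a stand-in for a real analyzer.
    private static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("^\\W+|\\W+$", "");
    }

    // Reads the whole document and splits it on whitespace.
    private static List<String> readWords(Reader in) {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) text.append(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> words = new ArrayList<>();
        for (String w : text.toString().trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>();
        terms.add("lucene");
        String doc = "one two three four five six Lucene, seven eight nine ten eleven twelve";
        System.out.println(getContext(terms, new StringReader(doc)));
    }
}
```

A method of this shape needs no knowledge of fields: the caller decides which text to pass in, so it can be applied to a whole document or to one field's text at a time.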