From: Doug Cutting
To: 'Lee Mallabone'
Cc: lucene-user@jakarta.apache.org
Subject: RE: Context specific summary with the search term
Date: Mon, 22 Oct 2001 09:43:09 -0700

> From: Lee Mallabone [mailto:lee@grantadesign.com]
>
> I'm trying to implement this and should be able to contribute any
> successful results, but I need to produce context on a per-field basis.
> E.g. if I got a token hit in the text body of a document, but the first
> hit token was a word in the section title, I'd want to generate context
> around the token in the text body.

How did the title ever get indexed as the title? Presumably you split the
document into fields when it was indexed. Similarly, if you re-tokenize
things a field at a time then you should always know which field you are
in, no?

> I had been using a TokenStream to try this. However, Lucene's Token
> class doesn't seem to have any concept of fields (even when I
> tokenStream() a document that is in the index with a whole bunch of
> fields). Is there any reason for this? Moreover, any suggestions of how
> to find the information I need?
>
> The natural thing seems to be to have a field-aware token stream, but
> I'm not sure how I'd go about implementing that...
>
> Regards,
>
> --
> Lee Mallabone
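
To make the "re-tokenize a field at a time" suggestion concrete, here is a rough sketch in plain Java. It deliberately does not use Lucene's classes: FieldContext, FieldToken, tokenize() and contextFor() are hypothetical names invented for this illustration, and the letter-or-digit splitter only stands in for whatever analyzer was used at index time. The point is that the caller passes the field name in, so every token (and every context snippet) already knows which field it belongs to.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class FieldContext {

    /** A token that remembers the field it came from (hypothetical type). */
    static final class FieldToken {
        final String field;
        final String text;
        final int start; // offset of the first character in the field value
        final int end;   // offset one past the last character

        FieldToken(String field, String text, int start, int end) {
            this.field = field;
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits one field's value into lowercased runs of letters and digits. */
    static List<FieldToken> tokenize(String field, String value) {
        List<FieldToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < value.length()) {
            // Skip separators, then consume one run of letters/digits.
            while (i < value.length() && !Character.isLetterOrDigit(value.charAt(i))) i++;
            int start = i;
            while (i < value.length() && Character.isLetterOrDigit(value.charAt(i))) i++;
            if (i > start) {
                String text = value.substring(start, i).toLowerCase(Locale.ROOT);
                tokens.add(new FieldToken(field, text, start, i));
            }
        }
        return tokens;
    }

    /**
     * Returns up to `radius` characters on either side of the first hit for
     * `term` in the named field, or null if the field has no such token.
     */
    static String contextFor(Map<String, String> fields, String field, String term, int radius) {
        String value = fields.get(field);
        if (value == null) return null;
        String wanted = term.toLowerCase(Locale.ROOT);
        for (FieldToken t : tokenize(field, value)) {
            if (t.text.equals(wanted)) {
                int from = Math.max(0, t.start - radius);
                int to = Math.min(value.length(), t.end + radius);
                return value.substring(from, to);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The same fields the document was split into at index time (made-up data).
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Steel alloys");
        doc.put("body", "Most structural steel is rolled hot.");
        // Ask for context in the body, even though the title also contains the term.
        System.out.println(contextFor(doc, "body", "steel", 10));
    }
}
```

With the fields kept separate like this, choosing the body hit over the title hit is just a matter of which field name is passed to contextFor(); no field-aware token stream is needed.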