Message-ID: <4BC270C6AB8AD411AD0B00B0D0493DF0EE7C9B@mail.grandcentral.com>
From: Doug Cutting <DCutting@grandcentral.com>
To: 'Lee Mallabone' <lee@grantadesign.com>
Cc: lucene-user@jakarta.apache.org
Subject: RE: Context specific summary with the search term
Date: Mon, 22 Oct 2001 09:43:09 -0700

> From: Lee Mallabone [mailto:lee@grantadesign.com]
> 
> I'm trying to implement this and should be able to contribute any
> successful results, but I need to produce context on a per-field basis.
> E.g. if I got a token hit in the text body of a document, but the first
> hit token was a word in the section title, I'd want to generate context
> around the token in the text body.

How did the title ever get indexed as the title?  Presumably you split the
document into fields when it was indexed.  Similarly, if you re-tokenize
things a field at a time then you should always know which field you are in,
no?
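A small sketch of the per-field approach follows. It assumes the document's fields are available as a map from field name to text (for instance, re-read from the original source or from stored fields). Each field is tokenized on its own, so any hit is known to belong to that field, and a context snippet is cut from that field's text using the hit's character offsets. The class `FieldContext`, its methods, and the letters-and-digits tokenizer are hypothetical stand-ins for illustration, not Lucene's `Analyzer` or `TokenStream` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: re-tokenize a document one field at a time, so every
// hit is known to belong to a specific field. The tokenizer (runs of
// letters/digits, compared case-insensitively) is a stand-in, not Lucene's Analyzer.
public class FieldContext {

    // Returns text around the first token equal to `term` (case-insensitive),
    // extending up to `window` characters on each side, or null if there is no hit.
    static String firstHitContext(String text, String term, int window) {
        String target = term.toLowerCase();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip separators between tokens.
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            // Consume one token of letters/digits; [start, i) are its offsets.
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (start < i && text.substring(start, i).toLowerCase().equals(target)) {
                int from = Math.max(0, start - window);
                int to = Math.min(n, i + window);
                return text.substring(from, to);
            }
        }
        return null;
    }

    // Tokenizes each field separately and keeps the first-hit snippet per field,
    // preserving the field order of the input map.
    static Map<String, String> contextPerField(Map<String, String> fields, String term, int window) {
        Map<String, String> snippets = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String snippet = firstHitContext(field.getValue(), term, window);
            if (snippet != null) snippets.put(field.getKey(), snippet);
        }
        return snippets;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Steel grades");
        doc.put("body", "Stainless steel resists corrosion in most environments.");
        // Title and body hits are reported separately, keyed by field name.
        System.out.println(contextPerField(doc, "steel", 10));
    }
}
```

For the case above, a term that hits both the title and the body yields two entries, and the caller can simply prefer the `body` snippet even though the title hit comes first.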

> I had been using a TokenStream to try this. However, Lucene's Token
> class doesn't seem to have any concept of fields (even when I
> tokenStream() a document that is in the index with a whole bunch of
> fields). Is there any reason for this? Also, any suggestions on how
> to find the information I need?
> 
> The natural thing seems to be to have a field-aware token stream, but
> I'm not sure how I'd go about implementing that...
> 
> Regards,
> 
> -- 
> Lee Mallabone
>
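A field-aware token stream, as suggested above, can be approximated without changing Lucene's `Token` class by walking the fields one at a time and pairing each token with the name of the field it came from. The sketch below assumes a map-of-fields input and a simple lowercase, letters-and-digits tokenizer; `FieldAwareTokenStream` and `FieldToken` are hypothetical names, not Lucene classes, and offsets are relative to each field's own text.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of a "field-aware token stream": it walks a document's
// fields in order and tags every token with the field it was read from.
public class FieldAwareTokenStream implements Iterator<FieldAwareTokenStream.FieldToken> {

    // A token plus its field; offsets [start, end) are relative to that field's text.
    public static final class FieldToken {
        final String field;
        final String text;
        final int start;
        final int end;

        FieldToken(String field, String text, int start, int end) {
            this.field = field;
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return field + ":" + text + "[" + start + "," + end + ")";
        }
    }

    private final Iterator<Map.Entry<String, String>> fields;
    private String fieldName;
    private String fieldText = ""; // empty until the first field is loaded
    private int pos = 0;
    private FieldToken pending;    // token found by hasNext() but not yet returned

    public FieldAwareTokenStream(Map<String, String> document) {
        this.fields = document.entrySet().iterator();
    }

    // Finds the next token, moving on to the next field when the current one is exhausted.
    private FieldToken advance() {
        while (true) {
            int n = fieldText.length();
            while (pos < n && !Character.isLetterOrDigit(fieldText.charAt(pos))) pos++;
            if (pos < n) {
                int start = pos;
                while (pos < n && Character.isLetterOrDigit(fieldText.charAt(pos))) pos++;
                return new FieldToken(fieldName, fieldText.substring(start, pos).toLowerCase(), start, pos);
            }
            if (!fields.hasNext()) return null; // no more fields: end of stream
            Map.Entry<String, String> next = fields.next();
            fieldName = next.getKey();
            fieldText = next.getValue();
            pos = 0;
        }
    }

    @Override
    public boolean hasNext() {
        if (pending == null) pending = advance();
        return pending != null;
    }

    @Override
    public FieldToken next() {
        if (!hasNext()) throw new NoSuchElementException();
        FieldToken token = pending;
        pending = null;
        return token;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Steel grades");
        doc.put("body", "Stainless steel resists corrosion.");
        FieldAwareTokenStream stream = new FieldAwareTokenStream(doc);
        while (stream.hasNext()) System.out.println(stream.next());
    }
}
```

Because offsets restart at zero for each field, a snippet for a `body` hit should be cut from the `body` text, not from a concatenation of all fields.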