lucene-dev mailing list archives

From Bruce Ritchie <br...@jivesoftware.com>
Subject Re: Dmitry's Term Vector stuff, plus some
Date Wed, 25 Feb 2004 23:38:45 GMT
markharw00d@yahoo.co.uk wrote:
> nice suggestion about capping the highlighter's number of tokens - I'll add that in.

I agree, good suggestion.

> I've had a quick look at your knowledgebase docs. Can't you split them at index time
> into multiple smaller docs using the <a name="xxx"> tags as doc boundaries?
> Each lucene document could then have a field with the URL [sourcedoc]#xxx, taking you
> to the relevant section in the source document.

Ideally, yes. Unfortunately, I do not control what our customers put into their knowledge
base. Where boundaries are present, that's actually quite a good suggestion - thanks!
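
To illustrate the splitting idea quoted above, here is a minimal sketch in Python rather than Lucene code. It assumes the source is HTML with <a name="..."> anchors; the names split_at_anchors and ANCHOR_RE are hypothetical, and a regular expression stands in for a real HTML parser. Each resulting section would become its own Lucene document, with the URL stored in a field.

```python
import re

# Matches an opening anchor tag that carries a name attribute,
# e.g. <a name="install">. A regex is an assumption made for brevity;
# real knowledge base HTML may need a proper parser.
ANCHOR_RE = re.compile(
    r'<a\s+[^>]*\bname\s*=\s*["\']([^"\']+)["\'][^>]*>',
    re.IGNORECASE,
)


def split_at_anchors(source_url, html):
    """Split html into (url, text) sections at each named anchor.

    Text before the first anchor is kept under the bare source URL.
    A document with no anchors is returned as a single section.
    """
    matches = list(ANCHOR_RE.finditer(html))
    if not matches:
        return [(source_url, html)]

    sections = []
    preamble = html[:matches[0].start()]
    if preamble.strip():
        sections.append((source_url, preamble))

    for i, match in enumerate(matches):
        # A section runs from its anchor to the next anchor (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        url = f"{source_url}#{match.group(1)}"
        sections.append((url, html[match.start():end]))
    return sections
```

Documents without anchors fall through unchanged, which matches the case where customers supply no boundaries at all.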

Doug, do you believe that storing token offset information (as an option, of course) would
be something that you'd accept as a contribution to the core of Lucene? Does anyone else
think that this would be beneficial information to have?
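
As a rough illustration of why stored offsets would help, the sketch below (plain Python, not Lucene's API) records each token's start and end character offsets once, then highlights matches from those offsets without re-analysing the text. The function names, the \w+ tokenizer, and the <b> markers are all assumptions made for the example.

```python
import re


def tokenize_with_offsets(text):
    """Return (term, start, end) for each word; offsets index into text."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def highlight(text, offsets, query_terms, pre="<b>", post="</b>"):
    """Wrap query term occurrences using stored offsets only."""
    wanted = {t.lower() for t in query_terms}
    out = []
    pos = 0
    for term, start, end in offsets:
        if term in wanted:
            out.append(text[pos:start])            # untouched text before the hit
            out.append(pre + text[start:end] + post)
            pos = end
    out.append(text[pos:])                         # remainder after the last hit
    return "".join(out)
```

With offsets stored at index time, the highlighter would only need the original text and the offset list, not a second pass through the analyzer.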


Regards,

Bruce Ritchie
http://www.jivesoftware.com/
