lucene-java-user mailing list archives

From "Kevin A. Burton" <bur...@newsmonster.org>
Subject Re: Performance of hit highlighting and finding term positions for a specific document
Date Wed, 31 Mar 2004 01:50:07 GMT
Erik Hatcher wrote:

> On Mar 30, 2004, at 7:56 PM, Kevin A. Burton wrote:
>
>> I'm trying to do hit highlighting.  This implementation uses another 
>> Analyzer to find the positions of the result terms.
>> That seems very inefficient, since Lucene already knows the 
>> frequency and position of given terms in the index.
>
>
> What if the original analyzer removed stop words, stemmed, and 
> injected synonyms?

Just use the same analyzer :)... I agree it's not the best approach for 
this reason and the CPU reason.
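
To make the "same analyzer" point concrete, here is a minimal sketch of
highlighting by re-analysis. It does not use Lucene's API: the class
ReanalysisHighlighter, its stop-word list, and its crude plural-stripping
"stemmer" are all hypothetical stand-ins for a real Analyzer. The idea is
only that the stored text and the query words go through the same
normalization, and that the tokenizer keeps character offsets so the
original surface form can be wrapped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy highlighter; a stand-in for re-running a real Analyzer.
final class ReanalysisHighlighter {
    // Assumed stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    // A normalized term plus the character span it came from.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Lowercase, drop stop words (returns null), strip a trailing "s".
    static String normalize(String word) {
        String w = word.toLowerCase();
        if (STOP_WORDS.contains(w)) {
            return null;
        }
        if (w.length() > 3 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Split on non-alphanumeric characters, remembering offsets.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = normalize(text.substring(start, i));
            if (term != null) {
                tokens.add(new Token(term, start, i));
            }
        }
        return tokens;
    }

    // Wrap every token whose normalized form matches a normalized query word.
    static String highlight(String text, String... queryWords) {
        Set<String> wanted = new HashSet<>();
        for (String q : queryWords) {
            String t = normalize(q);
            if (t != null) {
                wanted.add(t);
            }
        }
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token tok : analyze(text)) {
            if (!wanted.contains(tok.term)) {
                continue;
            }
            out.append(text, last, tok.start)
               .append("<b>").append(text, tok.start, tok.end).append("</b>");
            last = tok.end;
        }
        return out.append(text.substring(last)).toString();
    }
}
```

The cost Erik and I are both pointing at is visible here: every hit
document is tokenized again at display time, even though the index
already did that work once.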

>> Also it seems that after all this time that Lucene should have 
>> efficient hit highlighting as a standard package.  Is there any 
>> interest in seeing a contribution in the sandbox for this if it uses 
>> the index positions?
>
>
> Big +1, regardless of the implementation details.  Hit highlighting is 
> so commonly requested that having it available at least in the 
> sandbox, or perhaps even in the core, makes a lot of sense.

Well, if we could make it efficient by using the frequency and positions 
of terms, we'd be all set :)... I just need to figure out how to do this 
efficiently per document.
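
As a rough picture of what "positions per document" means, here is a toy
positional index. It is a sketch only and not how Lucene lays out its
index: ToyPositionalIndex and its methods are hypothetical names, and the
nested map (term, then document, then positions) is an assumption chosen
to make the per-document lookup trivial. Note that it records token
positions, not character offsets, so a highlighter built on it would
still need some way to map positions back to spans of the stored text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index: term -> docId -> token positions within that doc.
final class ToyPositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Tokenize on non-word characters and record each token's position.
    void addDocument(int docId, String text) {
        int pos = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) {
                continue; // leading separator yields an empty first element
            }
            postings.computeIfAbsent(w, k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos++);
        }
    }

    // Positions of one term in one document; empty if absent.
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term.toLowerCase());
        if (byDoc == null) {
            return Collections.emptyList();
        }
        return byDoc.getOrDefault(docId, Collections.emptyList());
    }

    // Term frequency in a document is just the number of recorded positions.
    int frequency(String term, int docId) {
        return positions(term, docId).size();
    }
}
```

With this layout, asking for the positions of each query term in a single
hit document is two map lookups per term, with no re-tokenizing of the
document text.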

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

