lucene-dev mailing list archives

From HPDrifter <dustin.ly...@exobox.com>
Subject Re: Getting tokens from search results. Simple concept
Date Fri, 27 Feb 2009 15:17:10 GMT

Yes, I have, but it is too memory-intensive. I used the Highlighter as my first
attempt, but it was not a good solution because I have to send the entire
text to the Highlighter.

What I did instead is similar to your suggestion:
1. Use the analyzer to return a token stream.
2. Search the token stream for the keyword I'm looking for (the keyword
needs to be analyzed as well!).
3. Extract the matching token's offsets.
4. Use the offsets in the index and Java's RandomAccessFile to "seek" to the
byte (character) position, then extract a "fragment" of about 500 characters
around that position.
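Steps 1 to 3 can be sketched without Lucene on the classpath. The class below is a hypothetical stand-in for a Lucene Analyzer (it is NOT Lucene's API): it lower-cases text, splits on non-letter/digit characters, and records each token's start/end character offsets. The keyword goes through the same analysis before matching, which is the point of step 2. With a real Lucene Analyzer you would read the term and offset attributes from its TokenStream instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical analyzer stand-in: NOT Lucene's API, just the same idea. */
public class KeywordOffsets {

    /** A token plus its [start, end) character offsets in the source text. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Step 1: lower-case and split on non-letter/digit chars, keeping offsets. */
    static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip separator characters.
            while (i < n && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            // Consume one token.
            while (i < n && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
            }
        }
        return out;
    }

    /** Steps 2-3: analyze the keyword the same way, then collect matching offsets. */
    static List<int[]> findOffsets(String text, String keyword) {
        List<Tok> kw = analyze(keyword);
        List<int[]> hits = new ArrayList<>();
        if (kw.size() != 1) return hits; // this sketch only handles single-term keywords
        String target = kw.get(0).term;
        for (Tok t : analyze(text)) {
            if (t.term.equals(target)) hits.add(new int[] {t.start, t.end});
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The Quick fox, quick!";
        for (int[] h : findOffsets(text, "QUICK")) {
            // prints "4-9: Quick" then "15-20: quick"
            System.out.println(h[0] + "-" + h[1] + ": " + text.substring(h[0], h[1]));
        }
    }
}
```

Because "QUICK" is analyzed to "quick" before the comparison, both the capitalized and lower-case occurrences match, and the offsets point back into the original, un-normalized text.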

This solution uses little memory and, I hope, will keep working as I expect
under sustained load.
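The low memory use in step 4 comes from reading only a small window of the file rather than the whole text. A minimal sketch follows, with one loud assumption: analyzer offsets count characters, while RandomAccessFile.seek counts bytes, so the two only line up for a single-byte encoding such as ISO-8859-1 or plain ASCII. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper; assumes a single-byte file encoding (chars == bytes). */
public class FragmentReader {

    /** Reads roughly {@code window} bytes centred on the hit [start, end), clamped to the file. */
    static String fragment(Path file, long start, long end, int window) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (start < 0 || end < start || end > len) {
                throw new IllegalArgumentException("hit offsets outside file");
            }
            long pad = Math.max(0, (window - (end - start)) / 2);
            long from = Math.max(0, start - pad);
            long to = Math.min(len, end + pad);
            byte[] buf = new byte[(int) (to - from)];
            raf.seek(from);     // jump straight to the window; earlier bytes are never read
            raf.readFully(buf); // memory use is bounded by the window size
            return new String(buf, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, "aaaa needle bbbb".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(fragment(tmp, 5, 11, 10)); // prints "a needle b"
        Files.delete(tmp);
    }
}
```

For the roughly 500-character fragments described above, the call would be `fragment(path, hitStart, hitEnd, 500)`. For UTF-8 or other multi-byte files, the character offsets would first have to be mapped to byte positions.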

How does this sound to you?

What I would LOVE is to get this directly from a standard Lucene search, as I
mentioned earlier. Something like this:
Hit.doc[0].getHitTokenList()

~Dustin




Erik Hatcher wrote:
> 
> Have you looked at the contrib Highlighter?  Or using an Analyzer  
> directly to give you the offsets?
> 
> 	Erik
> 
> On Feb 26, 2009, at 9:32 AM, HPDrifter wrote:
> 
>>
>> When I get a search result based on my index, I need the exact tokens
>> which were identified in the index as part of the result. Why? I need
>> the character offsets.
>>
>> I have a solution right now... almost, but it bugs the hell out of me
>> that I can't say something like...
>> documentHit[0].getIdentifiedTokens();
>>
>> Do I need to make a contribution in order to make this happen?
>>
>>
>> -- 
>> View this message in context:
>> http://www.nabble.com/Getting-tokens-from-search-results.--Simple-concept-tp22225364p22225364.html
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 

-- 
View this message in context: http://www.nabble.com/Getting-tokens-from-search-results.--Simple-concept-tp22225364p22247863.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


