lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Lucene Highlighting and Dynamic Summaries
Date Thu, 12 Mar 2009 17:56:34 GMT
Hi

I have found that it is not an issue with POI. I extracted the text using
POI, but differently, and the term is extracted properly. When I store the
text and retrieve it, the term exists. However, running the text through
the highlighter doesn't work.

I will post a test case with a plain text file on JIRA. Currently on a
cramped train!

Cheers


On 11 Mar 2009, at 18:11, markharw00d <markharw00d@yahoo.co.uk> wrote:

> If you can supply a JUnit test that recreates the problem, I think we
> can start to make progress on this.
>
>
>
> Amin Mohammed-Coleman wrote:
>> Hi
>>
>> Apologies for resending this mail. Just wondering if anyone has
>> experienced the issue below. I'm not sure whether this could happen due
>> to the nature of the document. It does seem strange that a search for
>> one term returns a summary while a search for another does not, even
>> though the same document is returned.
>>
>> I'm asking so that I can code around this if it is normal.
>>
>>
>> Apologies again for resending this mail.
>>
>> Cheers
>>
>> Amin
>>
>> Sent from my iPhone
>>
>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am seeing some strange behaviour with the highlighter and I'm
>>> wondering if anyone else is experiencing this. In certain instances
>>> no summary is generated. I perform the search and the search returns
>>> the correct document. I can see that the Lucene document contains the
>>> text in the field. However, after doing:
>>>
>>>    SimpleHTMLFormatter simpleHTMLFormatter =
>>>            new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>>>    // required for highlighting
>>>    Query query2 = multiSearcher.rewrite(query);
>>>    Highlighter highlighter =
>>>            new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>>>    ...
>>>
>>>    String text = doc.get(FieldNameEnum.BODY.getDescription());
>>>    TokenStream tokenStream = analyzer.tokenStream(
>>>            FieldNameEnum.BODY.getDescription(), new StringReader(text));
>>>    String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>>>
>>>
>>> the result string is empty. This is very strange: if I try a
>>> different term that exists in the document, I get a summary. For
>>> example, I have a Word document that contains the terms "document"
>>> and "aspectj". If I search for "document", I get the correct document
>>> but no highlighted summary. However, if I search for "aspectj", I get
>>> the same document with a highlighted summary.
>>>
>>> Just to mention, I do rewrite the original query before performing
>>> the highlighting.
>>>
>>> I'm not sure what I'm missing here. Any help would be appreciated.
>>>
>>> Cheers
>>> Amin
>>>
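A minimal sketch, assuming only the JDK, of the output shape the getBestFragments(tokenStream, text, 3, "...") call above is expected to produce: up to three fragments of the stored text, each matching term wrapped in the formatter's tags, joined by "...". ToySummary and bestFragments are hypothetical names, and the technique is a deliberately simplified stand-in, not Lucene's Highlighter: it matches one term with a case-insensitive whole-word regex and cuts fragments at whitespace near a fixed size, whereas Lucene scores fragments with a QueryScorer over the analyzer's TokenStream.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToySummary {

    // One candidate fragment: its highlighted text and how many matches it holds.
    static final class Fragment {
        final String text;
        final int score;
        Fragment(String text, int score) { this.text = text; this.score = score; }
    }

    /**
     * Hypothetical helper (not a Lucene API). Groups whitespace-separated words
     * into fragments of at most about fragSize characters, wraps whole-word,
     * case-insensitive matches of term in pre/post, drops fragments with no
     * match, and joins the best maxFragments (most matches first, ties in
     * document order) with separator. Returns "" when nothing matches.
     */
    static String bestFragments(String text, String term, int maxFragments,
                                String separator, String pre, String post,
                                int fragSize) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        String replacement = Matcher.quoteReplacement(pre) + "$0"
                + Matcher.quoteReplacement(post);

        // 1. Cut the text into fragments at word boundaries.
        List<String> raw = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (current.length() > 0 && current.length() + 1 + word.length() > fragSize) {
                raw.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append(' ');
            current.append(word);
        }
        if (current.length() > 0) raw.add(current.toString());

        // 2. Score each fragment by its number of matches; keep only matching ones.
        List<Fragment> scored = new ArrayList<>();
        for (String f : raw) {
            Matcher m = p.matcher(f);
            int count = 0;
            while (m.find()) count++;
            // replaceAll resets the matcher before substituting.
            if (count > 0) scored.add(new Fragment(m.replaceAll(replacement), count));
        }

        // 3. Best first; List.sort is stable, so ties keep document order.
        scored.sort(Comparator.comparingInt((Fragment f) -> f.score).reversed());
        List<String> best = new ArrayList<>();
        for (int i = 0; i < Math.min(maxFragments, scored.size()); i++) {
            best.add(scored.get(i).text);
        }
        return String.join(separator, best);
    }

    public static void main(String[] args) {
        String body = "This document explains AspectJ pointcuts. "
                + "Each document in the set is indexed once.";
        String pre = "<span class=\"highlight\"><b>";
        String post = "</b></span>";
        System.out.println(bestFragments(body, "aspectj", 3, "...", pre, post, 50));
        System.out.println(bestFragments(body, "document", 3, "...", pre, post, 50));
        System.out.println("[" + bestFragments(body, "lucene", 3, "...", pre, post, 50) + "]");
    }
}
```

Because zero-match fragments are discarded, the toy returns an empty string for a term that never matches. Lucene's getBestFragments similarly skips fragments whose score is zero, which is consistent with getting an empty result, rather than an unhighlighted excerpt, when no token from the stream matches the rewritten query.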
>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>> Hi
>>>
>>> Got it working!  Thanks again for your help!
>>>
>>>
>>> Amin
>>>
>>>
>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>> Thanks!  The final piece that I needed to do for the project!
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>> > Cool. I will use compression and store the text in the index. Is there
>>> > anything special I need to do to decompress the text? I presume I can
>>> > just do doc.get("content")?
>>> > Thanks for your advice, all!
>>>
>>> No, just use Field.Store.COMPRESS when adding to the index and
>>> Document.get() when fetching. Decompression is done automatically.
>>>
>>> You may ask why not enable compression for all fields. The reason is
>>> that compression adds overhead for very small, short fields, so you
>>> should only use it for large contents (it's the same as compressing
>>> very small files with ZIP/GZIP: such files usually get larger than
>>> without compression).
>>>
>>> Uwe
>>>
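Uwe's point that compressing tiny values can make them larger is easy to check with the JDK's java.util.zip.Deflater. This is a minimal stand-alone sketch, not Lucene's own stored-field code path; the class name CompressionOverhead and the sample strings are made up for illustration. It compresses a 2-byte value and a 9,500-byte repetitive value and prints both sizes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionOverhead {

    // Compress the input with zlib (Deflater) at best compression and return the bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] small = "hi".getBytes(StandardCharsets.UTF_8);
        byte[] large = "lucene highlighter ".repeat(500).getBytes(StandardCharsets.UTF_8);
        System.out.printf("small: %d -> %d bytes%n", small.length, deflate(small).length);
        System.out.printf("large: %d -> %d bytes%n", large.length, deflate(large).length);
    }
}
```

The zlib format alone adds a 2-byte header and a 4-byte Adler-32 trailer, so the 2-byte value always grows, while the repetitive text shrinks dramatically. Real field text compresses less well than this artificially repetitive sample, but the trade-off Uwe describes is the same.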
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>
>