lucene-java-user mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Re: Lucene Highlighting and Dynamic Summaries
Date Wed, 11 Mar 2009 18:11:54 GMT
If you can supply a JUnit test that recreates the problem, I think we can 
start to make progress on this.



Amin Mohammed-Coleman wrote:
> Hi
>
> Apologies for resending this mail. Just wondering if anyone has 
> experienced the problem below. I'm not sure if this could happen due to 
> the nature of the document. It does seem strange that one term search 
> returns a summary while another does not, even though the same document 
> is being returned.
>
> I'm asking so I can code around this if it is normal.
>
>
> Apologies again for resending this mail
>
> Cheers
>
> Amin
>
> Sent from my iPhone
>
> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>
>> Hi
>>
>> I am seeing some strange behaviour with the highlighter and I'm 
>> wondering if anyone else is experiencing this.  In certain instances 
>> no summary is generated.  I perform the search and the 
>> search returns the correct document.  I can see that the Lucene 
>> document contains the text in the field.  However, after doing:
>>
>>     SimpleHTMLFormatter simpleHTMLFormatter =
>>         new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>>     // required for highlighting
>>     Query query2 = multiSearcher.rewrite(query);
>>     Highlighter highlighter =
>>         new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>>     ...
>>
>>     String text = doc.get(FieldNameEnum.BODY.getDescription());
>>     TokenStream tokenStream = analyzer.tokenStream(
>>         FieldNameEnum.BODY.getDescription(), new StringReader(text));
>>     String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>>
>>
>> The string result is empty.  This is very strange: if I try a 
>> different term that exists in the document, I get a summary.  For 
>> example, I have a Word document that contains the terms "document" and 
>> "aspectj".  If I search for "document" I get the correct document but 
>> no highlighted summary.  However, if I search for "aspectj" I get 
>> the same document with a highlighted summary.
>>
>> Just to mention, I do rewrite the original query before performing 
>> the highlighting.
>>
>> I'm not sure what I'm missing here.  Any help would be appreciated.
>>
>> Cheers
>> Amin
>>
>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman 
>> <aminmc@gmail.com> wrote:
>> Hi
>>
>> Got it working!  Thanks again for your help!
>>
>>
>> Amin
>>
>>
>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman 
>> <aminmc@gmail.com> wrote:
>> Thanks!  The final piece that I needed to do for the project!
>>
>> Cheers
>>
>> Amin
>>
>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> > Cool.  I will use compression and store in the index. Is there anything
>> > special
>> > I need to do for decompressing the text? I presume I can just do
>> > doc.get("content")?
>> > Thanks for your advice, all!
>>
>> No, just use Field.Store.COMPRESS when adding to the index and
>> Document.get() when fetching. The decompression is done automatically.
>>
>> You may wonder: why not enable compression for all fields? The reason
>> is that compression adds overhead for very small and short fields, so
>> you should only use it for large contents. It is the same as
>> compressing very small files with ZIP/GZIP: these files mostly end up
>> larger than they were without compression.
>>
>> Uwe
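
The point about small fields can be checked with the JDK alone. The following is only a sketch: it uses java.util.zip.GZIPOutputStream as a stand-in for whatever codec Lucene applies for Field.Store.COMPRESS (that equivalence is an assumption), and the class name CompressionOverhead, the method gzipSize and the sample strings are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Lucene: compares raw and GZIP-compressed sizes.
public class CompressionOverhead {

    // Returns the number of bytes produced by GZIP-compressing the input.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        // A short, ID-like field value (illustrative only).
        byte[] shortField = "doc-42".getBytes(StandardCharsets.UTF_8);

        // A long, repetitive body, standing in for large stored content.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            body.append("Lucene highlights matching terms in the stored body text. ");
        }
        byte[] longField = body.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("short: " + shortField.length + " -> "
                + gzipSize(shortField) + " bytes");
        System.out.println("long:  " + longField.length + " -> "
                + gzipSize(longField) + " bytes");
    }
}
```

The GZIP container alone adds a fixed header and trailer, so the short value comes out larger than it went in, while the long body shrinks considerably. The exact numbers depend on the content, so treat the output as indicative only.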
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>




