lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Lucene Highlighting and Dynamic Summaries
Date Thu, 12 Mar 2009 07:47:20 GMT
Hi
Please find attached a test case plus a document.  Just to mention, this
also occurs sometimes for other files.


Cheers
Amin

On Wed, Mar 11, 2009 at 6:11 PM, markharw00d <markharw00d@yahoo.co.uk> wrote:

> If you can supply a Junit test that recreates the problem I think we can
> start to make progress on this.
>
>
>
> Amin Mohammed-Coleman wrote:
>
>> Hi
>>
>> Apologies for resending this mail. Just wondering if anyone has
>> experienced the below. I'm not sure if this could happen due to the nature
>> of the document. It does seem strange that one term search returns a summary
>> while another does not, even though the same document is being returned.
>>
>> I'm asking this so I can code around it if this is normal.
>>
>>
>> Apologies again for resending this mail
>>
>> Cheers
>>
>> Amin
>>
>> Sent from my iPhone
>>
>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>
>>  Hi
>>>
>>> I am seeing some strange behaviour with the highlighter and I'm wondering
>>> if anyone else is experiencing this.  In certain instances I don't get a
>>> summary being generated.  I perform the search and the search returns the
>>> correct document.  I can see that the lucene document contains the text in
>>> the field.  However after doing:
>>>
>>>    SimpleHTMLFormatter simpleHTMLFormatter =
>>>        new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>>>    // required for highlighting
>>>    Query query2 = multiSearcher.rewrite(query);
>>>    Highlighter highlighter =
>>>        new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>>>    ...
>>>
>>>    String text = doc.get(FieldNameEnum.BODY.getDescription());
>>>    TokenStream tokenStream = analyzer.tokenStream(
>>>        FieldNameEnum.BODY.getDescription(), new StringReader(text));
>>>    String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>>>
>>>
>>> The string result is empty.  This is very strange: if I try a different
>>> term that exists in the document then I get a summary.  For example, I have
>>> a Word document that contains the terms "document" and "aspectj".  If I
>>> search for "document" I get the correct document but no highlighted summary.
>>> However, if I search using "aspectj" I get the same document with a
>>> highlighted summary.
>>>
>>> Just to mention, I do rewrite the original query before performing the
>>> highlighting.
>>>
>>> I'm not sure what I'm missing here.  Any help would be appreciated.
>>>
>>> Cheers
>>> Amin
>>>
>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com>
>>> wrote:
>>> Hi
>>>
>>> Got it working!  Thanks again for your help!
>>>
>>>
>>> Amin
>>>
>>>
>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <aminmc@gmail.com>
>>> wrote:
>>> Thanks!  The final piece that I needed to do for the project!
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>> > Cool.  I will use compression and store in the index. Is there anything
>>> > special I need to do for decompressing the text? I presume I can just do
>>> > doc.get("content")?
>>> > Thanks for your advice all!
>>>
>>> No, just use Field.Store.COMPRESS when adding to the index and
>>> Document.get() when fetching. The decompression is done automatically.
>>>
>>> You may think, why not enable compression for all fields? The reason is
>>> that it adds overhead for very small, short fields. So you should only use
>>> it for large contents (it is the same as compressing very small files with
>>> ZIP/GZIP: these files mostly get larger than without compression).
>>>
>>> Uwe
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>>
>
>
>

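The point quoted above about compression being an overhead for very small fields can be checked with a rough sketch. This does not use Lucene at all: it assumes a zlib/Deflate-style compressor and uses java.util.zip.Deflater from the JDK as a stand-in for whatever Field.Store.COMPRESS does internally. The class name CompressionOverheadDemo and the sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of Lucene.
final class CompressionOverheadDemo {

    /** Size in bytes of the UTF-8 encoding of the text. */
    static int rawSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size in bytes after deflating the UTF-8 encoding of the text. */
    static int deflatedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        // Drain the compressor; only the byte count matters here.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds a long, repetitive body to stand in for a large stored field. */
    static String largeBody() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("lucene highlighting and dynamic summaries ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String shortValue = "ID-42"; // stands in for a very small field
        String longValue = largeBody();
        System.out.println("short: raw=" + rawSize(shortValue)
                + " deflated=" + deflatedSize(shortValue));
        System.out.println("long:  raw=" + rawSize(longValue)
                + " deflated=" + deflatedSize(longValue));
    }
}
```

With a stream format like this, the short value comes out larger than its raw bytes because of the fixed header and checksum, while the long repetitive body shrinks considerably. Real document text will compress less well than this artificial body.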