lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Lucene Highlighting and Dynamic Summaries
Date Thu, 12 Mar 2009 18:41:40 GMT
JIRA updated. It now includes a new test case that shows the highlighter not
working as expected.

On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:

> Hi
>
> I have found that the issue is not with POI. I extracted the text using POI in
> a different way and the term is extracted properly. When I store the text and
> retrieve it, the term exists. However, running the text through the highlighter
> doesn't work.
>
> I will post a test case with a plain text file on JIRA. Currently on a cramped
> train!
>
> Cheers
>
>
>
> On 11 Mar 2009, at 18:11, markharw00d <markharw00d@yahoo.co.uk> wrote:
>
>> If you can supply a JUnit test that recreates the problem, I think we can
>> start to make progress on this.
>>
>>
>>
>> Amin Mohammed-Coleman wrote:
>>
>>> Hi
>>>
>>> Apologies for resending this mail. Just wondering if anyone has
>>> experienced the below. I'm not sure if this could happen due to the nature
>>> of the document. It does seem strange that one term search returns a summary
>>> while another does not, even though the same document is being returned.
>>>
>>> I'm asking so that I can code around this if it is normal.
>>>
>>>
>>> Apologies again for resending this mail.
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> Sent from my iPhone
>>>
>>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I am seeing some strange behaviour with the highlighter and I'm
>>>> wondering if anyone else is experiencing this.  In certain instances I don't
>>>> get a summary being generated.  I perform the search and the search returns
>>>> the correct document.  I can see that the lucene document contains the text
>>>> in the field.  However after doing:
>>>>
>>>>   SimpleHTMLFormatter simpleHTMLFormatter =
>>>>       new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>>>>   // required for highlighting
>>>>   Query query2 = multiSearcher.rewrite(query);
>>>>   Highlighter highlighter =
>>>>       new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>>>> ...
>>>>
>>>>   String text = doc.get(FieldNameEnum.BODY.getDescription());
>>>>   TokenStream tokenStream = analyzer.tokenStream(
>>>>       FieldNameEnum.BODY.getDescription(), new StringReader(text));
>>>>   String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>>>>
>>>>
>>>> The string result is empty. This is very strange: if I try a different
>>>> term that exists in the document then I get a summary. For example, I have
>>>> a Word document that contains the terms "document" and "aspectj". If I
>>>> search for "document" I get the correct document but no highlighted summary.
>>>> However, if I search using "aspectj" I get the same document with a
>>>> highlighted summary.
>>>>
>>>> Just to mention, I do rewrite the original query before performing the
>>>> highlighting.
>>>>
>>>> I'm not sure what I'm missing here. Any help would be appreciated.
>>>>
>>>> Cheers
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com>
>>>> wrote:
>>>> Hi
>>>>
>>>> Got it working!  Thanks again for your help!
>>>>
>>>>
>>>> Amin
>>>>
>>>>
>>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
>>>> aminmc@gmail.com> wrote:
>>>> Thanks!  The final piece that I needed to do for the project!
>>>>
>>>> Cheers
>>>>
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>>>> > Cool. I will use compression and store it in the index. Is there anything
>>>> > special I need to do for decompressing the text? I presume I can just do
>>>> > doc.get("content")?
>>>> > Thanks for your advice all!
>>>>
>>>> No, just use Field.Store.COMPRESS when adding to the index and
>>>> Document.get() when fetching. The decompression is done automatically.
>>>>
>>>> You may wonder why not enable compression for all fields. The reason is
>>>> that compression is an overhead for very small and short fields, so you
>>>> should only use it for large contents (it's the same as compressing very
>>>> small files with ZIP/GZIP: these files mostly get larger than without
>>>> compression).
>>>>
>>>> Uwe
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
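
One way to separate "the stored text lacks the term" from "the highlighter drops the term", as discussed in the thread above, is to run the stored field value through a check that does not involve Lucene at all. The sketch below is plain Java and is not the Lucene Highlighter: it uses a regular expression instead of an analyzer, and the class and method names are made up for illustration. It wraps whole-word, case-insensitive matches in the same tags passed to SimpleHTMLFormatter in the thread, and returns an empty string when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: a regex-based stand-in used only
// to confirm that a term is present in the stored text.
class PlainTextHighlightCheck {

    private static final String PRE = "<span class=\"highlight\"><b>";
    private static final String POST = "</b></span>";

    // Wraps each whole-word, case-insensitive occurrence of term in PRE/POST.
    // Returns "" if the term is empty or never occurs in the text.
    static String highlight(String text, String term) {
        if (term.isEmpty()) {
            return "";
        }
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        boolean found = false;
        while (matcher.find()) {
            found = true;
            // Copy the text before the match, then the wrapped match itself.
            out.append(text, last, matcher.start())
               .append(PRE).append(matcher.group()).append(POST);
            last = matcher.end();
        }
        if (!found) {
            return "";
        }
        // Copy whatever follows the final match.
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "A Word document about AspectJ.";
        System.out.println(highlight(body, "document"));
        System.out.println(highlight(body, "aspectj"));
    }
}
```

If this check finds the term in the value returned by doc.get(...) but the Lucene highlighter still returns an empty string, the stored text is probably not the problem, and the analyzer, the rewritten query, or the highlighter settings are the next things to look at.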

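Uwe's point about compression overhead on small fields, quoted above, can be checked with the JDK alone. The sketch below follows his ZIP/GZIP analogy and uses java.util.zip.GZIPOutputStream; it is not the code path Lucene uses for Field.Store.COMPRESS, and the class name, method names, and sample strings are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: compares raw and gzip-compressed sizes.
class CompressionOverheadDemo {

    // Size in bytes of the UTF-8 encoding of s.
    static int rawSize(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Size in bytes of s after gzip compression, including the gzip
    // header and trailer.
    static int gzipSize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Repeats s the given number of times (kept Java 8 compatible).
    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder(s.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String shortValue = "aspectj";
        String longValue = repeat("Lucene highlighter summary text. ", 500);
        System.out.println("short: raw=" + rawSize(shortValue)
                + " gzip=" + gzipSize(shortValue));
        System.out.println("long:  raw=" + rawSize(longValue)
                + " gzip=" + gzipSize(longValue));
    }
}
```

The short value grows because the fixed gzip header and trailer outweigh any savings, while the long, repetitive value shrinks substantially. That matches the advice to compress only large stored contents.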