lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Lucene Highlighting and Dynamic Summaries
Date Fri, 13 Mar 2009 05:49:50 GMT
Hi

I think that would be good. This is probably a silly thing to ask, but I guess
there is a performance implication in setting it to the max value.

Is there a general setting that other developers use?

Cheers

Amin



On 12 Mar 2009, at 22:03, Michael McCandless  
<lucene@mikemccandless.com> wrote:

>
> IndexWriter has such behavior too, and because it was such a common trap
> (developers could not understand why their content was being truncated), we
> made that setting explicit, up front so you were aware of it.
>
> I think this in general is a reasonable approach for settings that "lose"
> stuff (content, highlighted terms, etc.).
>
> Maybe we should do the same for highlighter?
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> I did the following:
>>
>> highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
>>
>>
>> which works.
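
To illustrate why raising the limit fixes the empty summary, here is a minimal sketch in plain Java. It is not Lucene code: `AnalysisLimitDemo`, `matchOffsets` and `sampleText` are hypothetical names, and the regex word splitter merely stands in for an analyzer. The assumption it models is that the highlighter stops looking at tokens once they start beyond the configured character limit, so a term that only occurs late in a long document is never seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of an analysis limit; NOT the Lucene Highlighter implementation.
class AnalysisLimitDemo {

    private static final Pattern WORD = Pattern.compile("\\w+");

    /**
     * Returns the start offsets of every occurrence of term that begins
     * before maxCharsToAnalyze. Later occurrences are never examined.
     */
    static List<Integer> matchOffsets(String text, String term, int maxCharsToAnalyze) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            if (m.start() >= maxCharsToAnalyze) {
                break; // stop analysing: the rest of the text is ignored
            }
            if (m.group().equalsIgnoreCase(term)) {
                offsets.add(m.start());
            }
        }
        return offsets;
    }

    /** Builds a text with "aspectj" near the start and "document" near the end. */
    static String sampleText() {
        StringBuilder sb = new StringBuilder("aspectj appears early. ");
        for (int i = 0; i < 100; i++) {
            sb.append("filler ");
        }
        sb.append("document appears late.");
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = sampleText();
        // With a small limit only the early term is found.
        System.out.println("aspectj,  limit 100: " + matchOffsets(text, "aspectj", 100));
        System.out.println("document, limit 100: " + matchOffsets(text, "document", 100));
        // With the limit lifted, the late term is found as well.
        System.out.println("document, no limit:  "
                + matchOffsets(text, "document", Integer.MAX_VALUE));
    }
}
```

The same loop also hints at the cost side: with the limit lifted, every token of every highlighted document is examined, so the work grows with document length.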
>>
>> On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>
>>> JIRA updated.  Includes new testcase which shows highlighter not working as
>>> expected.
>>>
>>>
>>> On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I have found that it is not an issue with POI. I extracted the text using
>>>> POI, but differently, and the term is extracted properly.  When I store the
>>>> text and retrieve it, the term exists. However, running the text through the
>>>> highlighter doesn't work.
>>>>
>>>> I will post a test case with a plain text file on JIRA. Currently on a
>>>> cramped train!
>>>>
>>>> Cheers
>>>>
>>>>
>>>>
>>>> On 11 Mar 2009, at 18:11, markharw00d <markharw00d@yahoo.co.uk> wrote:
>>>>
>>>>> If you can supply a JUnit test that recreates the problem I think we can
>>>>> start to make progress on this.
>>>>>
>>>>>
>>>>>
>>>>> Amin Mohammed-Coleman wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Apologies for resending this mail. Just wondering if anyone has
>>>>>> experienced the below. I'm not sure if this could happen due to the
>>>>>> nature of the document. It does seem strange that one term search returns
>>>>>> a summary while another does not, even though the same document is being
>>>>>> returned.
>>>>>>
>>>>>> I'm asking this so I can code around it if this is normal.
>>>>>>
>>>>>>
>>>>>> Apologies again for resending this mail
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Amin
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I am seeing some strange behaviour with the highlighter and I'm
>>>>>>> wondering if anyone else is experiencing this.  In certain instances I
>>>>>>> don't get a summary being generated.  I perform the search and the
>>>>>>> search returns the correct document.  I can see that the lucene document
>>>>>>> contains the text in the field.  However after doing:
>>>>>>>
>>>>>>> SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter(
>>>>>>>         "<span class=\"highlight\"><b>", "</b></span>");
>>>>>>> // required for highlighting
>>>>>>> Query query2 = multiSearcher.rewrite(query);
>>>>>>> Highlighter highlighter = new Highlighter(
>>>>>>>         simpleHTMLFormatter, new QueryScorer(query2));
>>>>>>> ...
>>>>>>>
>>>>>>> String text = doc.get(FieldNameEnum.BODY.getDescription());
>>>>>>> TokenStream tokenStream = analyzer.tokenStream(
>>>>>>>         FieldNameEnum.BODY.getDescription(), new StringReader(text));
>>>>>>> String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>>>>>>>
>>>>>>>
>>>>>>> the string result is empty.  This is very strange: if I try a different
>>>>>>> term that exists in the document then I get a summary.  For example, I
>>>>>>> have a Word document that contains the terms "document" and "aspectj".
>>>>>>> If I search for "document" I get the correct document but no highlighted
>>>>>>> summary. However, if I search using "aspectj" I get the same document
>>>>>>> with a highlighted summary.
>>>>>>>
>>>>>>> Just to mention, I do rewrite the original query before performing the
>>>>>>> highlighting.
>>>>>>>
>>>>>>> I'm not sure what I'm missing here.  Any help would be appreciated.
>>>>>>>
>>>>>>> Cheers
>>>>>>> Amin
>>>>>>>
>>>>>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <
>>>>>>> aminmc@gmail.com> wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> Got it working!  Thanks again for your help!
>>>>>>>
>>>>>>>
>>>>>>> Amin
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
>>>>>>> aminmc@gmail.com> wrote:
>>>>>>> Thanks!  The final piece that I needed to do for the project!
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Amin
>>>>>>>
>>>>>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@thetaphi.de>
>>>>>>> wrote:
>>>>>>>> cool.  i will use compression and store in index. is there anything
>>>>>>>> special i need to do for decompressing the text? i presume i can just do
>>>>>>>> doc.get("content")?
>>>>>>>> thanks for your advice all!
>>>>>>>
>>>>>>> No, just use Field.Store.COMPRESS when adding to the index and
>>>>>>> Document.get() when fetching. The decompression is done automatically.
>>>>>>>
>>>>>>> You may think, why not enable compression for all fields? The reason is
>>>>>>> that it adds overhead for very small and short fields, so you should
>>>>>>> only use it for large contents (it's the same as compressing very small
>>>>>>> files with ZIP/GZIP: these files mostly get larger than without
>>>>>>> compression).
>>>>>>>
>>>>>>> Uwe
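
The point about small fields can be checked with the JDK alone. The sketch below uses `java.util.zip.Deflater` as a stand-in; whether Lucene's stored-field compression behaves identically is an assumption here, and `CompressionOverheadDemo` and `deflatedSize` are hypothetical names used only for illustration. It compares the compressed size of a very short value with that of a long, repetitive one.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustrates compression overhead on tiny inputs using the JDK's Deflater.
class CompressionOverheadDemo {

    /** Returns the number of bytes produced by deflating the input. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        // Drain the compressor; only the byte count matters for this demo.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        // A short field value, such as a name or an id.
        byte[] small = "Amin".getBytes(StandardCharsets.UTF_8);

        // A long body field with plenty of repetition.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("lucene highlighter summary text ");
        }
        byte[] large = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("small: " + small.length + " -> " + deflatedSize(small) + " bytes");
        System.out.println("large: " + large.length + " -> " + deflatedSize(large) + " bytes");
    }
}
```

The short value comes out larger than it went in because the compressed stream carries a fixed header and checksum, while the long value shrinks considerably.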
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
>
>


