lucene-java-user mailing list archives

From Amin Mohammed-Coleman <ami...@gmail.com>
Subject Re: Lucene Highlighting and Dynamic Summaries
Date Thu, 12 Mar 2009 11:29:04 GMT
Hi

Did both attachments not come through?

Cheers
Amin

On Thu, Mar 12, 2009 at 9:52 AM, mark harwood <markharw00d@yahoo.co.uk> wrote:

> The attachment didn't make it through here. Can you add it as an attachment
> to a new JIRA issue?
>
> Thanks,
> Mark
>
>
>
>
>
> ________________________________
> From: Amin Mohammed-Coleman <aminmc@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, 12 March, 2009 7:47:20
> Subject: Re: Lucene Highlighting and Dynamic Summaries
>
> Hi
>
> Please find attached a test case plus a document.  Just to mention, this
> also occurs sometimes for other files.
>
>
> Cheers
> Amin
>
>
> On Wed, Mar 11, 2009 at 6:11 PM, markharw00d <markharw00d@yahoo.co.uk>
> wrote:
>
> If you can supply a Junit test that recreates the problem I think we can
> start to make progress on this.
>
>
>
> Amin Mohammed-Coleman wrote:
>
> Hi
>
> Apologies for resending this mail. Just wondering if anyone has
> experienced the problem below. I'm not sure whether this could happen due to
> the nature of the document. It does seem strange that a search on one term
> returns a summary while a search on another does not, even though the same
> document is returned.
>
> I'm asking so that I can code around this if it is normal.
>
>
> Apologies again for resending this mail
>
> Cheers
>
> Amin
>
> Sent from my iPhone
>
> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@gmail.com> wrote:
>
>
> Hi
>
> I am seeing some strange behaviour with the highlighter and I'm wondering
> if anyone else is experiencing this.  In certain instances I don't get a
> summary being generated.  I perform the search and the search returns the
> correct document.  I can see that the lucene document contains the text in
> the field.  However after doing:
>
>   SimpleHTMLFormatter simpleHTMLFormatter =
>       new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>   // required for highlighting
>   Query query2 = multiSearcher.rewrite(query);
>   Highlighter highlighter =
>       new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>   ...
>
>   String text = doc.get(FieldNameEnum.BODY.getDescription());
>   TokenStream tokenStream = analyzer.tokenStream(
>       FieldNameEnum.BODY.getDescription(), new StringReader(text));
>   String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>
>
> the string result is empty.  This is very strange: if I try a different
> term that exists in the document then I get a summary.  For example, I have a
> Word document that contains the terms "document" and "aspectj".  If I search
> for "document" I get the correct document but no highlighted summary.
> However, if I search for "aspectj" I get the same document with a
> highlighted summary.
>
> Just to mention, I do rewrite the original query before performing the
> highlighting.
>
> I'm not sure what I'm missing here.  Any help would be appreciated.
>
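The message above asks how to code around an empty highlighter result. The following is a minimal sketch of one such workaround, not a diagnosis of why the result is empty: when the highlighted string is empty, fall back to the leading part of the stored text. The class and method names (`SummaryFallback`, `summaryOrFallback`) and the `maxChars` parameter are hypothetical and do not come from Lucene or from this thread.

```java
class SummaryFallback {
    /**
     * Returns the highlighter output when it is non-empty; otherwise falls back
     * to the leading part of the raw text, cut at a word boundary when possible.
     * (Hypothetical helper, not part of Lucene.)
     */
    public static String summaryOrFallback(String highlighted, String text, int maxChars) {
        if (highlighted != null && !highlighted.trim().isEmpty()) {
            return highlighted;
        }
        if (text == null) {
            return "";
        }
        String trimmed = text.trim();
        if (trimmed.length() <= maxChars) {
            return trimmed;
        }
        // Look for the last space at or before maxChars so words are not split.
        int cut = trimmed.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // no usable space found: hard cut
        }
        return trimmed.substring(0, cut).trim() + "...";
    }

    public static void main(String[] args) {
        String text = "This document describes how aspectj weaves advice into classes.";
        // Empty highlighter result: the leading text is used instead.
        System.out.println(summaryOrFallback("", text, 30));
        // Non-empty highlighter result: returned unchanged.
        System.out.println(summaryOrFallback("<b>aspectj</b> weaves", text, 30));
    }
}
```

In the snippet from the original message, this would be called with `result` as the first argument and `text` as the second. The fallback text is not highlighted, so the caller may still want to log these cases to track down the underlying cause.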
> Cheers
> Amin
>
> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <aminmc@gmail.com>
> wrote:
> Hi
>
> Got it working!  Thanks again for your help!
>
>
> Amin
>
>
> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <aminmc@gmail.com>
> wrote:
> Thanks!  The final piece that I needed to do for the project!
>
> Cheers
>
> Amin
>
> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > Cool.  I will use compression and store in the index. Is there anything
> > special I need to do for decompressing the text? I presume I can just do
> > doc.get("content")?
> > Thanks for your advice, all!
>
> No, just use Field.Store.COMPRESS when adding to the index and Document.get()
> when fetching. Decompression is done automatically.
>
> You may wonder why compression is not enabled for all fields. The reason is
> that it adds overhead for very small, short fields, so you should use it only
> for large contents (it is the same as compressing very small files with
> ZIP/GZIP: these files mostly end up larger than without compression).
>
> Uwe
>
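The point about small fields can be illustrated with a short, self-contained sketch. It uses the JDK's `java.util.zip.Deflater` as a stand-in compressor; how Lucene compresses stored fields internally is not described in this thread, so treat the choice of compressor as an assumption. The class name `CompressionOverhead` and the sample strings are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionOverhead {
    /** Returns the number of bytes produced by DEFLATE-compressing the input. */
    public static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native resources
        return out.size();
    }

    /** Builds a large, repetitive sample text (illustrative data only). */
    public static byte[] largeSample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("lucene highlighting and dynamic summaries ");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] small = "id-42".getBytes(StandardCharsets.UTF_8);
        byte[] large = largeSample();
        // The tiny value grows because of the fixed header and checksum bytes.
        System.out.println("small: " + small.length + " -> " + compressedSize(small));
        // The large, repetitive value shrinks considerably.
        System.out.println("large: " + large.length + " -> " + compressedSize(large));
    }
}
```

The short value comes out larger than it went in, while the large body text shrinks, which matches the advice to reserve compression for large contents.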
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
