lucene-java-user mailing list archives

From Pierre GOSSE <pierre.go...@arisem.com>
Subject RE: FastVectorHighlighter.getBestFragments returning null
Date Fri, 27 May 2011 13:37:26 GMT
Actually, this second issue was opened because the Highlighter seems to ignore positions and treats
WITH_POSITIONS_OFFSETS as if it were WITH_OFFSETS.

https://issues.apache.org/jira/browse/LUCENE-3091

As far as I remember, the trouble is that to trust the positions in a token stream built from
a term vector, you have to know the field's properties, and those aren't accessible at the code
level where the decision is made to use offsets or positions. So some changes are needed, either
to pass this information along with the token stream or to give the highlighter access to the
field properties. Neither of those seemed straightforward. But I really only took a very short
look, so I'm not sure of anything there.
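
To make the difference between stored positions and offset-derived positions concrete, here is a toy Java sketch. It is not Lucene code and does not claim to reproduce the highlighter's internals: the `Tok` class, the `positionsFromOffsets` and `phraseMatches` helpers, and the sample tokens (an assumed analyser that emits "wifi" overlapping "wi" at the same position) are all hypothetical names and data invented for illustration. It only shows that renumbering tokens by offset order can break an exact phrase match that the stored positions would satisfy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionSketch {

    /** Hypothetical stand-in for one term-vector entry; not a Lucene class. */
    public static final class Tok {
        final String term;
        final int position;    // position as stored in the index
        final int startOffset; // character offset in the original text

        public Tok(String term, int position, int startOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
        }
    }

    /** Offsets-only view: sort by start offset and number the tokens 0, 1, 2, ... */
    public static List<Tok> positionsFromOffsets(List<Tok> tokens) {
        List<Tok> sorted = new ArrayList<>(tokens);
        // List.sort is stable, so tokens sharing an offset keep their input order.
        sorted.sort(Comparator.comparingInt((Tok t) -> t.startOffset));
        List<Tok> renumbered = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Tok t = sorted.get(i);
            renumbered.add(new Tok(t.term, i, t.startOffset));
        }
        return renumbered;
    }

    /** True if 'first' occurs at some position p and 'second' at p + 1. */
    public static boolean phraseMatches(List<Tok> tokens, String first, String second) {
        for (Tok a : tokens) {
            if (!a.term.equals(first)) continue;
            for (Tok b : tokens) {
                if (b.term.equals(second) && b.position == a.position + 1) return true;
            }
        }
        return false;
    }

    /** Assumed analysis of "wi-fi router": "wifi" overlaps "wi" at position 0. */
    public static List<Tok> sample() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("wi", 0, 0));
        tokens.add(new Tok("wifi", 0, 0));
        tokens.add(new Tok("fi", 1, 3));
        tokens.add(new Tok("router", 2, 6));
        return tokens;
    }

    public static void main(String[] args) {
        List<Tok> stored = sample();
        // Stored positions: "wi"@0 and "fi"@1 are adjacent.
        System.out.println(phraseMatches(stored, "wi", "fi"));
        // Offset-derived positions: "wi"@0, "wifi"@1, "fi"@2, so the phrase no longer matches.
        System.out.println(phraseMatches(positionsFromOffsets(stored), "wi", "fi"));
    }
}
```

In this toy model the first call prints `true` and the second prints `false`, which is the kind of mismatch that could leave an exact-phrase query without a fragment.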

I hope that someone of greater vision will find an elegant solution to this :). Otherwise,
I hope to find some time to take a look in a couple of weeks, while I still have part of the
context in mind.

Pierre

-----Original Message-----
From: Joel Halbert [mailto:joel@su3analytics.com]
Sent: Friday, 27 May 2011 14:05
To: java-user@lucene.apache.org
Subject: RE: FastVectorHighlighter.getBestFragments returning null

Hi Pierre,

Thanks for the pointer. So if I understand correctly this bug definitely
applies to fields with TermVector.WITH_OFFSETS.

My field uses TermVector.WITH_POSITIONS_OFFSETS.

I wasn't sure from the bug report if it applies to
WITH_POSITIONS_OFFSETS as well? It looks like it might?

- Joel

On Fri, 2011-05-27 at 13:56 +0200, Pierre GOSSE wrote:

> Hi,
> 
> Maybe it is related to:
> https://issues.apache.org/jira/browse/LUCENE-3087
> 
> Pierre
> 
> -----Original Message-----
> From: Joel Halbert [mailto:joel@su3analytics.com]
> Sent: Friday, 27 May 2011 12:57
> To: lucene users
> Subject: FastVectorHighlighter.getBestFragments returning null
> 
> Hi,
> 
> I'm using Lucene 3.0.3. I'm extracting snippets using
> FastVectorHighlighter. For some snippets (I think always when searching
> for exact, quoted matches) the fragment is null.
> 
> Code looks like:
> 
> 
> query = QueryParser.escape(query);
> if (exact) {
>     // wrap the escaped query in quotes so it is parsed as a phrase
>     query = "\"" + query + "\"";
> }
> BooleanQuery allQ = new BooleanQuery();
> BooleanQuery termQ = new BooleanQuery();
> Query bodyQ = new QueryParser(Version.LUCENE_30, BODY, analyser).parse(query);
> termQ.add(new BooleanClause(bodyQ, Occur.SHOULD));
> // add more queries
> allQ.add(new BooleanClause(termQ, Occur.MUST));
>
> TopDocs res = is.search(allQ, null, upperRange);
> FastVectorHighlighter highlighter = new FastVectorHighlighter(true, true);
>
> for (int i = in.getLowerRange(); i < Math.min(res.totalHits, upperRange); i++) {
>     String[] bodyFrags = highlighter.getBestFragments(
>             highlighter.getFieldQuery(bodyQ),
>             is.getIndexReader(), res.scoreDocs[i].doc, BODY, 120, 2);
>
>     // bodyFrags is null
> }
> 
> 
> I do get a hit, and the content with the exact match is coming from the
> BODY field, but I can't seem to get the fragment out.
> 
> Any clues?
> 
> Thanks
> 
> - Joel

