lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: highlighting with WildcardQuery
Date Sat, 14 Oct 2006 07:34:42 GMT
The IndexReader is needed to find all wildcard matches in the index
lexicon. It seems you do not want to expand the wildcard query against the
index lexicon, but rather against the words of the highlighted text (which
may not be indexed at all). I think you have at least two ways to do that:

(1) create a (highlight) QueryScorer with:
   new QueryScorer(WeightedTerm weightedTerms[])
which means you supply all the "lexicon" knowledge that is normally taken
from the index (reader), i.e. which words are valid matches for the wildcard
expression.

(2) extend QueryScorer, overriding
   float getTokenScore(Token token)
so that tokens matching the wildcard expression get a nonzero score.
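
Both options come down to testing the highlighted text's own words against
the wildcard pattern instead of the index terms. Below is a minimal
pure-Python sketch of that matching, assuming a simplified tokenizer
(lowercased runs of word characters, not StandardAnalyzer) and Lucene's
'*'/'?' wildcard syntax. It stays in Python to match the quoted code but uses
only the standard library, so none of it is PyLucene API; the helper names
(wildcard_to_regex, expand_wildcards, token_score, highlight) are
hypothetical. expand_wildcards yields the (term, weight) pairs that option (1)
would wrap as WeightedTerm objects, and token_score is the per-token test an
option (2) getTokenScore override could apply.

```python
import re


# Assumption: a simplified tokenizer (lowercased runs of word characters).
# Lucene's StandardAnalyzer does more (e.g. stop-word removal), so this only
# approximates the tokens the Highlighter would see.
def simple_tokens(text):
    return re.findall(r"\w+", text.lower())


def wildcard_to_regex(pattern):
    """Compile a Lucene-style wildcard ('*' = any run, '?' = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


# Option (1): expand the patterns against the text's own vocabulary instead of
# the index lexicon. The (term, weight) pairs are what one would wrap as
# WeightedTerm objects for new QueryScorer(WeightedTerm[]).
def expand_wildcards(patterns, text, weight=1.0):
    compiled = [wildcard_to_regex(p.lower()) for p in patterns]
    weighted = {}
    for word in set(simple_tokens(text)):
        if any(rx.fullmatch(word) for rx in compiled):
            weighted[word] = weight
    return sorted(weighted.items())


# Option (2): the per-token test an overridden getTokenScore(Token) could apply.
def token_score(token_text, compiled_patterns):
    word = token_text.lower()
    return 1.0 if any(rx.fullmatch(word) for rx in compiled_patterns) else 0.0


# Hypothetical stand-in for Highlighter + SimpleHTMLFormatter: wraps every
# matching word in the whole text (no fragmenting, no fragment scoring).
def highlight(text, patterns, opening='<span class="highlight">', closing="</span>"):
    compiled = [wildcard_to_regex(p.lower()) for p in patterns]

    def wrap(match):
        word = match.group(0)
        if token_score(word, compiled) > 0:
            return opening + word + closing
        return word

    return re.sub(r"\w+", wrap, text)


if __name__ == "__main__":
    sample = "Lucene highlighting works; the highlighter marks lights."
    print(expand_wildcards(["*light*"], sample))
    print(highlight(sample, ["*light*"]))
```

For example, expand_wildcards(["*light*"], sample) on the sample sentence
returns highlighter, highlighting and lights, each with weight 1.0; those are
the words option (1) would hand to the QueryScorer, so no IndexReader or
rewrite() is involved.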

- Doron

"James O'Rourke" <james@bittorrent.com> wrote on 13/10/2006 11:39:31:

> Is there any way to do highlighting with a WildcardQuery when there
> is no IndexReader available? I simply want to highlight a chunk of
> text, but it fails because the WildcardQuery has to be rewritten,
> and rewrite() needs an IndexReader that I don't have.
>
> Code (using PyLucene-2.0.0; I can translate it to Java if you like):
>
> def gethighlightedfragments(text, searchString,
>         fragmentLength=50, numFragments=3,
>         opening='<span class="highlight">', closing='</span>'):
>      """Returns a list of text fragments with returns included for
>      80 char max width. Defaults to the OR operator, which is good for
>      formatting."""
>      analyzer = StandardAnalyzer()
>      #print text
>      strs = searchString.split()
>      bq = BooleanQuery()
>      for s in strs:
>          print s
>          q = WildcardQuery(Term('f', '*' + s +  '*'))
>          #print q.toString()
>          bq.add(q,  BooleanClause.Occur.SHOULD)
>      #print bq.toString()
>      scorer = QueryScorer(bq)
>      formatter = SimpleHTMLFormatter(opening, closing)
>      highlighter = Highlighter(formatter, scorer)
>      fragmenter = SimpleFragmenter(fragmentLength)
>      highlighter.setTextFragmenter(fragmenter)
>
>      tokenStream = analyzer.tokenStream('f', StringReader(text))
>      return highlighter.getBestFragments(tokenStream, text, numFragments)
>
>
> Basically, I want to show partial word matches also.
>
> James
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

