lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yin...@indiana.edu
Subject Re: highlight problem
Date Thu, 05 May 2005 15:43:25 GMT
Quoting mark harwood <markharw00d@yahoo.co.uk>:

Hi, Mark,

I just used StandardAnalyzer and code is as following:

=====================================================
      Analyzer analyzer = new StandardAnalyzer();
      BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
      String line = in.readLine();

      if (line.length() == -1)
        return;

      Query query = QueryParser.parse(line, "contents", analyzer);
      Hits hits = searcher.search(query);
      Highlighter highlighter =new Highlighter(new QueryScorer(query));
========================================================

Part of the fulltext result is:
========================================================
 sectors in South Africa have failed to incorporate many co-management
principles, such as joint... of the RNP. However, poor representation of
community interests on the joint management committee... of a joint management
committee and the improvement of infrastructure in the area. The key differences
......
========================================================
Of course the result is far more than this.

The paper itself looks normal to me. It makes me confused why this is happening.

Thanks for your help,
Ying







> >> One of my
> >> search results from our 
> >> records contains far too much of the text
> 
> This is a problem I haven't seen before. I suspect it
> may have something to do with your choice of analyzer.
> Your paper will only ever be fragmented on "token gap"
> boundaries ie points in the token stream where the
> current token position does not overlap with the
> previous token's . If the section in your text which
> contains the search terms contains a long stream of
> overlapping tokens you will end up with a long
> highlighted selection.
> 
> Which analyzer are using out of interest?
> 
> 
> Cheers
> Mark
> 
> 
> 
> --- yinjin@indiana.edu wrote:
> > 
> > 
> > Hi, All,
> > 
> > I use lucene highlight package to generate KWIC for
> > our application.
> > 
> > The part of the code is as following:
> >
> =====================================================
> >         if(text != null ){
> >           TokenStream tokenStream =
> > analyzer.tokenStream("contents",
> >               new StringReader(text));
> >           // Get 3 best fragments and seperate with
> > a "..."
> >           result =
> > highlighter.getBestFragments(tokenStream,
> >               text, 3, "...");
> >         }
> > 
> >
> =====================================================
> > 
> > However, I got a very strange problem. One of my
> > search results from our 
> > records contains far too much of the text of the
> > paper. It doesn't happen 
> > for the same paper when I changed the search
> > criteria.
> > 
> > Thanks very mcuh for your help,
> > Ying 
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> > 
> > 
> 
> 
> 	
> 	
> 		
> ___________________________________________________________ 
> Yahoo! Messenger - want a free and easy way to contact your friends online?
> http://uk.messenger.yahoo.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message