lucene-dev mailing list archives

From "none none" <kor...@lycos.com>
Subject HighLighting Service
Date Tue, 09 Apr 2002 19:22:50 GMT
Hi all,
I am working on the term-highlighting functionality for Lucene.
I followed the suggestion of Maik Schreiber (http://www.iq-computing.de/lucene/highlight.htm) step by step,
and implemented it with some changes:
In the white paper the highlighting was based only on the summary field; my version reads the whole document
(from a cache) with the SelfBufferedStream method into a string that is passed to the highlight
method.

Some problems show up here:

1. It doesn't work with all Query types, e.g.: WildcardQuery, FuzzyQuery, PrefixQuery, PhraseQuery.

3. The response time is not constant. E.g., if the documents to highlight are big files,
around 2 to 4 MB, the average response time per query is:
- 20 sec for 10 docs of 2 MB each
whereas for small files it is:
- 0.6 sec for 10 docs of 20 KB each

What can we do? Any suggestions?

Some tips:
1. Documents must be plain text to get a good result. This means there are 2 options: first,
build a text version of the document at runtime (for big documents this will be another
handicap for response time); second, keep a cache of the plain-text versions of all documents.

2. The highlighting process produces a highlighted version of the entire document, while it would be
better to have just one portion, or 2 or 3.
In that case we gain an advantage, because we can cut the iteration short once we are done,
saving some time and resources.

3. I think we should incorporate this feature into Lucene. Right now, to make this work you
have to change some code in the Lucene package, so staying up to date requires changing
those parts of the code every time (if they are still there!!). Also, the feature depends strictly on the
Lucene core package.
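
To make tip 2 concrete, here is a minimal sketch of the early-stop idea in plain Java. It is not the code from LuceneTools.java and it does not use any Lucene API: the class name SnippetHighlighter, the method firstSnippets and its parameters are all hypothetical, and it handles only one already-lowercased term. It also assumes that the lowercased copy of the text has the same length as the original, which is not guaranteed for every locale.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene or LuceneTools.
public class SnippetHighlighter {

    /**
     * Returns at most maxSnippets fragments of text with the term wrapped in <b>...</b>.
     * Scanning stops as soon as enough fragments are collected, so a large document
     * is not processed to the end.
     *
     * Assumption: lowerText is text.toLowerCase() and has the same length as text.
     */
    public static List<String> firstSnippets(String text, String lowerText,
                                             String lowerTerm, int maxSnippets,
                                             int context) {
        List<String> snippets = new ArrayList<String>();
        if (lowerTerm.isEmpty()) {
            return snippets; // nothing to search for
        }
        int from = 0;
        while (snippets.size() < maxSnippets) {
            int hit = lowerText.indexOf(lowerTerm, from);
            if (hit < 0) {
                break; // no more occurrences
            }
            int end = hit + lowerTerm.length();
            // Keep `context` characters on each side, without overlapping the previous fragment.
            int start = Math.max(from, hit - context);
            int stop = Math.min(text.length(), end + context);
            snippets.add(text.substring(start, hit)
                    + "<b>" + text.substring(hit, end) + "</b>"
                    + text.substring(end, stop));
            from = stop; // continue after this fragment
        }
        return snippets;
    }
}
```

With maxSnippets set to 2 or 3, the loop ends after the first few hits, so the cost depends on where the hits are rather than on the size of the whole file (reading the file into memory still costs the same).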

I attach my version of LuceneTools.java and the code I wrote that is used by the servlet:
...
String brief = ""; // stays empty if highlighting fails
String url = doc.get("url"); // path of the cached plain-text version of the document to highlight
StringBuffer sb = new StringBuffer();
StringBuffer sblower = new StringBuffer();
FileInputStream fis = new FileInputStream(url);
byte[] b = new byte[1024];
int effective;
while ((effective = fis.read(b)) != -1)
{
 // convert only the bytes actually read; the last chunk is usually shorter than the buffer
 String s = new String(b, 0, effective);
 sb.append(s);
 sblower.append(s.toLowerCase());
}
fis.close();
try
{
 brief = LuceneTools.highlightTerms(sb.toString(), sblower.toString(), highLighter, query,
analyzer);
}
catch(Exception e)
{e.printStackTrace();}
out.println(searchUI.getSearchItem(score,doctitle,url,"..."+brief+"..."));

....

I hope someone can give me some tips so that I can complete this functionality.
Thanks, bye.



