lucene-java-user mailing list archives

Subject Whitespace Analyzer not producing expected search results
Date Tue, 16 Nov 2004 14:25:38 GMT

We have indexed a set of web files (JSP, JS, XSLT, Java properties and
HTML) using the Lucene whitespace analyzer.
The purpose is to allow developers to find where code and functions are used
and defined across a large and disparate
content management repository. We hope this will aid code reuse, easier
refactoring and standards control.

However, when a query parser search is made using a whitespace analyzer with
a string known to be in an indexed file, the search returns zero hits.

For example the string  <jsp\:include page
=\"/path1/path2/path3/path4/file1.jsp\" /> is
searched for using the query parser (with the meta-characters escaped).
Should an indexed document that contains the following text not be
found?

 // include HTML head
             <jsp:include page="/path1/path2/path3/path4/file1.jsp" />

             <script language="JavaScript" src
             <!-- <script>
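
The escaping step mentioned above can be sketched in plain Java. This is only an illustration, not our actual code: the class and method names are hypothetical, it does not call Lucene, and the list of meta-characters is an assumption based on the query parser syntax documentation (\ + - ! ( ) : ^ [ ] " { } ~ * ? | &). For the sample line from the indexed document, only the colon and the double quotes are escaped.

```java
class QueryEscapeSketch {
    // Assumed set of query parser meta-characters (hypothetical constant,
    // taken from the query syntax documentation rather than from Lucene code).
    private static final String META_CHARS = "\\+-!():^[]\"{}~*?|&";

    // Prefix every assumed meta-character with a backslash.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (META_CHARS.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<jsp:include page=\"/path1/path2/path3/path4/file1.jsp\" />";
        // Prints the escaped form that would be handed to the query parser.
        System.out.println(escape(line));
    }
}
```

Printing the escaped string before it is parsed makes it easy to compare what the user typed with what the parser actually receives.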

I've taken a look at the FAQ advice on checking the effects of an
analyzer (in our case whitespace), but our test class returns the expected
tokens for any given token stream. For example, the string "<% mytoken1
mytoken2 %>" is tokenised by the whitespace analyzer as [<%] [mytoken1]
[mytoken2] [%>].
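
A stand-alone sketch of that kind of token check follows. It assumes that splitting on Character.isWhitespace approximates what the whitespace analyzer does. It does not use Lucene, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceTokenSketch {
    // Split text into tokens at every run of whitespace characters.
    // Assumption: this approximates the whitespace analyzer's output.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [<%, mytoken1, mytoken2, %>]
        System.out.println(tokenize("<% mytoken1 mytoken2 %>"));
        // prints [<jsp:include, page="/path1/path2/path3/path4/file1.jsp", />]
        System.out.println(tokenize("<jsp:include page=\"/path1/path2/path3/path4/file1.jsp\" />"));
    }
}
```

Under this assumption the sample JSP line yields three tokens, with the attribute name, the equals sign and the quoted path kept together in a single token.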

I'm sure I've missed something, but I can't see what it is. If anyone could
shed any light on possible reasons why we are getting zero hits for text
strings which are in our indexed files, I'd be really grateful. See below
for more info on the index and search setup.

Thanks a lot,
Lee C

File contents are in a tokenised, indexed, not stored field.
The index uses the whitespace analyzer which comes with Lucene.

Searches are performed using a boolean query. The boolean query is made up
of a query parser query, whose search term comes from an HTML text box filled
in by the user, and a prefix query, which is used to limit search scope by
directory paths.
The search uses a whitespace analyzer; no filtering takes place.

