lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Whitespace Analyzer not producing expected search results
Date Tue, 16 Nov 2004 16:19:14 GMT
Try using a TermQuery instead of QueryParser to see if you get the
results you expect. A TermQuery is not passed through an analyzer, so
the term must match an indexed token exactly. Exact case matters.

Also, when troubleshooting issues with QueryParser, it is helpful to
see what Query it actually returns: try displaying its toString
output.
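
As a rough illustration of why an exact term lookup can miss text that is
clearly in the file, here is a minimal plain-Java sketch. It does not use
Lucene at all: the class WhitespaceTermDemo and its methods tokenize and
hasTerm are hypothetical stand-ins, and it assumes whitespace analysis only
splits on whitespace, leaving case and punctuation untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java stand-in for whitespace analysis; this is NOT the Lucene API.
final class WhitespaceTermDemo {

    // Split on runs of whitespace only: no lowercasing, no punctuation stripping.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // An exact-term lookup succeeds only if the term equals a whole indexed token.
    static boolean hasTerm(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        String doc = "<jsp:include page=\"/path1/path2/path3/path4/file1.jsp\" />";
        Set<String> indexed = new LinkedHashSet<>(tokenize(doc));

        System.out.println(indexed);
        System.out.println(hasTerm(indexed, "<jsp:include")); // true: whole token, same case
        System.out.println(hasTerm(indexed, "<JSP:include")); // false: case differs
        System.out.println(hasTerm(indexed, "jsp:include"));  // false: the token keeps its '<'
    }
}
```

The point of the sketch is that each whitespace-delimited chunk, punctuation
included, is one term, so a lookup has to supply that chunk exactly.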

	Erik

On Nov 16, 2004, at 6:25 AM, lee.a.carroll@britishairways.com wrote:

> Hi,
>
> We have indexed a set of web files (JSP, JS, XSLT, Java properties and
> HTML) using the Lucene Whitespace Analyzer.
> The purpose is to allow developers to find where code / functions are
> used and defined across a large and disparate content management
> repository, hopefully to aid code re-use, easier refactoring and
> standards control.
>
> However, when a query parser search is made using a whitespace analyzer
> with a string known to be in an indexed file, the search returns zero hits.
>
> For example, the string <jsp\:include page
> =\"/path1/path2/path3/path4/file1.jsp\" /> is
> searched for using the query parser (escaping the meta-chars), and an
> indexed document which contains the following text should be found?
>
>  // include HTML head
> %>
>              <jsp:include page="/path1/path2/path3/path4/file1.jsp" />
>
>              <script language="JavaScript" src
> ="/path1/path2/path3/file1.js"></script>
>              <!-- <script>
>
> I've taken a look at the FAQ advice regarding checking the effects of an
> analyzer (in our case whitespace), but our test class returns the expected
> tokens for any given token stream. For example, the string
> "<% mytoken1 mytoken2 %>" is tokenised by the whitespace analyzer as
> [<%] [mytoken1] [mytoken2] [%>].
>
> I'm sure I've missed something but I can't see what it is. If anyone
> could shed any light on possible reasons why we are getting zero hits for
> text strings which are in our indexed files, I'd be really grateful. See
> below for more info on the index and search setup.
>
> Thanks a lot Lee C
>
> File contents are in a tokenised, indexed, not stored field.
> The index uses the whitespace analyzer which comes with Lucene.
>
> Searches are performed using a boolean query. The boolean query is made up
> of a query parser query, which gets its search term from an HTML text box
> filled in by the user, and a prefix query, which is used to limit search
> scope by directory paths.
> The search uses a whitespace analyzer; no filtering takes place.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

