lucene-java-user mailing list archives

From "Sudarsan, Sithu D."
Subject RE: Lucene QueryParser and Analyzer
Date Thu, 29 Apr 2010 19:54:21 GMT

Is there a whitespace after the comma? 

Sithu D Sudarsan

-----Original Message-----
From: Wei Ho
Sent: Thursday, April 29, 2010 3:51 PM
Subject: Lucene QueryParser and Analyzer


I'm using Lucene to index and search through a collection of Chinese 
documents. However, I'm noticing some odd query behavior.

Given the two queries below:

(Ci refers to Chinese character i)
Input1: C1C2,C3C4,C5C6,C7,C8C9C10
Input2: C1C2  C3C4  C5C6  C7  C8C9C10

Input1 returns absolutely nothing, while Input2 (replacing the commas 
with spaces) works as expected. I'm a bit confused about why this would 
happen: it seems that QueryParser uses the Analyzer passed to it to 
tokenize the input query string, so if the Analyzer ignores 
punctuation, Input1 and Input2 should return identical results. Is 
there some pre-Analyzer filtering that QueryParser does? I've tried 
this with the StandardAnalyzer, the SmartChineseAnalyzer, and an 
analyzer I implemented that explicitly skips over punctuation and 
whitespace when tokenizing the query string, but to no avail.
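
One way to reason about the difference between Input1 and Input2 is the toy sketch below. It is plain Java, not Lucene code, and it rests on an assumption: that the classic QueryParser splits the query string on whitespace before handing each chunk to the Analyzer, and that a chunk which analyzes to several tokens is treated as a phrase. The names QueryChunkSketch, toyAnalyze, parse, and describe are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class QueryChunkSketch {

    // Toy stand-in for an Analyzer that skips punctuation and whitespace:
    // every run of other characters becomes one token.
    static List<String> toyAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[\\p{Punct}\\s]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // ASSUMPTION: the parser splits on whitespace first, then analyzes
    // each chunk separately. Each inner list is one query clause.
    static List<List<String>> parse(String query) {
        List<List<String>> clauses = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            List<String> tokens = toyAnalyze(chunk);
            if (!tokens.isEmpty()) {
                clauses.add(tokens);
            }
        }
        return clauses;
    }

    // Render single-token clauses as terms and multi-token clauses as
    // quoted phrases, loosely imitating a query's string form.
    static String describe(String query) {
        StringBuilder sb = new StringBuilder();
        for (List<String> clause : parse(query)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (clause.size() == 1) {
                sb.append(clause.get(0));
            } else {
                sb.append('"').append(String.join(" ", clause)).append('"');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input1: one whitespace-free chunk, so one five-token phrase.
        System.out.println(describe("C1C2,C3C4,C5C6,C7,C8C9C10"));
        // Input2: five chunks, so five independent terms.
        System.out.println(describe("C1C2  C3C4  C5C6  C7  C8C9C10"));
    }
}
```

Under that assumption, Input1 becomes a single phrase whose tokens must all appear next to each other in a document, while Input2 becomes five separate terms, which would explain the empty result. Printing query.toString() for both inputs against the real parser should show whether this is what is actually happening.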

-------sample code-------------
Analyzer analyzer = new LingPipeAnalyzer();
Searcher searcher = new IndexSearcher(directory);
QueryParser qParser = new MultiFieldQueryParser(Version.LUCENE_30, 
SEARCH_FIELDS, analyzer);
Query query = qParser.parse(queryLine[1]);
ScoreDoc[] results = searcher.search(query, TOP_N).scoreDocs;

I'm probably just doing something dumb, but any help would be greatly 
appreciated.

Wei Ho
