lucene-java-user mailing list archives

From "James liu" <liuping.ja...@gmail.com>
Subject Re: Question about QueryParser
Date Fri, 24 Oct 2008 01:00:16 GMT
Thanks Steve, I get it.

2008/10/24 Steven A Rowe <sarowe@syr.edu>

> Hi James,
>
> On 10/23/2008 at 8:30 AM, James liu wrote:
> > public class AnalyzerTest {
> >    @Test
> >    public void test() throws ParseException {
> >        QueryParser parser = new MultiFieldQueryParser(
> >                new String[]{"title", "body"}, new StandardAnalyzer());
> >        Query query1 = parser.parse("中文");
> >        Query query2 = parser.parse("中 文");
> >        System.out.println(query1);
> >        System.out.println(query2);
> >    }
> > }
> >
> > Output:
> >
> > title:"中 文" body:"中 文"
> > (title:中 body:中) (title:文 body:文)
> >
> > Why are they not the same?
>
> MultiFieldQueryParser extends QueryParser.
>
> QueryParser first breaks up its input on whitespace, and then sends each
> chunk to the analyzer.
>
> query1 contains no whitespace, so the entire string is sent to
> StandardAnalyzer, which produces one token per CJK character, for a total of
> two tokens.  When QueryParser receives more than one token back from the
> analyzer, it produces a PhraseQuery (unless tokens share the same position,
> but this is not applicable to your situation).  MultiFieldQueryParser
> specializes this action by producing one PhraseQuery per Field.
>
> query2 contains a space, so the following is performed for each
> whitespace-separated chunk (in query2, this means one character per chunk):
>
> The chunk is sent to the analyzer, which produces a single token for each
> CJK character - one token is produced because each chunk contains a single
> character.  When QueryParser receives a single token back from the analyzer,
> it produces a TermQuery.  MultiFieldQueryParser specializes this action by
> producing one TermQuery per Field, and then groups these per-Field
> TermQuery instances together in a BooleanQuery, denoted in the output as a
> parenthesized group.
>
> Steve
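
The flow Steve describes can be modelled in a few lines of plain Java. This is only a rough sketch and does not use Lucene at all: the class QueryParserSketch and its methods analyze, clause and parse are hypothetical names made up for illustration, the "analyzer" simply emits one token per character (which matches the CJK behaviour described above but not StandardAnalyzer in general), and the parentheses are added by a simplified rule that mimics the printed output in this thread rather than Lucene's own toString logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the behaviour described in this thread. NOT Lucene code.
public class QueryParserSketch {

    // Stand-in for the analyzer (assumption): one token per character,
    // as StandardAnalyzer is described as doing for CJK text.
    static List<String> analyze(String chunk) {
        List<String> tokens = new ArrayList<>();
        chunk.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // One chunk, one field: a single token is printed like a TermQuery,
    // several tokens are printed like a PhraseQuery.
    static String clause(String field, List<String> tokens) {
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0);
        }
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    // Split on whitespace first, then analyze each chunk for every field.
    static String parse(String[] fields, String input) {
        String[] chunks = input.trim().split("\\s+");
        List<String> groups = new ArrayList<>();
        for (String chunk : chunks) {
            List<String> tokens = analyze(chunk);
            List<String> perField = new ArrayList<>();
            for (String field : fields) {
                perField.add(clause(field, tokens));
            }
            String group = String.join(" ", perField);
            // Simplification: parenthesize per-chunk groups only when
            // there is more than one chunk.
            groups.add(chunks.length > 1 ? "(" + group + ")" : group);
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        String[] fields = {"title", "body"};
        // U+4E2D U+6587 with no whitespace: one chunk, two tokens.
        System.out.println(parse(fields, "\u4e2d\u6587"));
        // The same two characters separated by a space: two chunks.
        System.out.println(parse(fields, "\u4e2d \u6587"));
    }
}
```

Running main prints the same two lines as the output quoted above: a phrase per field for the input without whitespace, and one parenthesized group of per-field terms for each chunk of the input with a space.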
>


-- 
regards
j.L