lucenenet-user mailing list archives

From Franklin Simmons <fsimm...@sccmediaserver.com>
Subject RE: QueryParser
Date Wed, 26 May 2010 17:04:53 GMT
Given the text "3472_003_Executa_Carga.sql", the StandardAnalyzer will return these tokens:

3472_003_executa   
carga.sql	

The first token is of type <NUM>, the second is of type <HOST>.  Note that the query
parser changed the filename clause into a phrase query (quoted), because the analyzer
produced more than one token for it.
 
It's quite useful to have a little app to show information about the tokens any given analyzer
will produce.  The method below does that.
 
void ShowTokens(Lucene.Net.Analysis.Analyzer someAnalyzer, string someText)
{
    Lucene.Net.Analysis.TokenStream stream = someAnalyzer.TokenStream("", new System.IO.StringReader(someText));
                               
    Lucene.Net.Analysis.Token token = stream.Next();
    while (token != null)
    {
        string info = String.Format("{0}\t{1}\t{2}\t{3}\t{4}", token.Type(), token.GetPositionIncrement(),
            token.StartOffset(), token.EndOffset(), token.TermText());
        // display info somewhere, e.g. System.Console.WriteLine(info);
        token = stream.Next();
    }
}
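
A minimal sketch of how the method might be called from a console program, for anyone who wants to reproduce the tokens listed above. It assumes Lucene.Net 2.9.x (where StandardAnalyzer accepts a Version argument and the older Next()/TermText() calls are still available, possibly with deprecation warnings); the class name TokenDemo is made up for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

class TokenDemo
{
    // Same logic as the ShowTokens method above, made static so Main can call it.
    static void ShowTokens(Analyzer someAnalyzer, string someText)
    {
        TokenStream stream = someAnalyzer.TokenStream("", new StringReader(someText));

        // Next() is the older token-at-a-time API; it returns null once the stream is exhausted.
        for (Token token = stream.Next(); token != null; token = stream.Next())
        {
            // Columns: type, position increment, start offset, end offset, term text.
            Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}", token.Type(), token.GetPositionIncrement(),
                token.StartOffset(), token.EndOffset(), token.TermText());
        }
    }

    static void Main()
    {
        // Assumption: Lucene.Net 2.9.x. Fully qualified to avoid a clash with System.Version.
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
        ShowTokens(analyzer, "3472_003_Executa_Carga.sql");
    }
}
```

This should print one tab-separated line per token, so the two tokens shown above would each appear with their type and offsets. Passing a different analyzer to ShowTokens makes it easy to compare how each one splits the same text.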


-----Original Message-----
From: Leonardo Azize Martins [mailto:lazize@gmail.com] 
Sent: Wednesday, May 26, 2010 12:34 PM
To: lucene-net-user@lucene.apache.org
Cc: lucene-net-user@incubator.apache.org
Subject: Re: QueryParser

I am using the "StandardAnalyzer".
But is it a bug, or is it normal?
Because "3472_003_Executa_Carga.sql" contains many other "_" characters and the
analyzer removes only one.
Regards


2010/5/26 Robert Jordan <robertj@gmx.net>

>  On 26.05.2010 16:36, Leonardo Azize Martins wrote:
>
>> Hello,
>>
>> I have an index with this structure:
>> new Field(LuceneFactory.ID, file.FullName, Field.Store.YES,
>> Field.Index.NOT_ANALYZED_NO_NORMS);
>> new Field(LuceneFactory.CONTENTS, plainText, Field.Store.YES,
>> Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
>> new Field(LuceneFactory.FULLNAME, file.FullName.ToLower(), Field.Store.NO,
>> Field.Index.NOT_ANALYZED_NO_NORMS);
>> new Field(LuceneFactory.FILENAME, file.Name.ToLower(), Field.Store.NO,
>> Field.Index.NOT_ANALYZED_NO_NORMS);
>> new Field(LuceneFactory.EXTENSION, fileExtension, Field.Store.NO,
>> Field.Index.NOT_ANALYZED_NO_NORMS);
>> new NumericField(LuceneFactory.SIZE, Field.Store.YES,
>> true).SetLongValue(file.Length);
>>  I am using this search:
>> extension:sql AND filename:3472_003_Executa_Carga.sql
>> (note: there is no space in 3472_003_Executa_Carga.sql)
>>
>> When I use the parser method, the Query object looks like:
>> {+extension:sql +filename:"3472_003_executa carga.sql"}
>>     [Lucene.Net.Search.BooleanQuery]: {+extension:sql
>> +filename:"3472_003_executa carga.sql"}
>>     boost: 1.0
>> Note the space between "executa" and "carga".
>>
>> Why? Am I doing anything wrong?
>>
>
> QueryParser uses an analyzer itself while parsing the
> query, and this analyzer seems to be stripping the "_"
> for some reason. Which analyzer are you using?
>
> Robert
>
>
>
