Mailing-List: contact lucene-net-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: lucene-net-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of digydigy@gmail.com designates
 209.85.215.48 as permitted sender)
From: "Digy" <digydigy@gmail.com>
To: <lucene-net-user@lucene.apache.org>
References: 
 <CAH=GueTk=tUVcD4m3EtyWyk3i3f6oF68-CJA5r7LGqR+5_oUag@mail.gmail.com>
 <CAE8q=UZUgAMv9MGCfXZEVDSk-TDQpu_-xA4CN12NWofvYWyfLA@mail.gmail.com>
 <CAEm6kteoMJiMNUTHa69uXxciTFQ+NQhLVGfEG0ysLsn4jydJsw@mail.gmail.com>
 <003a01cc6cc5$c75957a0$560c06e0$@com>
 <CAEm6ktdDVOKwgNxe7zOiFZgg9uN=e2B4NhnDyjSRA5vWKMsoxQ@mail.gmail.com>
In-Reply-To: 
 <CAEm6ktdDVOKwgNxe7zOiFZgg9uN=e2B4NhnDyjSRA5vWKMsoxQ@mail.gmail.com>
Date: Tue, 6 Sep 2011 23:04:09 +0300
Message-ID: <004501cc6cd0$257f5140$707df3c0$@com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0046_01CC6CE9.4ACC8940"
thread-index: AcxsyC9RQ61ELT0tRgCoEhehWQtptAABuzUw
Content-Language: tr
Subject: RE: [Lucene.Net] How to index/search a file name

------=_NextPart_000_0046_01CC6CE9.4ACC8940
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable

That can be a starting point (just play a little with the tokenizers =
& filters).

=20

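    using Lucene.Net.Analysis;
    using Lucene.Net.Analysis.Standard;

    // StandardAnalyzer-style chain with accent folding added
    // (note: there is no StopFilter in this chain).
    public class ModifiedStandardAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(System.String fieldName, =
System.IO.TextReader reader)
        {
            // Second argument is replaceInvalidAcronym (2.9.x constructor).
            StandardTokenizer tokenStream =3D new =
StandardTokenizer(reader, true);
            TokenStream result =3D new StandardFilter(tokenStream);
            result =3D new LowerCaseFilter(result);
            // Fold accented characters to their ASCII equivalents.
            result =3D new ASCIIFoldingFilter(result);
            return result;
        }
    }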
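
For intuition about the folding step only, here is a small
self-contained sketch that does not use Lucene.Net at all. It is an
approximation I am assuming for illustration, not how ASCIIFoldingFilter
is implemented: it strips combining marks after Unicode normalization,
while the real filter uses its own character mapping and also covers
letters that do not decompose (the German sharp s, for example).
FoldSketch, IsNotMark and Fold are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of Lucene.Net.
public static class FoldSketch
{
    // True for every char except combining (non-spacing) marks.
    static bool IsNotMark(char c)
    {
        return !CharUnicodeInfo.GetUnicodeCategory(c)
            .Equals(UnicodeCategory.NonSpacingMark);
    }

    // Decompose accented letters (FormD), drop the marks, lower-case.
    public static string Fold(string text)
    {
        return new string(text.Normalize(NormalizationForm.FormD)
            .Where(IsNotMark).ToArray()).ToLowerInvariant();
    }

    public static void Main()
    {
        // Input: G, u-umlaut, s, o-umlaut, c-cedilla (as escapes).
        Console.WriteLine(Fold("G\u00FCs\u00F6\u00E7"));
    }
}
```

This prints gusoc. Characters without a canonical decomposition, such
as the Turkish dotless i, pass through unchanged in this sketch, which
is one reason to rely on the filter inside the analyzer instead.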

=20

DIGY

=20

-----Original Message-----
From: Gustavo Poll [mailto:gkpoll@gmail.com]=20
Sent: Tuesday, September 06, 2011 10:06 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] How to index/search a file name

=20

Thanks again... OK, it is not.

=20

standard analyzer:

=20

[name.surname@gmail.com] [123.456] [3,5] [at&t] =
[g=C3=BCs=C4=B1=C3=B6=C3=A7] [g=C3=BCsi=C3=B6=C3=A7] [a=C3=9F?de?]

[??????] [ss=C3=9F]

=20

UnaccentedWordAnalyzer:

=20

[name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc]

[gusioc] [a=C3=9F?de?] [??????] [ssss]

=20

=20

StandardAnalyzer would be perfect for my application if it were accent
insensitive. Can anyone please tell me the easiest way to code such an
analyzer (an accent-insensitive StandardAnalyzer)?

I have heard it is not a good idea to write a class that inherits from
StandardAnalyzer, because StandardAnalyzer should be a final class. Is
that correct?

I would appreciate any help.

Gustavo Poll

=20


2011/9/6 Digy <digydigy@gmail.com>

=20

> A function is worth a thousand words :)


>        void Test()
>        {
>            // Run the same input through both analyzers to compare them.
>            Analyzer[] analyzers =3D new Analyzer[] { new =
StandardAnalyzer(), new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer() };
>
>            string input =3D "Name.Surname@gmail.com 123.456 3,5 AT&T =
=C4=9F=C3=BC=C5=9F=C4=B1=C3=B6=C3=A7%=C4=9E=C3=9C=C5=9E=C4=B0=C3=96=C3=87=
$=CE=91=CE=92=CE=93=CE=94=CE=95=CE=96#=D0=90=D0=91=D0=92=D0=93=D0=94=D0=95=
 SS=C3=9F";
>
>            foreach (Analyzer analyzer in analyzers)
>            {
>                TokenStream ts =3D analyzer.TokenStream("", new =
StringReader(input));
>
>                // Print every token the analyzer produces, in brackets.
>                Lucene.Net.Analysis.Token t =3D ts.Next();
>                while (t !=3D null)
>                {
>                    Console.Write("[" + t.TermText() + "] ");
>                    t =3D ts.Next();
>                }
>                Console.WriteLine(); Console.WriteLine();
>            }
>        }

>=20

>=20

>=20

> DIGY


> -----Original Message-----

> From: Gustavo Poll [mailto:gkpoll@gmail.com]

> Sent: Tuesday, September 06, 2011 9:00 PM

> To: lucene-net-user@lucene.apache.org

> Subject: Re: [Lucene.Net] How to index/search a file name

>=20

>=20

>=20

> Thanks DIGY, I am interested in that too. Let me see if I understood:
>
> UnaccentedWordAnalyzer is like StandardAnalyzer, but accent
> insensitive?

>=20

>=20

>=20

> Thanks!

>=20

> Gustavo Poll

>=20

>=20

>=20

>=20

>=20

> 2011/9/6 digy digy <digydigy@gmail.com>

>=20

>=20

>=20

> > That may help:
> >
> > UnaccentedWordAnalyzer @
> > =
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/C=
ore/Analysis/Ext/Analysis.Ext.cs
> >
> > DIGY
> >

>=20

> > On Tue, Sep 6, 2011 at 12:31 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I have a question that has bothered me many times. I need to index
> > > file names and make them searchable by partial file name.
> > >
> > > Example: 2009&2010Q2_ABCD_Report.xls (the file name)
> > >
> > > When I run these queries:
> > >
> > > filename:ABCD    returns no match
> > >
> > > filename:2010Q2_ABCD     matches
> > >
> > > filename:Report*    matches
> > >
> > > I'm using StandardAnalyzer, and the Lucene.Net version is 2.9.3.
> > > Currently the filename field is tokenized, indexed, and stored.
> > >
> > > What I want is for Lucene.Net to match when the user types any part
> > > of the file name (a string like 2009, 2010Q2, ABCD, Report, xls, or
> > > Report.xls).
> > >
> > > Please help with this or kindly point me to a way to solve it.
> > >
> > > Floyd
> > >
> >
>=20


=20



------=_NextPart_000_0046_01CC6CE9.4ACC8940--