Date: Wed, 26 Dec 2007 23:41:53 +0200
From: "Doron Cohen"
To: java-user@lucene.apache.org
Subject: Re: StopWords problem

On Dec 26, 2007 10:33 PM, Liaqat Ali wrote:

> Using javac -encoding UTF-8 still raises the following error.
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What am I doing wrong?

If you keep the stop words in a file, one word per line, you can read them like this:

    BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("Urdu.txt"), "UTF8"));
    String word;
    while ((word = r.readLine()) != null) { /* add word to your stop-word set */ }
    r.close();

(Make sure to choose the UTF-8 encoding when saving the file from Notepad.)

Regards,
Doron
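One caveat with the file-based approach: \65279 in the javac error is U+FEFF, the byte-order mark (BOM) that Notepad writes at the start of files saved as UTF-8. Java's UTF-8 decoder does not strip it, so when reading the stop-word file the first line comes back with an invisible U+FEFF prefix and that word would never match a token. The sketch below is a minimal, hedged version of the loop above that drops a leading BOM, skips blank lines and collects the words into a Set. It assumes Java 7 or later and one word per line; the class and method names (StopWordLoader, loadStopWords) are hypothetical, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
public final class StopWordLoader {

    // U+FEFF (decimal 65279): the byte-order mark Notepad prepends to UTF-8 files.
    private static final char BOM = 0xFEFF;

    private StopWordLoader() {}

    /** Reads one stop word per line from a UTF-8 stream, skipping blank lines. */
    public static Set<String> loadStopWords(InputStream in) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = r.readLine();
            // Drop a leading BOM so the first word is stored without it.
            if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
                line = line.substring(1);
            }
            while (line != null) {
                // trim() removes surrounding spaces and tabs, but not U+FEFF,
                // which is why the BOM is handled separately above.
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
                line = r.readLine();
            }
        }
        return words;
    }
}
```

Usage would be along the lines of `Set<String> stopWords = StopWordLoader.loadStopWords(new FileInputStream("Urdu.txt"));`; the resulting set can then be handed to an analyzer or filter that accepts a stop-word set (check which constructors your Lucene version provides). Keeping the words in a data file rather than in urduIndexer.java also sidesteps the javac error entirely, since the source file no longer needs non-ASCII literals.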