Message-ID: <4283154E.1060404@axtelsoft.com>
Date: Thu, 12 May 2005 11:35:26 +0300
From: Ahmet Aksoy
Reply-To: ahmetax@axtelsoft.com
Organization: Axtelsoft
To: java-user@lucene.apache.org
Subject: Re: Top most frequent words
References: <20050512075918.21256.qmail@web31106.mail.mud.yahoo.com> <428310F0.7090802@scalix.com>
In-Reply-To: <428310F0.7090802@scalix.com>

Hi John,

I haven't investigated the sources yet, but you might be right.
However, as you stated, those types of lists depend directly on the
subject and the source.
Anyway, that is not very important for my study, and I'm sure the list
will help me very much.
I will prepare optimized lists if I can obtain some different sets.

Best regards.
Ahmet

John Haxby wrote:

> Otis Gospodnetic wrote:
>
>> Somebody asked about this today, and I just found this through Simpy:
>> http://www.unine.ch/info/clef/
>>
>> Scroll half-way through the page, look on the right side: 1,000 most
>> frequent words for several languages.
>>
>
> Hmm. I'm not sure how valuable that is. For English "los" and
> "angeles" are ranked 99 and 101 respectively and "officials" comes in
> at 125. Obviously I'm guessing, but those middle ranking words have
> come from a slightly skewed source -- newspapers in a fixed interval
> perhaps. (I don't think "Los Angeles" makes it into everyday
> parlance in the UK, and "officials" suggests that we're obsessed with
> bureaucracy :-))
>
> jch
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--
Ahmet Aksoy
axtelsoft.com - armalink.com
ahmetax.blogspot.com