Message-ID: <901c03930609032122v3388a281t50cca93d88562755@mail.gmail.com>
Date: Mon, 4 Sep 2006 12:22:26 +0800
From: "Dave Kor"
To: java-user@lucene.apache.org
Subject: Re: word frequency list?
In-Reply-To: <44F5D548.4010401@mindspring.com>

There is the Berkeley Web Term Frequency database, which contains over 30 million unique terms extracted from 50 million web pages.

http://elib.cs.berkeley.edu/docfreq/index.html

On 8/31/06, Jason Pump wrote:
> Is there a large list of words and their frequency in the English
> language? Obviously it would differ by corpus, but I would like to see
> what's already available.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--
Dave Kor, PhD Candidate
Center for Information Mining and Extraction
School of Computing
National University of Singapore.