Date: Mon, 14 Apr 2008 14:41:18 -0700 (PDT)
From: Chris Hostetter
To: solr-user@lucene.apache.org
Subject: Re: Searching for popular phrases or words
In-Reply-To: <557641.68042.qm@web57712.mail.re3.yahoo.com>

it depends on your definition of "popular"

if you mean "occurs in a lot of documents" then take a look at the LukeRequestHandler
... it can give you info on terms with high frequencies (and you can use a Shingles based tokenizer to index "phrases" as terms)

if by popular you mean "occurs in a lot of queries", there isn't anything in Solr that keeps track of what people search for ... your application would need to do that.

: How can i search for popular phrases or words with an
: option to include only, for example, technical terms
: e.g "Oracle database" rather than common english

You'll need a better definition of your goal to get any meaningful answer to the "an option to include only, for example, technical terms" part of that question ... the "for example" implies that there are other examples ... how would you (as a human person) decide when to classify a phrase as a "technical" phrase, vs an ... "other" phrase? if you can't answer that question, then neither can code.

-Hoss
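
As a rough illustration of the "occurs in a lot of documents" reading: the sketch below is plain Python, not Solr code. It splits each document into two-word shingles and counts how many documents contain each one, which is the kind of per-term document count you would look at once phrases are indexed as terms. The function names and the sample documents are made up for the example, and the lowercase/whitespace tokenization is an assumption that is far simpler than a real analyzer.

```python
from collections import Counter


def shingles(text, size=2):
    """Return the word n-grams ("shingles") of the given size in text."""
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def document_frequencies(docs, size=2):
    """Count, for each shingle, how many documents contain it at least once."""
    counts = Counter()
    for doc in docs:
        # set() so a phrase repeated inside one document is counted only once
        counts.update(set(shingles(doc, size)))
    return counts


# Hypothetical sample documents, for illustration only.
docs = [
    "Oracle database tuning guide",
    "Backing up an Oracle database",
    "Tuning guide for the Oracle database server",
]

top = document_frequencies(docs).most_common(1)
print(top)  # [('oracle database', 3)]
```

Note that this only ranks phrases by how many documents they occur in; it says nothing about whether a phrase is "technical", which still needs a definition from you.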