Message-ID: <2b98f9010708090034m1bf38ba9u9951e82414d1388c@mail.gmail.com>
Date: Thu, 9 Aug 2007 00:34:55 -0700
From: "Akanksha Baid"
To: java-user@lucene.apache.org
Subject: frequent phrases

Is there a "search-based" method for finding the top-k most frequent phrases in a set of documents? (I do not have a particular phrase in mind, so PhraseQuery can probably be ruled out.)

I have implemented something that works using term vectors and term positions, but the performance is not great so far, since I am basically iterating multiple times and hacking my way around.

Does an API exist for finding frequent phrases, or could someone point me to some code that does this?

Thanks.
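To make the question concrete, here is a minimal sketch of one way to get top-k frequent phrases in a single pass over the text. It is not a Lucene API and not the implementation described above. It assumes a "phrase" means a contiguous run of n words, that the raw document text is available as strings, and that a naive lowercase/split tokenizer is acceptable. The class name FrequentPhrases and the methods countPhrases and topK are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.PriorityQueue;

public class FrequentPhrases {

    /** Counts every contiguous n-word phrase across all documents in one pass. */
    public static Map<String, Integer> countPhrases(List<String> docs, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Assumption: naive tokenization (lowercase, split on non-letter/digit runs).
            List<String> tokens = new ArrayList<>();
            for (String t : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            // Slide a window of n tokens over the document.
            for (int i = 0; i + n <= tokens.size(); i++) {
                String phrase = String.join(" ", tokens.subList(i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the k most frequent phrases, highest count first, ties in alphabetical order. */
    public static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        // Orders the "worst" entry first: lowest count, then alphabetically latest phrase.
        Comparator<Map.Entry<String, Integer>> worstFirst = (a, b) -> {
            int c = Integer.compare(a.getValue(), b.getValue());
            return c != 0 ? c : b.getKey().compareTo(a.getKey());
        };
        // Min-heap bounded to k entries, so selection costs O(m log k) for m distinct phrases.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the current worst entry
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort(worstFirst.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the quick brown fox",
                "a quick brown dog",
                "the quick brown fox jumps");
        for (Map.Entry<String, Integer> e : topK(countPhrases(docs, 2), 3)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The map holds one entry per distinct phrase, so memory grows with the vocabulary of n-grams. For a large index this would likely need pruning or per-segment counting, which the sketch does not attempt.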