From: Alexandre Rafalovitch
Date: Tue, 10 Sep 2013 13:10:42 +0700
Subject: Re: find all two word phrases that appear in more than one document
To: solr-user@lucene.apache.org

I believe one of the admin pages (Solr 4+) shows all the terms and their frequencies. You can use that even with the stock example. Try that, and if it makes sense, you can explore further.

As for other examples, there are a couple of books. I bet Jack's book covers this.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Sep 10, 2013 at 12:09 PM, Ali, Saqib wrote:
> Thanks Alexandre. I looked at the wiki page for the TermsComponent, but I
> am not sure I follow. Do you have an example or a better document?
> Thanks! :)
>
> On Mon, Sep 9, 2013 at 8:17 PM, Alexandre Rafalovitch wrote:
>
> > The "phrases" are usually called n-grams or shingles.
> >
> > You can probably use ShingleFilterFactory to create your shingles (possibly
> > with outputUnigrams=false) and then use TermsComponent
> > (http://wiki.apache.org/solr/TermsComponent) to list the results.
> >
> > Regards,
> >    Alex.
> >
> > On Tue, Sep 10, 2013 at 8:22 AM, Ali, Saqib wrote:
> >
> > > Dear Solr Ninjas,
> > >
> > > We would like to run a query that returns two-word phrases that appear
> > > in more than one document. For example, take the string "Solr Ninja".
> > > Since it appears in more than one document in our Solr instance, the
> > > query should return it. The query should find all such phrases across
> > > all the documents in our Solr instance by looking at every combination
> > > of two adjacent words (forming a phrase) in the documents in the Solr
> > > index.
> > >
> > > Any ideas on how to write this query?
> > >
> > > Thanks.
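To make the shingle-plus-TermsComponent suggestion concrete, here is a minimal Python sketch of what that combination computes, under simplifying assumptions. Documents are split into lowercase words with a plain regex (Solr's real analyzer chain will tokenize differently), and every pair of adjacent words becomes one shingle, roughly what ShingleFilterFactory emits with its default two-word shingle size and outputUnigrams=false. Counting each shingle once per document gives its document frequency, and keeping counts of 2 or more answers the original question. The function names are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def shingles(text, size=2):
    """Return the word n-grams ("shingles") of exactly `size` words.
    Simplified tokenizer (assumption): lowercase runs of word characters."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def phrases_in_multiple_docs(docs, min_docs=2):
    """Map each two-word phrase to the number of documents containing it,
    keeping only phrases found in at least `min_docs` documents."""
    doc_freq = Counter()
    for text in docs:
        doc_freq.update(set(shingles(text)))  # count a phrase once per document
    return {phrase: n for phrase, n in doc_freq.items() if n >= min_docs}


if __name__ == "__main__":
    docs = [
        "Ask a Solr Ninja",
        "Every Solr Ninja knows shingles",
        "Nothing to see here",
    ]
    print(phrases_in_multiple_docs(docs))  # {'solr ninja': 2}
```

The set() call is the important design choice: a phrase repeated many times inside one document still counts as one document, which matches "appears in more than one document" rather than "appears more than once".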
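Inside Solr itself, the equivalent is to index the text into a field whose analyzer ends with solr.ShingleFilterFactory (outputUnigrams="false"), then ask TermsComponent for that field's terms with terms.mincount=2, since TermsComponent reports per-term document frequencies. The sketch below only builds that request and decodes the reply. The host, the core name collection1, the /terms handler path (present in the Solr 4 example solrconfig.xml, as far as I know) and the field name text_shingles are all assumptions to adapt. The parser also assumes Solr's default flat JSON layout for named lists. Because the counts come straight from the index, documents that were deleted but not yet merged away may still be counted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values (hypothetical): adjust to your own Solr host, core and field.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"
SHINGLE_FIELD = "text_shingles"


def terms_url(core_url=SOLR_CORE_URL, field=SHINGLE_FIELD, min_docs=2, limit=-1):
    """Build a TermsComponent request listing the shingles of `field` that
    occur in at least `min_docs` documents."""
    params = {
        "terms": "true",             # enable TermsComponent
        "terms.fl": field,           # field whose indexed terms are listed
        "terms.mincount": min_docs,  # minimum document frequency
        "terms.limit": limit,        # default is 10; -1 means no limit
        "terms.sort": "count",       # most frequent first
        "wt": "json",
    }
    return core_url + "/terms?" + urlencode(params)


def parse_terms(response, field=SHINGLE_FIELD):
    """Convert the flat JSON list [term, count, term, count, ...] for
    `field` into a {term: document_frequency} dict."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def fetch_shared_phrases(core_url=SOLR_CORE_URL, field=SHINGLE_FIELD, min_docs=2):
    """Query a live Solr core (not called below) and return shared phrases."""
    with urlopen(terms_url(core_url, field, min_docs)) as resp:
        return parse_terms(json.load(resp), field)


if __name__ == "__main__":
    print(terms_url())
    # Hand-written sample shaped like a TermsComponent JSON reply (not real output).
    sample = {"terms": {SHINGLE_FIELD: ["solr ninja", 3, "big data", 2]}}
    print(parse_terms(sample))
```

Against a running instance you would call fetch_shared_phrases(); the main block only prints the request URL and decodes a hand-written sample, so it runs without a server.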