Date: Thu, 10 Sep 2015 21:08:03 -0700
Subject: Re: Detect term occurrences
From: Erick Erickson
To: solr-user@lucene.apache.org

_Assuming_ this isn't a high-throughput case _and_ the leaflet text isn't too
big...

Index the thesaurus and fire all the terms of the query in a big OR clause
against the index as a _query_. Perhaps turn highlighting on and highlight
the entire leaflet text.

Note, this is just "off the top of my head". I really haven't thought it
through too far, and a lot depends on how many leaflets you have to process
and how often...

Best,
Erick

On Thu, Sep 10, 2015 at 7:21 PM, Francisco Andrés Fernández wrote:
> Yes.
> I have many drug product leaflets, each corresponding to 1 product. On the
> other hand, we have a medical dictionary with about 10^5 terms.
> I want to detect all the occurrences of those terms for any leaflet
> document.
> Could you give me a clue about the best way to do it?
> Perhaps the best way is (as Walter suggests) to do all the queries every
> time, as needed.
> Regards,
>
> Francisco
>
> On Thu, Sep 10, 2015 at 11:14 AM, Alexandre Rafalovitch <
> arafalov@gmail.com> wrote:
>
>> Can you tell us a bit more about the business case? Not the current
>> technical one. Because it is entirely possible Solr can solve the
>> higher level problem out of the box without you doing manual term
>> comparisons. In which case, your problem scope is not quite right.
>>
>> Regards,
>> Alex.
>> ----
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 10 September 2015 at 09:58, Francisco Andrés Fernández
>> wrote:
>> > Hi all, I'm new to Solr.
>> > I want to detect all occurrences of terms existing in a thesaurus in 1
>> or
>> > more documents.
>> > What's the best strategy to do it?
>> > Doing a query for each term doesn't seem to be the best way.
>> > Many thanks,
>> >
>> > Francisco
>>
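
For illustration, here is a rough sketch of one possible reading of the "big OR clause plus highlighting" idea from the reply at the top of this thread. It assumes the leaflets are indexed in a core called `leaflets` with fields `id` and `text` (all three names are hypothetical), and that the thesaurus terms are sent in batches so each query stays under Solr's `maxBooleanClauses` limit. It only builds the request URLs; sending them and reading the `highlighting` section of the response is left out.

```python
from urllib.parse import urlencode


def escape_phrase(term):
    """Wrap a term in double quotes so Solr treats it as a phrase."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_params(terms, leaflet_id, text_field="text", id_field="id"):
    """Build select parameters: one OR clause over `terms`, highlighting on.

    Field names are assumptions for this sketch, not taken from the thread.
    """
    clause = " OR ".join(escape_phrase(t) for t in terms)
    return {
        "q": f"{text_field}:({clause})",
        "fq": f"{id_field}:{escape_phrase(leaflet_id)}",  # restrict to one leaflet
        "fl": id_field,
        "rows": 1,
        "wt": "json",
        "hl": "true",
        "hl.fl": text_field,
        "hl.fragsize": 0,                # 0 = highlight the whole field value
        "hl.maxAnalyzedChars": 1000000,  # raise the default cap for long leaflets
    }


def build_urls(base_url, terms, leaflet_id, batch_size=500):
    """Return one select URL per batch of thesaurus terms."""
    return [
        f"{base_url}/select?{urlencode(build_params(batch, leaflet_id))}"
        for batch in batches(terms, batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical core URL and sample terms.
    sample_terms = ["aspirin", "renal failure", "headache"]
    for url in build_urls("http://localhost:8983/solr/leaflets",
                          sample_terms, "leaflet-1", batch_size=2):
        print(url)
```

With about 10^5 terms and a batch size of 500, this means roughly 200 requests per leaflet, so it only makes sense under the low-throughput assumption stated in the reply.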