Subject: Re: Search by similarity?
To: solr-user@lucene.apache.org
From: Darko Todoric
Message-ID: <17b63526-008b-d151-bdbe-84c085eb200e@mdpi.com>
Date: Mon, 28 Aug 2017 09:17:42 +0200

Hm... I cannot get this DisMax parser to work on my Solr...

In Solr I have documents with these titles:
- "title-1-end"
- "title-2-end"
- "title-3-end"
- ...
- ...
- "title-312-end"

and when I run the query
http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json
I get all documents from Solr :\
What am I doing wrong?

Also, I don't know if it affects the results, but on the "title" field I use "WhitespaceTokenizerFactory".

Kind regards,
Darko

On 08/25/2017 06:38 PM, Junte Zhang wrote:
> If you already have the title of the document, then you could run that title as a new query against the whole index and exclude the source document from the results as a filter.
>
> You could use the DisMax query parser: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>
> And then set the minimum match ratio of the OR clauses to 90%.
>
> /JZ
>
> -----Original Message-----
> From: Darko Todoric [mailto:todoric@mdpi.com]
> Sent: Friday, August 25, 2017 5:49 PM
> To: solr-user@lucene.apache.org
> Subject: Search by similarity?
>
> Hi,
>
>
> I have 90.000.000 documents in Solr and I need to compare "title" of this document and get all documents with more than 80% similarity. PHP have "similar_text" but it's not so smart inserting 90m documents in the array...
> Can I do some query in Solr which will give me the more the 80% similarity?
>
>
> Kind regards,
> Darko Todoric

--
Darko Todoric
Web Engineer, MDPI DOO
Veljka Dugosevica 54, 11060 Belgrade, Serbia
+381 65 43 90 620
www.mdpi.com

Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed.
If you have received this message in error, please notify me and delete this message from your system.
You may not copy this message in its entirety or in part, or disclose its contents to anyone.
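
A possible starting point for debugging the query in this message, offered as a minimal sketch rather than a confirmed fix. Two things in the URL may be worth checking: the literal "%" in mm=99% is not percent-encoded (in a query string it would normally be sent as %25), and the DisMax parser takes its fields from the qf parameter instead of a title: prefix inside q. The host, core name (SciLit) and parameter values are taken from the message; the helper name build_dismax_url and the choice of qf=title are assumptions made for illustration. Note also that WhitespaceTokenizerFactory splits only on whitespace, so a title such as "title-123-end" is likely indexed as a single token, which would leave mm only one clause to work with.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL and core name ("SciLit") as they appear in the message.
BASE_URL = "http://localhost:8983/solr/SciLit/select"


def build_dismax_url(title: str, min_match: str = "99%") -> str:
    """Build a DisMax select URL with every parameter percent-encoded.

    Hypothetical helper, not part of Solr or any client library.
    """
    params = {
        "defType": "dismax",
        "qf": "title",    # assumption: search the "title" field through qf
        "q": title,       # plain query text, without a "title:" prefix
        "mm": min_match,  # urlencode sends "%" as "%25"
        "indent": "on",
        "wt": "json",
    }
    return BASE_URL + "?" + urlencode(params)


url = build_dismax_url("title-123123123-end")
print(url)
```

Appending debugQuery=on to the request should show how Solr actually parsed the query, which may help confirm whether mm is being applied at all.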