From: Adrien Grand
Date: Wed, 19 Sep 2018 15:37:21 +0200
Subject: Re: regarding comparing texts using Lucene
To: java-user@lucene.apache.org

Hi Veda,

Lucene doesn't provide such functionality out of the box, but you could use MoreLikeThis (https://lucene.apache.org/core/7_4_0/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html) to search for similar documents and then compute a finer-grained similarity score on the client side. This avoids having to compute a similarity score against every document in your collection.

On Wed, 19 Sept 2018 at 15:28, Veda G M wrote:
> Hello,
>
> Is it possible to compare large chunks of text and get a similarity
> score/percentage using Lucene?
>
> Say, for example, we have 2-3 paragraphs of text and need to find out
> whether any document matches it semantically, and what percentage
> similarity the returned hit shares with the search text.
>
> Could you please let me know if this is possible with Lucene?
>
> Thanks.
>
> Regards,
> Veda
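A minimal sketch of the client-side step suggested above, assuming you already have the text of the top hits returned by a MoreLikeThis query. It computes cosine similarity between two term-frequency vectors and expresses it as a percentage. The class and method names (`TextSimilarity`, `cosine`, `similarityPercent`) are hypothetical, and the naive regex tokenizer is an assumption; in a real setup you would probably reuse the same Analyzer that built the index so that both sides are tokenized consistently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical client-side re-scoring helper: cosine similarity over
 * lower-cased term-frequency vectors. The tokenizer is a naive assumption
 * (split on anything that is not a letter or digit).
 */
public class TextSimilarity {

    // Counts each lower-cased token; skips the empty token produced by a leading delimiter.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> tf) {
        double sum = 0;
        for (int count : tf.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity in [0, 1]; returns 0 when either text has no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            Integer other = tb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(ta);
        double normB = norm(tb);
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (normA * normB);
    }

    // The cosine score scaled to a percentage, as asked in the original question.
    static double similarityPercent(String a, String b) {
        return 100.0 * cosine(a, b);
    }

    public static void main(String[] args) {
        String query = "Lucene is a search library written in Java.";
        String candidate = "Lucene is a Java library for full-text search.";
        // 6 shared terms, norms sqrt(8) and 3, so 6 / sqrt(72) = 1/sqrt(2); prints "70.7%"
        System.out.printf(Locale.ROOT, "%.1f%%%n", similarityPercent(query, candidate));
    }
}
```

Note that this measure is lexical: it rewards shared words, not shared meaning, so two paragraphs that paraphrase each other with different vocabulary can still score low. Running it only on the MoreLikeThis candidates keeps the cost proportional to the number of hits rather than the size of the collection.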