From: David Hastings
Date: Wed, 25 Jul 2018 15:31:53 -0400
Subject: Re: Section symbol, ignore in some queries but not others?
To: solr-user@lucene.apache.org

Ah, so I could index the text including the § character as an alpha, use no qs value when trying to ignore it, and for users add a qs value, assuming I use edismax, which I currently am. Tested this method and it works as expected.

Thanks, saved me a lot of time!
-David

On Wed, Jul 25, 2018 at 3:15 PM, Alexandre Rafalovitch wrote:
> If you copyField and don't store the copy, then only the indexed (term)
> representation is kept for the copy, and that is much smaller. Just a
> thought.
>
> The other thing is that you seem to be saying that you want to do a
> phrase match but with a token gap, right? Like an eDisMax slop?
> http://lucene.apache.org/solr/guide/7_4/the-extended-dismax-query-parser.html
>
> Regards,
>    Alex.
>
> On 25 July 2018 at 14:47, David Hastings wrote:
> > Hey all. Have a situation that seems pretty rough. Currently in our data
> > we have a lot of sentences like this:
> >
> > elements comprise the "stuff" of the tax. 3 Reg. § 1.901-2(a)(2). 4 Only
> > non-Saudis are subject to the
> >
> > By default the word delimiter is treating all punctuation as a space. So
> > when you search for:
> > 3 Reg. 1, your results can include 3 Reg. § 1.901
> >
> > I have experimented with the WDF and added § => ALPHA, and this works and
> > treats the character as a letter.
however during some queries, I still > > need searches such as > > > > Servitudes 2.10 > > > > to return results with: > > > > > > Servitudes =C2=A7 2.10 > > > > > > I at the moment can not conceive of a way to to this aside from two > > separate text fields, and effectively doubling the size of my index. > > which currently sits at 300 gb optimized, and 500gb if left to its > > own. > > > > > > Thanks for any help or suggestions > --0000000000004c77ae0571d7ee3e--