Subject: Re: Is there an EdgeSingleFilter already?
From: Steve Rowe
Date: Sun, 17 Mar 2013 13:27:19 -0400
To: solr-user@lucene.apache.org

Hi xavier,

Cool, thanks for the feedback. I'll commit later today (unless somebody objects), so it will be part of the Lucene/Solr 4.3 release.

Steve

On Mar 17, 2013, at 1:21 PM, xavier jmlucjav wrote:

> Steve, worked like a charm.
> thanks!
>
> On Sun, Mar 17, 2013 at 7:37 AM, Steve Rowe wrote:
>
>> See https://issues.apache.org/jira/browse/LUCENE-4843
>>
>> Let me know if it works for you.
>>
>> Steve
>>
>> On Mar 16, 2013, at 5:35 PM, xavier jmlucjav wrote:
>>
>>> I read your reply too fast, so I thought you meant configuring the
>>> LimitTokenPositionFilter. I see you mean I have to write one, ok...
>>>
>>> On Sat, Mar 16, 2013 at 10:33 PM, xavier jmlucjav wrote:
>>>
>>>> Steve,
>>>>
>>>> Yes, I want only "one", "one two", and "one two three", but nothing else.
>>>> Cool, if this can be achieved without Java code, even better. I'll check
>>>> that filter.
>>>>
>>>> I need this for building a field used for suggestions; the user
>>>> specifically wants matches only from the edge.
>>>>
>>>> thanks!
>>>>
>>>> On Sat, Mar 16, 2013 at 10:22 PM, Steve Rowe wrote:
>>>>
>>>>> Hi xavier,
>>>>>
>>>>> It's not clear to me what you want. Is the "edge" you're referring to
>>>>> the beginning of a field? E.g. raw text "one two three four" with
>>>>> EdgeShingleFilter configured to produce unigrams, bigrams and trigrams
>>>>> would produce "one", "one two", and "one two three", but nothing else?
>>>>>
>>>>> If so, I suspect writing a LimitTokenPositionFilter (which would stop
>>>>> emitting tokens after the token position exceeds a specified limit)
>>>>> would be better, rather than subclassing ShingleFilter. You could use
>>>>> LimitTokenCountFilter as a model, especially its "consumeAllTokens"
>>>>> option. I think this would make a nice addition to Lucene.
>>>>>
>>>>> Also, what do you plan to use this for?
>>>>>
>>>>> Steve
>>>>>
>>>>> On Mar 16, 2013, at 5:02 PM, xavier jmlucjav wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I need to use shingles but only keep the ones that start from the
>>>>>> edge.
>>>>>>
>>>>>> I want to confirm there is no way to get this feature without
>>>>>> subclassing ShingleFilter, because I thought someone would have
>>>>>> already encountered this use case....
>>>>>>
>>>>>> thanks
>>>>>> xavier
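
For readers of the archive, the idea discussed in this thread (shingle the text, then stop emitting once the token position exceeds a limit) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use any Lucene classes, it is not the code attached to LUCENE-4843, and the names EdgeShingleSketch, Token, shingle and limitPosition are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; no Lucene API is used and all names are hypothetical.
final class EdgeShingleSketch {

    /** Hypothetical token holder: shingle text plus the 1-based position of its first word. */
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }
    }

    /** Emits unigrams up to maxShingleSize-grams at every start position, in position order. */
    static List<Token> shingle(List<String> words, int maxShingleSize) {
        List<Token> out = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxShingleSize && start + len <= words.size(); len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(words.get(start + len - 1));
                out.add(new Token(sb.toString(), start + 1));
            }
        }
        return out;
    }

    /**
     * Keeps tokens whose position is at most maxPosition. Positions never decrease
     * in the input, so the loop stops at the first token past the limit instead of
     * reading the rest of the stream.
     */
    static List<String> limitPosition(List<Token> tokens, int maxPosition) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.position > maxPosition) {
                break;
            }
            kept.add(t.text);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "three", "four");
        // Shingles up to trigrams, then keep only those starting at position 1.
        System.out.println(limitPosition(shingle(words, 3), 1));
        // prints [one, one two, one two three]
    }
}
```

With a position limit of 1 only the shingles that start at the beginning of the field survive, which is the "edge" behaviour asked for in the original question. Raising the limit to 2 would also keep "two", "two three" and "two three four".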