From: Tomás Fernández Löbbe
To: solr-user@lucene.apache.org
Date: Mon, 5 Dec 2011 09:28:27 -0300
Subject: Re: Preventing empty strings in index

You could try adding a LengthFilterFactory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory

Regards,

Tomás

On Mon, Dec 5, 2011 at 6:01 AM, Marian Steinbach wrote:

> Hi!
>
> I am surprised to find an empty string as the most frequent index term in
> one of my fields. Until now I didn't even know that empty strings would be
> indexed.
>
> Here is the schema.xml excerpt for that field:
>
> replacement="" />
> ignoreCase="true" />
> words="stopwords_terms.txt" />
> multiValued="true"/>
>
> I have the suspicion that PatternReplaceFilterFactory
> with pattern="^[0-9]+$" is causing the empty strings. I introduced that
> filter to prevent numbers-only strings from being added to the index.
>
> Any hint on how I can get rid of numbers AND empty strings?
>
> Thanks!
>
> Marian
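
For illustration, here is a minimal sketch of how the suggested filter might sit in the analyzer chain. The quoted schema excerpt did not survive intact, so the field type name "text_terms", the tokenizer, the replace="all" setting, and the max value are all assumptions; only the pattern and the empty replacement come from the original question.

```xml
<!-- Hypothetical field type: the name and the tokenizer are assumptions. -->
<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- From the question: tokens made up only of digits are replaced with "". -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^[0-9]+$" replacement="" replace="all"/>
    <!-- Drops the empty tokens left behind by keeping only lengths 1 to 100
         (the max of 100 is an assumed value). -->
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
  </analyzer>
</fieldType>
```

The order matters in this sketch: the length filter has to come after the pattern replace filter, since the empty tokens only exist once the replacement has run. A reindex would be needed before the empty term disappears from the existing index.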