From: Marian Steinbach
Date: Mon, 5 Dec 2011 10:01:40 +0100
Subject: Preventing empty strings in index
To: solr-user@lucene.apache.org

Hi!

I am surprised to find an empty string as the most frequent index term in one of my fields. Until now I didn't even know that empty strings would be indexed.

Here is the schema.xml excerpt for that field:

I have the suspicion that PatternReplaceFilterFactory with pattern="^[0-9]+$" is causing the empty strings. I introduced that filter to prevent numbers-only strings from being added to the index.

Any hint on how I can get rid of numbers AND empty strings?

Thanks!

Marian
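
To illustrate the suspicion raised in the message, here is a minimal Python sketch of what a pattern-replace step with the pattern ^[0-9]+$ would do to a token stream. It is not Solr code: the function names, the sample tokens, and the assumption that the replacement is an empty string are all hypothetical, chosen only to mimic the described behaviour. The point it shows is that replacing a token's text does not remove the token, so a numbers-only token survives as a zero-length term unless a later step drops it.

```python
import re

# Assumption: mimics the analyzer behaviour described in the message; not Solr code.
NUMBERS_ONLY = re.compile(r"^[0-9]+$")


def replace_numbers(tokens):
    """Rewrite numbers-only tokens to "" (tokens are rewritten, never dropped)."""
    return [NUMBERS_ONLY.sub("", token) for token in tokens]


def drop_empty(tokens):
    """Hypothetical follow-up step: discard zero-length tokens."""
    return [token for token in tokens if len(token) >= 1]


# Hypothetical sample tokens for one field value.
tokens = ["solr", "2011", "index", "42", "v2"]

replaced = replace_numbers(tokens)
cleaned = drop_empty(replaced)

print(replaced)  # ['solr', '', 'index', '', 'v2']
print(cleaned)   # ['solr', 'index', 'v2']
```

In Solr itself, the role of drop_empty would presumably be played by a length filter such as solr.LengthFilterFactory with min="1", placed after the pattern-replace filter in the analyzer chain. That is a suggestion to check against the documentation for the Solr version in use, not something stated in the message.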