From: Simon Martinelli
Date: Tue, 31 Mar 2015 17:03:11 +0200
Subject: solr.DictionaryCompoundWordTokenFilterFactory extracts words in string
To: solr-user@lucene.apache.org

Hi,

I configured solr.DictionaryCompoundWordTokenFilterFactory using a dictionary with the following content:

- lindor
- schlitten
- dorsch
- filet

I want to index the compound words

- dorschfilet
- lindorschlitten

dorschfilet is processed as expected:

dorsch filet

But lindorschlitten is a compound of lindor and schlitten, and for it I get

lindor dorsch schlitten

So the filter extracts dorsch even though the parts before it (lin) and after it (litten) are not valid word parts.

Is there a better compound word filter for German?

Thanks,
Simon
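
The behaviour described above can be reproduced with a small sketch. It is a simplified model, not Lucene's code: it assumes the filter emits every dictionary entry that occurs anywhere inside the token, without checking that the remaining characters are themselves dictionary words. The function names substring_matches and full_cover_splits, and the min_len/max_len parameters, are hypothetical and exist only for this illustration. The second function shows the stricter behaviour the question asks for, where the parts must cover the whole word.

```python
def substring_matches(word, dictionary, min_len=2, max_len=15):
    """Assumed model: return every dictionary entry found as a substring."""
    matches = []
    for start in range(len(word)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(word):
                break
            part = word[start:end]
            if part in dictionary:
                matches.append(part)
    return matches


def full_cover_splits(word, dictionary):
    """Return only the splits whose parts cover the whole word."""
    if not word:
        return [[]]  # one valid split of the empty remainder: no parts
    splits = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in dictionary:
            # Keep this head only if the rest can also be fully split.
            for rest in full_cover_splits(word[end:], dictionary):
                splits.append([head] + rest)
    return splits


dictionary = {"lindor", "schlitten", "dorsch", "filet"}

for word in ("dorschfilet", "lindorschlitten"):
    print(word, substring_matches(word, dictionary),
          full_cover_splits(word, dictionary))
```

Under this model, the substring scan finds dorsch inside lindorschlitten because it starts at the fourth character, while the full-cover variant rejects it because lin and litten are not in the dictionary.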