From: Manuel Le Normand
Date: Wed, 26 Jun 2013 00:00:30 +0300
Subject: Common practice for free text field
To: "solr-user@lucene.apache.org"

My schema contains about a hundred fields of various types (int, string, plain text, email). I would like to know the common practice for free-text search over the index.

Assuming there are no boosts related to field matching, these are the options I see:

1. Index and query an "all_fields" copyField (source=*)
   1. Advantage: only one query flow against a single index.
   2. Disadvantages: the tokenizing is not necessarily adapted to every kind of source field, and the extra field requires more storage and memory.
2. Field aliasing (f.myalias.qf=realfield)
   1. Advantages: the opposite of the above.
   2. Disadvantages: a single query term would query 100 different fields. Multi-term queries might be a serious performance issue.

Any common practices?
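To make the two options concrete, here is a rough Python sketch of the request parameters each one might send to an edismax handler. The field names, the base URL, and the helper functions are all hypothetical and only for illustration; the sketch assumes the edismax parser with its qf and f.<alias>.qf parameters, and it only builds URLs without contacting a server.

```python
from urllib.parse import urlencode

# Hypothetical field names; a real schema would list its own ~100 fields.
REAL_FIELDS = ["title", "body", "sender_email", "recipient_email"]


def catch_all_params(user_query, catch_all_field="all_fields"):
    """Option 1: search only the single copyField destination."""
    return {"q": user_query, "defType": "edismax", "qf": catch_all_field}


def alias_params(user_query, alias="myalias", fields=REAL_FIELDS):
    """Option 2: prefix each term with an alias that expands to the real fields."""
    terms = user_query.split()
    return {
        "q": " ".join(f"{alias}:{term}" for term in terms),
        "defType": "edismax",
        f"f.{alias}.qf": " ".join(fields),
    }


def per_field_clauses(user_query, field_count):
    """Rough estimate of per-field clauses: one per (term, field) pair."""
    return len(user_query.split()) * field_count


def select_url(base, params):
    """Build a /select URL; 'base' is an assumed core URL, not a real server."""
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr/collection1"  # assumed location
    print(select_url(base, catch_all_params("solr rocks")))
    print(select_url(base, alias_params("solr rocks")))
    print(per_field_clauses("solr rocks", 100))
```

The last helper shows where the performance concern in option 2 comes from: under this rough estimate, a two-term query against 100 aliased fields expands to about 200 per-field clauses, while the catch-all field keeps it at one clause per term.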