From: Anshul jain
To: java-user@lucene.apache.org
Date: Wed, 21 Jan 2009 14:39:39 +0100
Subject: Lucene Performance issue

Hi,

I've indexed around half a million XML documents. Here is a sample document:

cogito:Name            Alexander the Great
cogito:domain          ancient history
cogito:first_sentence  Alexander the Great (Greek: or Megas Alexandros; July 20, 356 BC - June 10, 323 BC), also known as Alexander III, was an ancient Greek king (basileus) of Macedon (336-323 BC).

The average document size is around 4 KB. I need help with a few performance issues.

When I index the documents in a structured manner, with one field per element, like this:

name: alexander the great
domain: ancient history
first_sentence: Alexander the Great (Greek: or Megas Alexandros; July 20, 356 BC - June 10, 323 BC), also known as Alexander III, was an ancient Greek king (basileus) of Macedon (336-323 BC).
bagOfWords: alexander the great ancient history Alexander the Great (Greek: or Megas Alexandros; July 20, 356 BC - June 10, 323 BC), also known as Alexander III, was an ancient Greek king (basileus) of Macedon (336-323 BC).

bagOfWords is a field that contains all of the text above appended together.
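A minimal sketch, in plain Java with no Lucene dependency, of the structured layout described above. It only counts tokens per field. The tokenizer is a hypothetical stand-in for a Lucene analyzer (lowercase, split on non-alphanumerics), the sample sentence is shortened, and the class and method names are made up for illustration. It does not model Lucene's on-disk format, stored fields, or norms.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class FieldLayoutSketch {

    // Hypothetical tokenizer: lowercase, then split on anything that is not a
    // letter or digit. This only approximates what a Lucene analyzer does.
    public static int countTokens(String text) {
        int count = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Structured layout: one field per XML element, plus a bagOfWords field
    // holding all of the element text appended together.
    public static Map<String, String> structuredDoc(String name, String domain, String firstSentence) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", name);
        doc.put("domain", domain);
        doc.put("first_sentence", firstSentence);
        doc.put("bagOfWords", name + " " + domain + " " + firstSentence);
        return doc;
    }

    // Sum of tokens over all fields of one document.
    public static int totalTokens(Map<String, String> doc) {
        int total = 0;
        for (String value : doc.values()) {
            total += countTokens(value);
        }
        return total;
    }

    public static void main(String[] args) {
        String name = "alexander the great";
        String domain = "ancient history";
        String sentence = "Alexander the Great was an ancient Greek king (basileus) of Macedon (336-323 BC).";

        Map<String, String> doc = structuredDoc(name, domain, sentence);
        for (Map.Entry<String, String> e : doc.entrySet()) {
            System.out.println(e.getKey() + ": " + countTokens(e.getValue()) + " tokens");
        }

        // Single-field layout: the same text appended into one "value" field.
        String single = name + " " + domain + " " + sentence;
        System.out.println("structured total: " + totalTokens(doc));
        System.out.println("single field total: " + countTokens(single));
    }
}
```

Because bagOfWords is the concatenation of the other fields, every token is indexed twice in the structured layout, so the structured total is exactly double the single-field total under this tokenizer. That is a factor of two in indexed tokens; the sketch says nothing about how that maps to bytes on disk.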
With this layout the index is 4.5 GB. But if I just append all the text and store it in a single field, like this:

value: alexander the great ancient history Alexander the Great (Greek: or Megas Alexandros; July 20, 356 BC - June 10, 323 BC), also known as Alexander III, was an ancient Greek king (basileus) of Macedon (336-323 BC).

the index is only 700 MB. Why is this happening?

Also, query execution with MultiFieldQueries is very slow: about 20 times slower than a single-field query. Is this normal, and what could be the reason?

Thanks,
Cheers,
Anshul

--
Anshul Jain