Date: Sun, 2 Mar 2014 15:45:18 +0530
Subject: query regarding Lucene Indexing and searching
From: Mrugendra
To: java-user@lucene.apache.org

Sir,

I am a PG student. My research topic is optimizing the index file: reducing index file size, RAM usage, and CPU utilization, and creating an index with payloads to improve search speed. The current working scope is a desktop search engine.

1. I am using Lucene to index PDF files (file name and content). After applying the standard analyzer, the Lucene index file size is 11 MB for 1.77 GB, while the Windows 8 Windows.edb file size is 42 MB for 1.77 GB (tested in a Windows desktop environment). So the space complexity part is done. How should I evaluate time complexity?

2. How can I apply lemmatization with the standard analyzer to reduce the index file size, and add payloads during indexing?

3. Where can I find a test benchmark?

--
Regards
Rahevar Mrugendrasinh