Message-ID: <521F4931.7070208@rancoretech.com>
Date: Thu, 29 Aug 2013 18:44:25 +0530
From: Ankit Murarka
To: java-user@lucene.apache.org
Subject: Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
References: <521EE14B.5020105@rancoretech.com> <521F38F1.4020803@rancoretech.com>

Hello,

I get the exception only when the code is run from Eclipse. When it is deployed on an application server, I get no exception at all. This forced me to invoke the same code from Eclipse to check what the issue is.

I ran the code on a server with 8 GB of memory. Even then no exception occurred. Only write.lock is created.

Removing the contents field is not desirable, as it is needed for search to work properly.

On 8/29/2013 6:17 PM, Ian Lea wrote:
> So you do get an exception after all, OOM.
>
> Try it without this line:
>
> doc.add(new TextField("contents", new BufferedReader(new
> InputStreamReader(fis, "UTF-8"))));
>
> I think that will slurp the whole file in one go which will obviously
> need more memory on larger files than on smaller ones.
>
> Or just run the program with more memory,
>
>
> --
> Ian.
>
>
> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
> wrote:
>
>> Yes I know that Lucene should not have any document size limits. All I get
>> is a lock file inside my index folder. Along with this there's no other file
>> inside the index folder. Then I get OOM exception.
>> Please provide some guidance...
>>
>> Here is the example:
>>
>> package com.issue;
>>
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.document.LongField;
>> import org.apache.lucene.document.StringField;
>> import org.apache.lucene.document.TextField;
>> import org.apache.lucene.index.IndexCommit;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.index.LiveIndexWriterConfig;
>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>> import org.apache.lucene.index.MergePolicy;
>> import org.apache.lucene.index.SerialMergeScheduler;
>> import org.apache.lucene.index.MergePolicy.OneMerge;
>> import org.apache.lucene.index.MergeScheduler;
>> import org.apache.lucene.index.Term;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.FSDirectory;
>> import org.apache.lucene.util.Version;
>>
>>
>> import java.io.BufferedReader;
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.FileNotFoundException;
>> import java.io.FileReader;
>> import java.io.IOException;
>> import java.io.InputStreamReader;
>> import java.io.LineNumberReader;
>> import java.util.Date;
>>
>> public class D {
>>
>>     /** Index all text files under a directory. */
>>
>>
>>     static String[] filenames;
>>
>>     public static void main(String[] args) {
>>
>>         //String indexPath = args[0];
>>
>>         String indexPath="D:\\Issue";//Place where indexes will be created
>>         String docsPath="Issue"; //Place where the files are kept.
>>         boolean create=true;
>>
>>         String ch="OverAll";
>>
>>
>>         final File docDir = new File(docsPath);
>>         if (!docDir.exists() || !docDir.canRead()) {
>>             System.out.println("Document directory '" +docDir.getAbsolutePath()+
>>                 "' does not exist or is not readable, please check the path");
>>             System.exit(1);
>>         }
>>
>>         Date start = new Date();
>>         try {
>>             Directory dir = FSDirectory.open(new File(indexPath));
>>             Analyzer analyzer=new
>>                 com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>             IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>>                 analyzer);
>>             iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>
>>             IndexWriter writer = new IndexWriter(dir, iwc);
>>             if(ch.equalsIgnoreCase("OverAll")){
>>                 indexDocs(writer, docDir,true);
>>             }else{
>>                 filenames=args[2].split(",");
>>                 // indexDocs(writer, docDir);
>>
>>             }
>>             writer.commit();
>>             writer.close();
>>
>>         } catch (IOException e) {
>>             System.out.println(" caught a " + e.getClass() +
>>                 "\n with message: " + e.getMessage());
>>         }
>>         catch(Exception e)
>>         {
>>
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     //Over All
>>     static void indexDocs(IndexWriter writer, File file,boolean flag)
>>             throws IOException {
>>
>>         FileInputStream fis = null;
>>         if (file.canRead()) {
>>
>>             if (file.isDirectory()) {
>>                 String[] files = file.list();
>>                 // an IO error could occur
>>                 if (files != null) {
>>                     for (int i = 0; i< files.length; i++) {
>>                         indexDocs(writer, new File(file, files[i]),flag);
>>                     }
>>                 }
>>             } else {
>>                 try {
>>                     fis = new FileInputStream(file);
>>                 } catch (FileNotFoundException fnfe) {
>>
>>                     fnfe.printStackTrace();
>>                 }
>>
>>                 try {
>>
>>                     Document doc = new Document();
>>
>>                     Field pathField = new StringField("path", file.getPath(),
>>                         Field.Store.YES);
>>                     doc.add(pathField);
>>
>>                     doc.add(new LongField("modified", file.lastModified(),
>>                         Field.Store.NO));
>>
>>                     doc.add(new StringField("name",file.getName(),Field.Store.YES));
>>
>>                     doc.add(new TextField("contents", new BufferedReader(new
>>                         InputStreamReader(fis, "UTF-8"))));
>>
>>                     LineNumberReader lnr=new LineNumberReader(new FileReader(file));
>>
>>
>>                     String line=null;
>>                     while( null != (line = lnr.readLine()) ){
>>                         doc.add(new StringField("SC",line.trim(),Field.Store.YES));
>>                         // doc.add(new Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>                     }
>>
>>                     if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND)
>>                     {
>>
>>                         writer.addDocument(doc);
>>                         writer.commit();
>>                         fis.close();
>>                     } else {
>>                         try
>>                         {
>>                             writer.updateDocument(new Term("path", file.getPath()), doc);
>>
>>                             fis.close();
>>
>>                         }catch(Exception e)
>>                         {
>>                             writer.close();
>>                             fis.close();
>>
>>                             e.printStackTrace();
>>
>>                         }
>>                     }
>>
>>                 }catch (Exception e) {
>>                     writer.close();
>>                     fis.close();
>>
>>                     e.printStackTrace();
>>                 }finally {
>>                     // writer.close();
>>
>>                     fis.close();
>>                 }
>>             }
>>         }
>>     }
>> }
>>
>>
>>
>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>
>>> Lucene doesn't have document size limits.
>>>
>>> There are default limits for how many tokens the highlighters will process
>>> ...
>>>
>>> But, if you are passing each line as a separate document to Lucene,
>>> then Lucene only sees a bunch of tiny documents, right?
>>>
>>> Can you boil this down to a small test showing the problem?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>>> wrote:
>>>
>>>
>>>> Hello all,
>>>>
>>>> Faced with a typical issue.
>>>> I have many files which I am indexing.
>>>>
>>>> Problem Faced:
>>>> a. File having size less than 20 MB are successfully indexed and merged.
>>>>
>>>> b. File having size>20MB are not getting INDEXED.. No Exception is being
>>>> thrown. Only a lock file is being created in the index directory. The
>>>> indexing process for a single file exceeding 20 MB size continues for
>>>> more than 8 minutes after which I have a code which merge the generated
>>>> index to existing index.
>>>>
>>>> Since no index is being generated now, I get an exception during merging
>>>> process.
>>>>
>>>> Why Files having size greater than 20 MB are not being indexed..??. I am
>>>> indexing each line of the file. Why IndexWriter is not throwing any
>>>> error.
>>>>
>>>> Do I need to change any parameter in Lucene or tweak the Lucene settings
>>>> ??
>>>> Lucene version is 4.4.0
>>>>
>>>> My current deployment for Lucene is on a server running with 128 MB and
>>>> 512 MB heap.
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Ankit Murarka
>>>>
>>>> "What lies behind us and what lies before us are tiny matters compared
>>>> with what lies within us"
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>

--
Regards

Ankit Murarka
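
A note on the catch (Exception e) blocks in the quoted code: java.lang.OutOfMemoryError is a subclass of Error, not of Exception, so those blocks cannot catch it. That may be one reason (an assumption, not something confirmed anywhere in this thread) why the OOM is visible when the program is run from Eclipse, yet shows up as "no exception" inside an application server. The sketch below uses only the JDK. The class and method names are hypothetical, and the error is thrown by hand rather than produced by Lucene.

```java
// Hypothetical demo class; nothing here comes from Lucene or from the posted code.
class OomIsNotAnException {

    // Stands in for an indexing call that runs out of heap.
    // The error is thrown by hand so the demo needs no large allocation.
    static void simulateIndexing() {
        throw new OutOfMemoryError("simulated: Java heap space");
    }

    // Mirrors the shape of the posted handlers: only Exception is caught.
    static String catchExceptionOnly() {
        try {
            simulateIndexing();
            return "completed";
        } catch (Exception e) {
            // Never reached for an OutOfMemoryError, because Error is not an Exception.
            return "caught by catch (Exception)";
        }
    }

    // Catches Throwable at the outermost level so that Errors are reported too.
    static String catchThrowable() {
        try {
            return catchExceptionOnly();
        } catch (Throwable t) {
            return "escaped catch (Exception): " + t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(catchThrowable());
    }
}
```

Catching Throwable is usually only appropriate at the outermost level, to log the failure before giving up. It makes the error visible but does nothing about the underlying memory shortage.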