Message-ID: <52209BEA.8040908@rancoretech.com>
Date: Fri, 30 Aug 2013 18:49:38 +0530
From: Ankit Murarka
To: java-user@lucene.apache.org
Subject: Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.

Hello,

The following exception is printed on the server console when trying to
index. As usual, no index files are created.
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
    at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
    at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
    at org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
    at org.apache.lucene.document.Field.tokenStream(Field.java:541)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
    at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
    at com.rancore.MainClass1.indexDocs(MainClass1.java:197)
    at com.rancore.MainClass1.indexDocs(MainClass1.java:153)
    at com.rancore.MainClass1.main(MainClass1.java:95)

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
    at com.rancore.MainClass1.main(MainClass1.java:122)

The last log line after that is just "Indexing to directory".

Any guidance will be highly appreciated!
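Both traces fail inside addDocument while the fields are being materialised, which suggests the Document being added is itself huge: the indexing code quoted later in this thread adds one stored field per line of the file to a single Document. A Lucene-free sketch contrasting that shape with a bounded one-unit-per-line shape (class and method names here are illustrative, not from the original code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineIndexingShapes {

    // Shape used by the code quoted in this thread: every line of the file
    // is accumulated into ONE in-memory "document" before it is handed to
    // the writer, so memory use grows with file size.
    static List<String> oneDocumentPerFile(Reader input) {
        List<String> fields = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                fields.add(line.trim());   // one stored field per line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;                     // whole file resident at once
    }

    // Bounded alternative: each line is handed off as its own tiny unit
    // (in real code, a small Document passed straight to addDocument), so
    // only one line is resident at a time.
    static int oneDocumentPerLine(Reader input) {
        int emitted = 0;
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                emitted++;                 // e.g. addDocument(smallDocFor(line))
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    public static void main(String[] args) {
        String file = "line 1\nline 2\nline 3\n";
        System.out.println(oneDocumentPerFile(new StringReader(file)).size()); // 3
        System.out.println(oneDocumentPerLine(new StringReader(file)));        // 3
    }
}
```

Either shape reads the same lines; the difference is only how much of the file must be live in the heap at the moment addDocument is called.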
Server Opts are -server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m

On 8/30/2013 3:13 PM, Ankit Murarka wrote:
> Hello.
> The server has much more memory. I have given a minimum of 8 GB to the
> Application Server.
>
> The Java opts of interest are: -server -Xms8192m -Xmx16384m
> -XX:MaxPermSize=8192m
>
> Even after giving the server this much memory, how am I hitting OOM
> exceptions? No other activity is being performed on the server apart
> from this.
>
> Checking from JConsole, the maximum heap during indexing was close to
> 1.2 GB, whereas the memory allocated is as mentioned above.
>
> I did mention 128 MB earlier, but that is when I start the server on a
> normal Windows machine.
>
> Isn't there any property/configuration in Lucene that I should set in
> order to index large files, say about 30 MB? I read something about
> MergeFactor etc. but was not able to set any value for it. I don't
> even know whether doing that will help.
>
> On 8/29/2013 7:04 PM, Ian Lea wrote:
>> Well, I use neither Eclipse nor your application server and can offer
>> no advice on any differences in behaviour between the two. Maybe you
>> should try Eclipse or app server forums.
>>
>> If you are going to index the complete contents of a file as one field
>> you are likely to hit OOM exceptions. How big is the largest file you
>> are ever going to index?
>>
>> The server may have 8GB but how much memory are you allowing the JVM?
>> What are the command line flags? I think you mentioned 128Mb in an
>> earlier email. That isn't much.
>>
>> --
>> Ian.
>>
>> On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka wrote:
>>> Hello,
>>> I get an exception only when the code is fired from Eclipse.
>>> When it is deployed on an application server, I get no exception at
>>> all. This forced me to invoke the same code from Eclipse and check
>>> what the issue is.
>>>
>>> I ran the code on a server with 8 GB memory. Even then no exception
>>> occurred!
>>> Only write.lock is formed.
>>>
>>> Removing the contents field is not desirable, as this is needed for
>>> search to work perfectly.
>>>
>>> On 8/29/2013 6:17 PM, Ian Lea wrote:
>>>> So you do get an exception after all, OOM.
>>>>
>>>> Try it without this line:
>>>>
>>>>     doc.add(new TextField("contents", new BufferedReader(new
>>>>         InputStreamReader(fis, "UTF-8"))));
>>>>
>>>> I think that will slurp the whole file in one go, which will obviously
>>>> need more memory on larger files than on smaller ones.
>>>>
>>>> Or just run the program with more memory.
>>>>
>>>> --
>>>> Ian.
>>>>
>>>> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka wrote:
>>>>
>>>>> Yes, I know that Lucene should not have any document size limits. All
>>>>> I get is a lock file inside my index folder; along with this there is
>>>>> no other file inside the index folder. Then I get an OOM exception.
>>>>> Please provide some guidance.
>>>>>
>>>>> Here is the example:
>>>>>
>>>>> package com.issue;
>>>>>
>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>> import org.apache.lucene.document.Document;
>>>>> import org.apache.lucene.document.Field;
>>>>> import org.apache.lucene.document.LongField;
>>>>> import org.apache.lucene.document.StringField;
>>>>> import org.apache.lucene.document.TextField;
>>>>> import org.apache.lucene.index.IndexCommit;
>>>>> import org.apache.lucene.index.IndexWriter;
>>>>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>>> import org.apache.lucene.index.LiveIndexWriterConfig;
>>>>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>>>>> import org.apache.lucene.index.MergePolicy;
>>>>> import org.apache.lucene.index.SerialMergeScheduler;
>>>>> import org.apache.lucene.index.MergePolicy.OneMerge;
>>>>> import org.apache.lucene.index.MergeScheduler;
>>>>> import org.apache.lucene.index.Term;
>>>>> import org.apache.lucene.store.Directory;
>>>>> import org.apache.lucene.store.FSDirectory;
>>>>> import org.apache.lucene.util.Version;
>>>>>
>>>>> import java.io.BufferedReader;
>>>>> import java.io.File;
>>>>> import java.io.FileInputStream;
>>>>> import java.io.FileNotFoundException;
>>>>> import java.io.FileReader;
>>>>> import java.io.IOException;
>>>>> import java.io.InputStreamReader;
>>>>> import java.io.LineNumberReader;
>>>>> import java.util.Date;
>>>>>
>>>>> public class D {
>>>>>
>>>>>     /** Index all text files under a directory. */
>>>>>
>>>>>     static String[] filenames;
>>>>>
>>>>>     public static void main(String[] args) {
>>>>>
>>>>>         //String indexPath = args[0];
>>>>>
>>>>>         String indexPath = "D:\\Issue"; // Place where indexes will be created
>>>>>         String docsPath = "Issue";      // Place where the files are kept.
>>>>>         boolean create = true;
>>>>>
>>>>>         String ch = "OverAll";
>>>>>
>>>>>         final File docDir = new File(docsPath);
>>>>>         if (!docDir.exists() || !docDir.canRead()) {
>>>>>             System.out.println("Document directory '" + docDir.getAbsolutePath()
>>>>>                 + "' does not exist or is not readable, please check the path");
>>>>>             System.exit(1);
>>>>>         }
>>>>>
>>>>>         Date start = new Date();
>>>>>         try {
>>>>>             Directory dir = FSDirectory.open(new File(indexPath));
>>>>>             Analyzer analyzer = new
>>>>>                 com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>>>             IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>>>>>                 analyzer);
>>>>>             iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>>>
>>>>>             IndexWriter writer = new IndexWriter(dir, iwc);
>>>>>             if (ch.equalsIgnoreCase("OverAll")) {
>>>>>                 indexDocs(writer, docDir, true);
>>>>>             } else {
>>>>>                 filenames = args[2].split(",");
>>>>>                 // indexDocs(writer, docDir);
>>>>>             }
>>>>>             writer.commit();
>>>>>             writer.close();
>>>>>
>>>>>         } catch (IOException e) {
>>>>>             System.out.println(" caught a " + e.getClass()
>>>>>                 + "\n with message: " + e.getMessage());
>>>>>         } catch (Exception e) {
>>>>>             e.printStackTrace();
>>>>>         }
>>>>>     }
>>>>>
>>>>>     //Over All
>>>>>     static void indexDocs(IndexWriter writer, File file, boolean flag)
>>>>>             throws IOException {
>>>>>
>>>>>         FileInputStream fis = null;
>>>>>         if (file.canRead()) {
>>>>>
>>>>>             if (file.isDirectory()) {
>>>>>                 String[] files = file.list();
>>>>>                 // an IO error could occur
>>>>>                 if (files != null) {
>>>>>                     for (int i = 0; i < files.length; i++) {
>>>>>                         indexDocs(writer, new File(file, files[i]), flag);
>>>>>                     }
>>>>>                 }
>>>>>             } else {
>>>>>                 try {
>>>>>                     fis = new FileInputStream(file);
>>>>>                 } catch (FileNotFoundException fnfe) {
>>>>>                     fnfe.printStackTrace();
>>>>>                 }
>>>>>
>>>>>                 try {
>>>>>                     Document doc = new Document();
>>>>>
>>>>>                     Field pathField = new StringField("path", file.getPath(),
>>>>>                         Field.Store.YES);
>>>>>                     doc.add(pathField);
>>>>>
>>>>>                     doc.add(new LongField("modified", file.lastModified(),
>>>>>                         Field.Store.NO));
>>>>>
>>>>>                     doc.add(new StringField("name", file.getName(),
>>>>>                         Field.Store.YES));
>>>>>
>>>>>                     doc.add(new TextField("contents", new BufferedReader(new
>>>>>                         InputStreamReader(fis, "UTF-8"))));
>>>>>
>>>>>                     LineNumberReader lnr = new LineNumberReader(new
>>>>>                         FileReader(file));
>>>>>
>>>>>                     String line = null;
>>>>>                     while (null != (line = lnr.readLine())) {
>>>>>                         doc.add(new StringField("SC", line.trim(),
>>>>>                             Field.Store.YES));
>>>>>                         // doc.add(new Field("contents", line,
>>>>>                         //     Field.Store.YES, Field.Index.ANALYZED));
>>>>>                     }
>>>>>
>>>>>                     if (writer.getConfig().getOpenMode() ==
>>>>>                             OpenMode.CREATE_OR_APPEND) {
>>>>>                         writer.addDocument(doc);
>>>>>                         writer.commit();
>>>>>                         fis.close();
>>>>>                     } else {
>>>>>                         try {
>>>>>                             writer.updateDocument(new Term("path",
>>>>>                                 file.getPath()), doc);
>>>>>                             fis.close();
>>>>>                         } catch (Exception e) {
>>>>>                             writer.close();
>>>>>                             fis.close();
>>>>>                             e.printStackTrace();
>>>>>                         }
>>>>>                     }
>>>>>
>>>>>                 } catch (Exception e) {
>>>>>                     writer.close();
>>>>>                     fis.close();
>>>>>                     e.printStackTrace();
>>>>>                 } finally {
>>>>>                     // writer.close();
>>>>>                     fis.close();
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>>>>
>>>>>> Lucene doesn't have document size limits.
>>>>>>
>>>>>> There are default limits for how many tokens the highlighters will
>>>>>> process ...
>>>>>>
>>>>>> But, if you are passing each line as a separate document to Lucene,
>>>>>> then Lucene only sees a bunch of tiny documents, right?
>>>>>>
>>>>>> Can you boil this down to a small test showing the problem?
>>>>>>
>>>>>> Mike McCandless
>>>>>>
>>>>>> http://blog.mikemccandless.com
>>>>>>
>>>>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> Faced with a typical issue.
>>>>>>> I have many files which I am indexing.
>>>>>>>
>>>>>>> Problem Faced:
>>>>>>> a. Files having size less than 20 MB are successfully indexed and
>>>>>>> merged.
>>>>>>>
>>>>>>> b. Files having size > 20 MB are not getting indexed. No exception
>>>>>>> is being thrown. Only a lock file is being created in the index
>>>>>>> directory. The indexing process for a single file exceeding 20 MB
>>>>>>> continues for more than 8 minutes, after which I have code which
>>>>>>> merges the generated index into the existing index.
>>>>>>>
>>>>>>> Since no index is being generated now, I get an exception during
>>>>>>> the merging process.
>>>>>>>
>>>>>>> Why are files having size greater than 20 MB not being indexed? I
>>>>>>> am indexing each line of the file. Why is IndexWriter not throwing
>>>>>>> any error?
>>>>>>>
>>>>>>> Do I need to change any parameter in Lucene or tweak the Lucene
>>>>>>> settings? Lucene version is 4.4.0.
>>>>>>>
>>>>>>> My current deployment for Lucene is on a server running with 128 MB
>>>>>>> and 512 MB heap.
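Regarding the Eclipse vs. app-server difference and the 1.2 GB seen in JConsole: it may be worth logging the heap the indexing JVM actually received, since an -Xmx flag in a server start script does not always reach the JVM that runs the indexer. A minimal, self-contained check (class name is illustrative):

```java
public class HeapCheck {

    /** Max heap the running JVM was actually granted, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // If this prints roughly 128 instead of roughly 16384, the
        // -Xmx16384m flag never reached the JVM doing the indexing.
        System.out.println("max heap (MB): " + maxHeapMb());
    }
}
```

Dropping that one print into MainClass1.main would settle whether the OOM is a configuration problem or a genuine memory blow-up.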
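On the MergeFactor question raised above: in Lucene 4.x that knob lives on the merge policy, not on IndexWriter directly. A configuration-fragment sketch of where such settings go, assuming the Lucene 4.4 jars from this thread are on the classpath (untested here, and note that merge settings are unlikely to cure an OOM caused by one oversized document):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.util.Version;

public class TunedConfig {

    static IndexWriterConfig tunedConfig() {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);

        // Buffer up to 64 MB of indexed documents in RAM before flushing
        // a segment, instead of the 16 MB default.
        iwc.setRAMBufferSizeMB(64.0);

        // "mergeFactor" is a property of the log merge policies.
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setMergeFactor(10);   // number of segments merged at a time
        iwc.setMergePolicy(mp);

        return iwc;
    }
}
```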
--
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared
with what lies within us"

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org