lucene-java-user mailing list archives

From Ankit Murarka <ankit.mura...@rancoretech.com>
Subject Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
Date Fri, 30 Aug 2013 13:19:38 GMT
Hello,

The following exception is being printed on the server console when 
trying to index.  As usual, indexes are not getting created.


java.lang.OutOfMemoryError: Java heap space
     at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
     at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
     at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
     at org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
     at org.apache.lucene.document.Field.tokenStream(Field.java:541)
     at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
     at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
     at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
     at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
     at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
     at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
     at com.rancore.MainClass1.indexDocs(MainClass1.java:197)
     at com.rancore.MainClass1.indexDocs(MainClass1.java:153)
     at com.rancore.MainClass1.main(MainClass1.java:95)

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
     at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
     at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
     at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
     at com.rancore.MainClass1.main(MainClass1.java:122)

Indexing to directory


Any guidance will be highly appreciated. Server opts are:
-server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m
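
As a sketch of the per-line approach suggested later in this thread: instead of adding the entire file as one field, read it line by line and flush small batches, so no single document grows with the file size. The class and method names below are made up for illustration, and the actual Lucene addDocument/commit calls are left as comments so the batching logic stands alone:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class LineBatcher {

    /**
     * Read lines from a (possibly very large) file and group them into
     * fixed-size batches. Each batch can then be indexed as one small
     * Lucene document, instead of loading the whole file into memory.
     */
    static List<List<String>> batchLines(Reader reader, int batchSize) throws IOException {
        List<List<String>> batches = new ArrayList<List<String>>();
        List<String> current = new ArrayList<String>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            current.add(line);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<String>();
            }
        }
        // Flush the final, possibly short, batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        // For each batch you would then build one small Document, call
        // writer.addDocument(doc), and commit periodically rather than
        // once per file.
        return batches;
    }
}
```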

On 8/30/2013 3:13 PM, Ankit Murarka wrote:
> Hello.
> The server has much more memory. I have given a minimum of 8 GB to the 
> Application Server.
>
> The Java opts of interest are: -server -Xms8192m -Xmx16384m 
> -XX:MaxPermSize=8192m
>
> Even after giving this much memory to the server, how am I hitting OOM 
> exceptions? No other activity is being performed on the server apart 
> from this.
>
> Checking from JConsole, the maximum heap during indexing was close to 
> 1.2 GB, whereas the memory allocated is as mentioned above.
>
> I did mention 128 MB earlier, but that is when I start the server on a 
> normal Windows machine.
>
> Isn't there any property/configuration in Lucene which I should set in 
> order to index large files, say about 30 MB? I read something about 
> MergeFactor etc. but was not able to set any value for it. I don't even 
> know whether doing that will help.
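
[A minimal, untested sketch of where those knobs live in the 4.4 API. Note that in 4.x the merge factor is set on the merge policy, not on IndexWriter, and that merge settings mainly affect merging; they are unlikely to fix an OOM that happens while a document is being inverted, so heap size and document size matter more. StandardAnalyzer stands in here for the custom analyzer used elsewhere in this thread.]

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.util.Version;

public class TuningSketch {

    static IndexWriterConfig makeConfig() {
        IndexWriterConfig iwc = new IndexWriterConfig(
                Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
        // Flush buffered documents to disk once this much RAM is used,
        // instead of letting the in-memory buffer keep growing.
        iwc.setRAMBufferSizeMB(64.0);
        // The "merge factor" lives on the merge policy in Lucene 4.x.
        LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy();
        mp.setMergeFactor(10);
        iwc.setMergePolicy(mp);
        return iwc;
    }
}
```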
>
>
> On 8/29/2013 7:04 PM, Ian Lea wrote:
>> Well, I use neither Eclipse nor your application server and can offer
>> no advice on any differences in behaviour between the two.  Maybe you
>> should try Eclipse or app server forums.
>>
>> If you are going to index the complete contents of a file as one field
>> you are likely to hit OOM exceptions.  How big is the largest file you
>> are ever going to index?
>>
>> The server may have 8GB but how much memory are you allowing the JVM?
>> What are the command line flags?  I think you mentioned 128Mb in an
>> earlier email.  That isn't much.
>>
>>
>> -- 
>> Ian.
>>
>>
>>
>> On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka
>> <ankit.murarka@rancoretech.com>  wrote:
>>> Hello,
>>>           I get an exception only when the code is fired from Eclipse.
>>> When it is deployed on an application server, I get no exception at all.
>>> This forced me to invoke the same code from Eclipse and check what the
>>> issue is.
>>>
>>> I ran the code on a server with 8 GB memory. Even then, no exception
>>> occurred. Only write.lock is formed.
>>>
>>> Removing the contents field is not desirable, as it is needed for
>>> search to work properly.
>>>
>>> On 8/29/2013 6:17 PM, Ian Lea wrote:
>>>> So you do get an exception after all, OOM.
>>>>
>>>> Try it without this line:
>>>>
>>>> doc.add(new TextField("contents", new BufferedReader(new
>>>> InputStreamReader(fis, "UTF-8"))));
>>>>
>>>> I think that will slurp the whole file in one go which will obviously
>>>> need more memory on larger files than on smaller ones.
>>>>
>>>> Or just run the program with more memory.
>>>>
>>>>
>>>> -- 
>>>> Ian.
>>>>
>>>>
>>>> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
>>>> <ankit.murarka@rancoretech.com>   wrote:
>>>>
>>>>> Yes, I know that Lucene should not have any document size limits. All
>>>>> I get is a lock file inside my index folder. Along with this there's
>>>>> no other file inside the index folder. Then I get an OOM exception.
>>>>> Please provide some guidance.
>>>>>
>>>>> Here is the example:
>>>>>
>>>>> package com.issue;
>>>>>
>>>>>
>>>>> import org.apache.lucene.analysis.Analyzer;
>>>>> import org.apache.lucene.document.Document;
>>>>> import org.apache.lucene.document.Field;
>>>>> import org.apache.lucene.document.LongField;
>>>>> import org.apache.lucene.document.StringField;
>>>>> import org.apache.lucene.document.TextField;
>>>>> import org.apache.lucene.index.IndexCommit;
>>>>> import org.apache.lucene.index.IndexWriter;
>>>>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>>> import org.apache.lucene.index.LiveIndexWriterConfig;
>>>>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>>>>> import org.apache.lucene.index.MergePolicy;
>>>>> import org.apache.lucene.index.SerialMergeScheduler;
>>>>> import org.apache.lucene.index.MergePolicy.OneMerge;
>>>>> import org.apache.lucene.index.MergeScheduler;
>>>>> import org.apache.lucene.index.Term;
>>>>> import org.apache.lucene.store.Directory;
>>>>> import org.apache.lucene.store.FSDirectory;
>>>>> import org.apache.lucene.util.Version;
>>>>>
>>>>>
>>>>> import java.io.BufferedReader;
>>>>> import java.io.File;
>>>>> import java.io.FileInputStream;
>>>>> import java.io.FileNotFoundException;
>>>>> import java.io.FileReader;
>>>>> import java.io.IOException;
>>>>> import java.io.InputStreamReader;
>>>>> import java.io.LineNumberReader;
>>>>> import java.util.Date;
>>>>>
>>>>> public class D {
>>>>>
>>>>>     /** Index all text files under a directory. */
>>>>>
>>>>>
>>>>>       static String[] filenames;
>>>>>
>>>>>     public static void main(String[] args) {
>>>>>
>>>>>       //String indexPath = args[0];
>>>>>
>>>>>       String indexPath="D:\\Issue";//Place where indexes will be 
>>>>> created
>>>>>       String docsPath="Issue";    //Place where the files are kept.
>>>>>       boolean create=true;
>>>>>
>>>>>       String ch="OverAll";
>>>>>
>>>>>
>>>>>      final File docDir = new File(docsPath);
>>>>>      if (!docDir.exists() || !docDir.canRead()) {
>>>>>         System.out.println("Document directory '"
>>>>> +docDir.getAbsolutePath()+
>>>>> "' does not exist or is not readable, please check the path");
>>>>>         System.exit(1);
>>>>>       }
>>>>>
>>>>>       Date start = new Date();
>>>>>      try {
>>>>>        Directory dir = FSDirectory.open(new File(indexPath));
>>>>>        Analyzer analyzer=new
>>>>> com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>>>>        IndexWriterConfig iwc = new 
>>>>> IndexWriterConfig(Version.LUCENE_44,
>>>>> analyzer);
>>>>>         iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>>>>
>>>>>         IndexWriter writer = new IndexWriter(dir, iwc);
>>>>>         if(ch.equalsIgnoreCase("OverAll")){
>>>>>             indexDocs(writer, docDir,true);
>>>>>         }else{
>>>>>             filenames=args[2].split(",");
>>>>>            // indexDocs(writer, docDir);
>>>>>
>>>>>      }
>>>>>         writer.commit();
>>>>>         writer.close();
>>>>>
>>>>>       } catch (IOException e) {
>>>>>         System.out.println(" caught a " + e.getClass() +
>>>>>          "\n with message: " + e.getMessage());
>>>>>       }
>>>>>       catch(Exception e)
>>>>>       {
>>>>>
>>>>>           e.printStackTrace();
>>>>>       }
>>>>>    }
>>>>>
>>>>>     //Over All
>>>>>     static void indexDocs(IndexWriter writer, File file,boolean flag)
>>>>>     throws IOException {
>>>>>
>>>>>         FileInputStream fis = null;
>>>>>    if (file.canRead()) {
>>>>>
>>>>>       if (file.isDirectory()) {
>>>>>        String[] files = file.list();
>>>>>         // an IO error could occur
>>>>>         if (files != null) {
>>>>>           for (int i = 0; i<   files.length; i++) {
>>>>>             indexDocs(writer, new File(file, files[i]),flag);
>>>>>           }
>>>>>         }
>>>>>      } else {
>>>>>         try {
>>>>>           fis = new FileInputStream(file);
>>>>>        } catch (FileNotFoundException fnfe) {
>>>>>
>>>>>          fnfe.printStackTrace();
>>>>>        }
>>>>>
>>>>>         try {
>>>>>
>>>>>             Document doc = new Document();
>>>>>
>>>>>             Field pathField = new StringField("path", file.getPath(),
>>>>> Field.Store.YES);
>>>>>             doc.add(pathField);
>>>>>
>>>>>             doc.add(new LongField("modified", file.lastModified(),
>>>>> Field.Store.NO));
>>>>>
>>>>>             doc.add(new
>>>>> StringField("name",file.getName(),Field.Store.YES));
>>>>>
>>>>>            doc.add(new TextField("contents", new BufferedReader(new
>>>>> InputStreamReader(fis, "UTF-8"))));
>>>>>
>>>>>             LineNumberReader lnr=new LineNumberReader(new
>>>>> FileReader(file));
>>>>>
>>>>>
>>>>>            String line=null;
>>>>>             while( null != (line = lnr.readLine()) ){
>>>>>                 doc.add(new
>>>>> StringField("SC",line.trim(),Field.Store.YES));
>>>>>                // doc.add(new
>>>>> Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>>>>             }
>>>>>
>>>>>             if (writer.getConfig().getOpenMode() ==
>>>>> OpenMode.CREATE_OR_APPEND)
>>>>> {
>>>>>
>>>>>               writer.addDocument(doc);
>>>>>               writer.commit();
>>>>>               fis.close();
>>>>>             } else {
>>>>>                 try
>>>>>                 {
>>>>>               writer.updateDocument(new Term("path", file.getPath()),
>>>>> doc);
>>>>>
>>>>>               fis.close();
>>>>>
>>>>>                 }catch(Exception e)
>>>>>                 {
>>>>>                     writer.close();
>>>>>                      fis.close();
>>>>>
>>>>>                     e.printStackTrace();
>>>>>
>>>>>                 }
>>>>>             }
>>>>>
>>>>>         }catch (Exception e) {
>>>>>              writer.close();
>>>>>               fis.close();
>>>>>
>>>>>            e.printStackTrace();
>>>>>         }finally {
>>>>>             // writer.close();
>>>>>
>>>>>           fis.close();
>>>>>         }
>>>>>       }
>>>>>     }
>>>>> }
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>>>>
>>>>>> Lucene doesn't have document size limits.
>>>>>>
>>>>>> There are default limits for how many tokens the highlighters will
>>>>>> process
>>>>>> ...
>>>>>>
>>>>>> But, if you are passing each line as a separate document to Lucene,
>>>>>> then Lucene only sees a bunch of tiny documents, right?
>>>>>>
>>>>>> Can you boil this down to a small test showing the problem?
>>>>>>
>>>>>> Mike McCandless
>>>>>>
>>>>>> http://blog.mikemccandless.com
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>>>>>> <ankit.murarka@rancoretech.com>    wrote:
>>>>>>
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> Faced with a typical issue.
>>>>>>> I have many files which I am indexing.
>>>>>>>
>>>>>>> Problem Faced:
>>>>>>> a. Files having size less than 20 MB are successfully indexed and
>>>>>>> merged.
>>>>>>>
>>>>>>> b. Files having size > 20 MB are not getting indexed. No exception is
>>>>>>> being
>>>>>>> thrown. Only a lock file is being created in the index 
>>>>>>> directory. The
>>>>>>> indexing process for a single file exceeding 20 MB continues for more
>>>>>>> than 8 minutes, after which I have code which merges the generated
>>>>>>> index into the existing index.
>>>>>>>
>>>>>>> Since no index is being generated now, I get an exception during the
>>>>>>> merging process.
>>>>>>>
>>>>>>> Why are files having size greater than 20 MB not being indexed? I am
>>>>>>> indexing each line of the file. Why is IndexWriter not throwing any
>>>>>>> error?
>>>>>>>
>>>>>>> Do I need to change any parameter in Lucene or tweak the Lucene
>>>>>>> settings?
>>>>>>> Lucene version is 4.4.0
>>>>>>>
>>>>>>> My current deployment for Lucene is on a server running with 128 MB
>>>>>>> and 512 MB heap.
>>>>>>>
>>>>>>> -- 
>>>>>>> Regards
>>>>>>>
>>>>>>> Ankit Murarka
>>>>>>>
>>>>>>> "What lies behind us and what lies before us are tiny matters
>>>>>>> compared with what lies within us"
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>


-- 
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within
us"


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

