lucene-java-user mailing list archives

From Ankit Murarka <ankit.mura...@rancoretech.com>
Subject Re: Files greater than 20 MB not getting Indexed. No files generated except write.lock even after 8-9 minutes.
Date Thu, 29 Aug 2013 13:14:25 GMT
Hello,
          I get the exception only when the code is run from Eclipse.
When it is deployed on an application server, I get no exception at all.
That is why I invoked the same code from Eclipse to check what the
issue is.

I also ran the code on a server with 8 GB of memory. Even then no exception
occurred; only write.lock is created.
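
To rule out a difference in launch settings, one small check I can add is to
print the heap each JVM was actually started with, since an Eclipse run
configuration has its own -Xmx that is separate from the 8 GB on the server.
This is only a hypothetical helper class, not part of the code quoted below:

public class HeapCheck {
    public static void main(String[] args) {
        // Maximum heap this JVM will try to use (governed by -Xmx), so the
        // Eclipse launch and the server launch can be compared directly.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap for this JVM: "
                + (maxBytes / (1024 * 1024)) + " MB");
    }
}

Running it once from the Eclipse launch configuration and once on the server
should show whether the two JVMs really have the same heap ceiling.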

Removing the contents field is not desirable, as it is needed for search to
work properly.
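
To narrow things down, here is a minimal sketch of an isolation test I could
try (my own guess, not a suggestion from the list): keep the streamed
"contents" field so search still works, and temporarily leave out the
per-line stored "SC" fields from the code quoted below, to see whether memory
use drops for the large files:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

public class ContentsOnlyTest {
    // Sketch for an isolation test only: index a single file with just the
    // path, name and Reader-based "contents" fields; the per-line stored
    // "SC" fields are deliberately omitted here.
    static void indexFileContentsOnly(IndexWriter writer, File file) throws Exception {
        Document doc = new Document();
        doc.add(new StringField("path", file.getPath(), Field.Store.YES));
        doc.add(new StringField("name", file.getName(), Field.Store.YES));
        doc.add(new TextField("contents", new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))));
        writer.addDocument(doc);
        writer.commit();
    }
}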

On 8/29/2013 6:17 PM, Ian Lea wrote:
> So you do get an exception after all, OOM.
>
> Try it without this line:
>
> doc.add(new TextField("contents", new BufferedReader(new
> InputStreamReader(fis, "UTF-8"))));
>
> I think that will slurp the whole file in one go which will obviously
> need more memory on larger files than on smaller ones.
>
> Or just run the program with more memory.
>
>
> --
> Ian.
>
>
> On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka
> <ankit.murarka@rancoretech.com>  wrote:
>    
>> Yes, I know that Lucene should not have any document size limits. All I get
>> is a lock file inside my index folder; there is no other file in it. Then I
>> get an OOM exception.
>> Please provide some guidance...
>>
>> Here is the example:
>>
>> package com.issue;
>>
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.document.Document;
>> import org.apache.lucene.document.Field;
>> import org.apache.lucene.document.LongField;
>> import org.apache.lucene.document.StringField;
>> import org.apache.lucene.document.TextField;
>> import org.apache.lucene.index.IndexCommit;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.IndexWriterConfig.OpenMode;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.index.LiveIndexWriterConfig;
>> import org.apache.lucene.index.LogByteSizeMergePolicy;
>> import org.apache.lucene.index.MergePolicy;
>> import org.apache.lucene.index.SerialMergeScheduler;
>> import org.apache.lucene.index.MergePolicy.OneMerge;
>> import org.apache.lucene.index.MergeScheduler;
>> import org.apache.lucene.index.Term;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.store.FSDirectory;
>> import org.apache.lucene.util.Version;
>>
>>
>> import java.io.BufferedReader;
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.io.FileNotFoundException;
>> import java.io.FileReader;
>> import java.io.IOException;
>> import java.io.InputStreamReader;
>> import java.io.LineNumberReader;
>> import java.util.Date;
>>
>> public class D {
>>
>>    /** Index all text files under a directory. */
>>
>>
>>      static String[] filenames;
>>
>>    public static void main(String[] args) {
>>
>>      //String indexPath = args[0];
>>
>>      String indexPath="D:\\Issue";//Place where indexes will be created
>>      String docsPath="Issue";    //Place where the files are kept.
>>      boolean create=true;
>>
>>      String ch="OverAll";
>>
>>
>>     final File docDir = new File(docsPath);
>>     if (!docDir.exists() || !docDir.canRead()) {
>>        System.out.println("Document directory '" +docDir.getAbsolutePath()+
>> "' does not exist or is not readable, please check the path");
>>        System.exit(1);
>>      }
>>
>>      Date start = new Date();
>>     try {
>>       Directory dir = FSDirectory.open(new File(indexPath));
>>       Analyzer analyzer=new
>> com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
>>       IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
>> analyzer);
>>        iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
>>
>>        IndexWriter writer = new IndexWriter(dir, iwc);
>>        if(ch.equalsIgnoreCase("OverAll")){
>>            indexDocs(writer, docDir,true);
>>        }else{
>>            filenames=args[2].split(",");
>>           // indexDocs(writer, docDir);
>>
>>     }
>>        writer.commit();
>>        writer.close();
>>
>>      } catch (IOException e) {
>>        System.out.println(" caught a " + e.getClass() +
>>         "\n with message: " + e.getMessage());
>>      }
>>      catch(Exception e)
>>      {
>>
>>          e.printStackTrace();
>>      }
>>   }
>>
>>    //Over All
>>    static void indexDocs(IndexWriter writer, File file,boolean flag)
>>    throws IOException {
>>
>>        FileInputStream fis = null;
>>   if (file.canRead()) {
>>
>>      if (file.isDirectory()) {
>>       String[] files = file.list();
>>        // an IO error could occur
>>        if (files != null) {
>>          for (int i = 0; i<  files.length; i++) {
>>            indexDocs(writer, new File(file, files[i]),flag);
>>          }
>>        }
>>     } else {
>>        try {
>>          fis = new FileInputStream(file);
>>       } catch (FileNotFoundException fnfe) {
>>
>>         fnfe.printStackTrace();
>>       }
>>
>>        try {
>>
>>            Document doc = new Document();
>>
>>            Field pathField = new StringField("path", file.getPath(),
>> Field.Store.YES);
>>            doc.add(pathField);
>>
>>            doc.add(new LongField("modified", file.lastModified(),
>> Field.Store.NO));
>>
>>            doc.add(new StringField("name",file.getName(),Field.Store.YES));
>>
>>           doc.add(new TextField("contents", new BufferedReader(new
>> InputStreamReader(fis, "UTF-8"))));
>>
>>            LineNumberReader lnr=new LineNumberReader(new FileReader(file));
>>
>>
>>           String line=null;
>>            while( null != (line = lnr.readLine()) ){
>>                doc.add(new StringField("SC",line.trim(),Field.Store.YES));
>>               // doc.add(new
>> Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
>>            }
>>
>>            if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND)
>> {
>>
>>              writer.addDocument(doc);
>>              writer.commit();
>>              fis.close();
>>            } else {
>>                try
>>                {
>>              writer.updateDocument(new Term("path", file.getPath()), doc);
>>
>>              fis.close();
>>
>>                }catch(Exception e)
>>                {
>>                    writer.close();
>>                     fis.close();
>>
>>                    e.printStackTrace();
>>
>>                }
>>            }
>>
>>        }catch (Exception e) {
>>             writer.close();
>>              fis.close();
>>
>>           e.printStackTrace();
>>        }finally {
>>            // writer.close();
>>
>>          fis.close();
>>        }
>>      }
>>    }
>> }
>> }
>>
>>
>>
>> On 8/29/2013 4:20 PM, Michael McCandless wrote:
>>      
>>> Lucene doesn't have document size limits.
>>>
>>> There are default limits for how many tokens the highlighters will process
>>> ...
>>>
>>> But, if you are passing each line as a separate document to Lucene,
>>> then Lucene only sees a bunch of tiny documents, right?
>>>
>>> Can you boil this down to a small test showing the problem?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>>
>>> On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
>>> <ankit.murarka@rancoretech.com>   wrote:
>>>
>>>        
>>>> Hello all,
>>>>
>>>> I am facing an issue. I have many files which I am indexing.
>>>>
>>>> Problem faced:
>>>> a. Files smaller than 20 MB are successfully indexed and merged.
>>>>
>>>> b. Files larger than 20 MB are not getting indexed. No exception is being
>>>> thrown; only a lock file is created in the index directory. The indexing
>>>> process for a single file exceeding 20 MB continues for more than 8
>>>> minutes, after which I have code that merges the generated index into the
>>>> existing index.
>>>>
>>>> Since no index is being generated now, I get an exception during the
>>>> merging process.
>>>>
>>>> Why are files larger than 20 MB not being indexed? I am indexing each
>>>> line of the file. Why is IndexWriter not throwing any error?
>>>>
>>>> Do I need to change any parameter in Lucene or tweak the Lucene settings?
>>>> The Lucene version is 4.4.0.
>>>>
>>>> My current deployment of Lucene is on a server running with 128 MB and
>>>> 512 MB heap.
>>>>
>>>> --
>>>> Regards
>>>>
>>>> Ankit Murarka
>>>>
>>>> "What lies behind us and what lies before us are tiny matters compared
>>>> with
>>>> what lies within us"
>>>>
>>>>
>>
>>
>> --
>> Regards
>>
>> Ankit Murarka
>>
>> "What lies behind us and what lies before us are tiny matters compared with
>> what lies within us"
>>
>>


-- 
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within
us"


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

