lucene-solr-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Created: (SOLR-798) FileListEntityProcessor can't handle directories containing lots of files
Date Thu, 02 Oct 2008 20:05:44 GMT
FileListEntityProcessor can't handle directories containing lots of files
-------------------------------------------------------------------------

                 Key: SOLR-798
                 URL: https://issues.apache.org/jira/browse/SOLR-798
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
            Reporter: Grant Ingersoll
            Priority: Minor


The FileListEntityProcessor currently tries to process all documents in a single directory
at once, storing the results in a HashMap.  On directories containing a large number of
documents, this quickly causes an OutOfMemoryError.
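
For illustration, the eager pattern looks roughly like this (a sketch only, not the actual
FileListEntityProcessor source; the row keys shown are assumptions):

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: building one row Map per file for the whole
// directory up front keeps every row live until indexing finishes,
// which is what exhausts the heap on large directories.
public class EagerFileList {
  public static void main(String[] args) {
    List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
    File[] files = new File(args[0]).listFiles();
    if (files != null) {
      for (File f : files) {
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("fileAbsolutePath", f.getAbsolutePath());   // assumed key
        row.put("fileSize", Long.valueOf(f.length()));      // assumed key
        row.put("fileLastModified", Long.valueOf(f.lastModified()));
        rows.add(row); // every file's row cached in memory at once
      }
    }
    System.out.println(rows.size() + " rows cached in memory");
  }
}
{code}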

Unfortunately, the typical fix for this is to hack FileFilter to do the work for you and
always return false from the accept() method.  It may be possible to hook up some kind of
producer/consumer multithreaded FileFilter approach, whereby the FileFilter blocks until
the nextRow() mechanism requests another row, thereby avoiding the need to cache everything
in the map.
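
A minimal sketch of that producer/consumer idea, using a SynchronousQueue as the hand-off
point (class and method names here are hypothetical, not a proposed patch):

{code:java}
import java.io.File;
import java.io.FileFilter;
import java.util.concurrent.SynchronousQueue;

public class BlockingFileProducer {
  // Sentinel marking the end of the directory walk (hypothetical).
  private static final File END = new File("");

  // Zero-capacity hand-off: put() blocks until the consumer take()s.
  private final SynchronousQueue<File> queue = new SynchronousQueue<File>();

  public BlockingFileProducer(final File dir) {
    Thread producer = new Thread(new Runnable() {
      public void run() {
        try {
          // listFiles() drives the walk; accept() blocks until the
          // consumer requests a row and always returns false, so
          // listFiles() itself accumulates nothing.
          dir.listFiles(new FileFilter() {
            public boolean accept(File f) {
              try {
                queue.put(f);
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
              return false;
            }
          });
          queue.put(END); // tell the consumer we are done
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    producer.setDaemon(true);
    producer.start();
  }

  // Called from nextRow(): blocks until the producer hands over the
  // next file, or returns null once the directory is exhausted.
  public File nextFile() throws InterruptedException {
    File f = queue.take();
    return f == END ? null : f;
  }
}
{code}

With something like this, nextRow() would pull one File at a time and build its row map on
demand, so only a single file is in flight, at the cost of one directory-walking thread per
entity.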

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

