lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <>
Subject [jira] [Resolved] (SOLR-798) FileListEntityProcessor can't handle directories containing lots of files
Date Sat, 16 Mar 2013 18:56:12 GMT


Erick Erickson resolved SOLR-798.

    Resolution: Won't Fix

SPRING_CLEANING_2013: we can reopen if necessary.
> FileListEntityProcessor can't handle directories containing lots of files
> -------------------------------------------------------------------------
>                 Key: SOLR-798
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Grant Ingersoll
>            Priority: Minor
> The FileListEntityProcessor currently tries to process all documents in a single
> directory at once, and stores the results in a HashMap.  On directories containing
> a large number of documents, this quickly causes OutOfMemory errors.
> Unfortunately, the typical fix for this is to hack FileFilter to do the work for
> you and always return false from the accept method.  It may be possible to hook up
> some type of producer/consumer multithreaded FileFilter approach whereby the
> FileFilter blocks until the nextRow() mechanism requests another row, thereby
> avoiding the need to cache everything in the map.
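The producer/consumer idea quoted above can be sketched in plain Java. This is a hypothetical illustration, not Solr code: the class name `StreamingFileLister` and method `nextFile()` (standing in for `nextRow()`) are invented for the example. The key trick is that the `FileFilter` passed to `File.listFiles()` pushes each entry onto a bounded `BlockingQueue` and always returns `false`, so `listFiles()` never accumulates results in memory; the producer thread simply blocks inside `accept()` whenever the consumer falls behind.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the blocking-FileFilter approach described in the issue.
public class StreamingFileLister {
    // Sentinel object marking the end of the directory listing.
    private static final File POISON = new File("");

    private final BlockingQueue<File> queue;

    public StreamingFileLister(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Walk 'dir' on a background thread, feeding files into the bounded queue. */
    public void start(final File dir) {
        Thread producer = new Thread(() -> {
            // The filter hands each file to the queue (blocking when it is full)
            // and always returns false, so listFiles() returns an empty array
            // instead of caching every entry.
            dir.listFiles((FileFilter) f -> {
                try {
                    queue.put(f);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return false;
            });
            try {
                queue.put(POISON); // signal that the directory is exhausted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Analogue of nextRow(): returns the next file, or null when done. */
    public File nextFile() throws InterruptedException {
        File f = queue.take();
        return (f == POISON) ? null : f;
    }
}
```

With a small queue capacity, memory use stays bounded regardless of directory size; the producer thread spends most of its time parked in `queue.put()` rather than building a map of every file.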

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.