lucene-dev mailing list archives

From "James Dyer (Updated) (JIRA)" <>
Subject [jira] [Updated] (SOLR-2549) DIH LineEntityProcessor support for delimited & fixed-width files
Date Tue, 13 Dec 2011 19:54:31 GMT


James Dyer updated SOLR-2549:

    Attachment: SOLR-2549.patch

A long time ago, someone on the users' list asked for better support for delimited files.
This version supports most of the same features as the CSVRequestHandler, using the same CSV
parser and most of the same parameter names.

The reason for using DIH instead of CSVRequestHandler would be cases where the flat file
needs to be joined to other entities, the data needs to be cached, and/or transformers
need to be applied.

This patch also retains the same support for fixed-width files.

The unit tests have been enhanced to test these new possibilities.

> DIH LineEntityProcessor support for delimited & fixed-width files
> -----------------------------------------------------------------
>                 Key: SOLR-2549
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 4.0
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2549.patch, SOLR-2549.patch, SOLR-2549.patch
> Provides support for Fixed Width and Delimited Files without needing to write a Transformer.

> The following xml properties are supported with this version of LineEntityProcessor:
> For fixed width files:
>  - colDef[#]
> For Delimited files:
>  - fieldDelimiterRegex
>  - firstLineHasFieldnames
>  - delimitedFieldNames
>  - delimitedFieldTypes
> These properties are described in the api documentation.  See patch.
> When combined with the cache improvements from SOLR-2382, this allows you to join a
> flat-file entity with other entities (SQL, etc.).
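
For readers unfamiliar with the idea behind the delimited-file properties, here is a minimal Java sketch of splitting each line on a delimiter regex and pairing the values with field names by position. It is not code from SOLR-2549.patch and makes no claim about how LineEntityProcessor implements this. The class DelimitedLineSketch and its methods are hypothetical. The parameter names fieldDelimiterRegex and firstLineHasFieldnames are borrowed from the property list above only as labels, and their exact semantics in the patch are an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from SOLR-2549.patch.
class DelimitedLineSketch {

    // Split one line on the delimiter regex and pair values with names by position.
    static Map<String, String> toRow(String line, Pattern delimiter, List<String> names) {
        String[] values = delimiter.split(line, -1); // limit -1 keeps trailing empty fields
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < names.size() && i < values.length; i++) {
            row.put(names.get(i), values[i]);
        }
        return row;
    }

    // Assumed behavior: when firstLineHasFieldnames is true, the first line supplies
    // the field names; otherwise the caller passes them in explicitNames.
    static List<Map<String, String>> parse(List<String> lines, String fieldDelimiterRegex,
                                           boolean firstLineHasFieldnames,
                                           List<String> explicitNames) {
        Pattern delimiter = Pattern.compile(fieldDelimiterRegex);
        List<String> names = explicitNames;
        int start = 0;
        if (firstLineHasFieldnames && !lines.isEmpty()) {
            names = Arrays.asList(delimiter.split(lines.get(0), -1));
            start = 1;
        }
        if (names == null) {
            throw new IllegalArgumentException("field names are required");
        }
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(start, lines.size())) {
            rows.add(toRow(line, delimiter, names));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id|name|price", "1|widget|9.99", "2|gadget|");
        for (Map<String, String> row : parse(lines, "\\|", true, null)) {
            System.out.println(row);
        }
    }
}
```

Each resulting map plays the role of one row, which is the shape that would then be joined to other entities or passed through transformers.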
