lucene-dev mailing list archives

From "Amit Nithian (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-2096) DIH should be able read data directly from HDFS for indexing
Date Thu, 02 Sep 2010 23:16:33 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amit Nithian updated SOLR-2096:
-------------------------------

    Priority: Minor  (was: Major)

> DIH should be able read data directly from HDFS for indexing
> ------------------------------------------------------------
>
>                 Key: SOLR-2096
>                 URL: https://issues.apache.org/jira/browse/SOLR-2096
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4.1
>            Reporter: Amit Nithian
>            Priority: Minor
>             Fix For: 1.4.2
>
>         Attachments: hdfs_reader.tar
>
>
> DIH doesn't support reading from the hdfs:// protocol, which makes it hard to index data
> generated by an M/R job. This tarball contains a subclass of URLDataSource along with an
> HDFSReader that allows for this. The data is assumed to be in text format that the
> LineEntityProcessor can process.
> Here is an example DIH-Config snippet:
> <dataConfig>
>   <dataSource name="queryData" type="org.apache.solr.handler.dataimport.hdfs.HDFSDataSource"
>     baseUrl="hdfs://<YOURSERVER>:9000/" encoding="UTF-8"
>     connectionTimeout="5000" readTimeout="10000"/>
>   <document name="autoSuggester">
>     <entity name="jc" processor="LineEntityProcessor"
>       url="<YOUR FOLDER>/part*" dataSource="queryData">
>       <!-- Field mappings here if necessary -->
>     </entity>
>   </document>
> </dataConfig>
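
The `url="<YOUR FOLDER>/part*"` pattern above implies that several M/R part files are expanded and handed to the LineEntityProcessor as line-oriented text. The sketch below illustrates that idea only; it is not the code in hdfs_reader.tar. It assumes the local filesystem as a stand-in for HDFS so that it runs with the JDK alone, and the names PartFileReaderSketch, openParts and readLines are hypothetical. An HDFS version would presumably list and open the files through Hadoop's FileSystem API (globStatus and open) instead of java.nio.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: local files stand in for HDFS part files.
public class PartFileReaderSketch {

    // Expands a glob such as "part*" inside dir and exposes all matches as one Reader.
    // Assumes each part file ends with a newline, as M/R text output normally does;
    // otherwise the last line of one file would run into the first line of the next.
    static Reader openParts(Path dir, String glob, Charset encoding) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path p : matches) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // directory order is unspecified, so sort for stable output

        // Open each part only when the previous one is exhausted, so a large
        // job output does not hold hundreds of file handles at once.
        final Iterator<Path> remaining = parts.iterator();
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return remaining.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return Files.newInputStream(remaining.next());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        return new InputStreamReader(new SequenceInputStream(streams), encoding);
    }

    // Reads the concatenated parts line by line, one list entry per line.
    static List<String> readLines(Path dir, String glob, Charset encoding) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(openParts(dir, glob, encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Fake job output: two part files plus a marker file the glob should skip.
        Path dir = Files.createTempDirectory("mr-output");
        Files.write(dir.resolve("part-00000"), "apple\nbanana\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00001"), "cherry\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("_SUCCESS"), new byte[0]);

        for (String line : readLines(dir, "part*", StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Running main prints the three lines in part-file order and ignores the _SUCCESS marker. Sorting the matches is a design choice of this sketch, made so that repeated imports see the lines in the same order.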

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

