lucene-solr-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-469) DB Import RequestHandler
Date Wed, 06 Feb 2008 17:05:08 GMT

     [ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-469:
---------------------------------------

    Attachment: SOLR-469.patch

This patch is the result of our (Noble Paul's and Shalin Shekhar Mangar's) work on this issue.
Please refer to http://wiki.apache.org/solr/DataImportHandler for a user guide.

Our design philosophy for data imports is based on templatized SQL, which gives users of
this tool a lot of flexibility. It can generate schemas and do both full-imports and delta-imports.
Please note that this is a work in progress and there is a lot to be done before it can be committed.
We plan to write more documentation and tests as we go.
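To illustrate what "templatized SQL" can mean in practice, here is a minimal sketch that resolves `${name}` placeholders in a query from a map of variables before the query is run. The placeholder syntax, the `SqlTemplate` class and its `resolve` method are assumptions made for this example, not the patch's actual template engine.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlTemplate {
    // Hypothetical placeholder syntax: ${name}
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} in the template with vars.get(name); fails on unknown names. */
    public static String resolve(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved variable: " + m.group(1));
            }
            // quoteReplacement stops '$' and '\' in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT id, name FROM item WHERE last_modified > '${last_index_time}'";
        // prints: SELECT id, name FROM item WHERE last_modified > '2008-02-06 17:05:08'
        System.out.println(resolve(sql, Map.of("last_index_time", "2008-02-06 17:05:08")));
    }
}
```

Plain string substitution like this is only reasonable for trusted values, such as a timestamp the importer records itself; user-supplied values would be better bound as JDBC `PreparedStatement` parameters.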

Start by looking at the changes to solrconfig.xml and then at DataImportHandler.java. The central
class is DataImporter.java, which uses DocBuilder to do the actual full-dump and delta-dump
operations.
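As a rough picture of that split, the sketch below has an importer dispatch the two commands to a DocBuilder-like step that writes documents into an in-memory map standing in for the index. `ImporterSketch`, its `Command` enum and the map keyed by an `id` column are all hypothetical; it also assumes a full dump rebuilds the index from scratch, while a delta dump only adds or replaces changed documents (deletes are not modelled).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImporterSketch {
    // Hypothetical command names mirroring do-full-import / do-delta-import
    public enum Command { FULL_IMPORT, DELTA_IMPORT }

    // Stand-in for the Solr index: documents keyed by their "id" column
    private final Map<Object, Map<String, Object>> index = new LinkedHashMap<>();

    /** Dispatches a command to the matching DocBuilder-like operation. */
    public void run(Command cmd, List<Map<String, Object>> rows) {
        switch (cmd) {
            case FULL_IMPORT:
                index.clear();   // assumed: a full dump rebuilds everything from the query result
                upsert(rows);
                break;
            case DELTA_IMPORT:
                upsert(rows);    // a delta dump only adds or replaces the changed documents
                break;
        }
    }

    private void upsert(List<Map<String, Object>> rows) {
        for (Map<String, Object> row : rows) {
            index.put(row.get("id"), row);
        }
    }

    public int size() { return index.size(); }

    public Map<String, Object> get(Object id) { return index.get(id); }

    public static void main(String[] args) {
        ImporterSketch importer = new ImporterSketch();
        List<Map<String, Object>> all = List.of(Map.of("id", 1, "name", "a"), Map.of("id", 2, "name", "b"));
        List<Map<String, Object>> changed = List.of(Map.of("id", 2, "name", "b2"));
        importer.run(Command.FULL_IMPORT, all);
        importer.run(Command.DELTA_IMPORT, changed);
        System.out.println(importer.size() + " " + importer.get(2).get("name")); // prints: 2 b2
    }
}
```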

We expose a powerful API for applications to do custom tasks. This API was needed because
even in our own tasks, there was frequent need to perform custom operations on rows/columns
before they could be indexed. Assuming that other users may face the same problems, we expose
Context.java, DataSource.java, EntityProcessor.java and Transformer.java as interfaces that
can be used to provide custom data sources or transformations on column values before indexing.
In our own project, we have used these interfaces to do tasks such as reading XML from a column
and extracting relevant items to be indexed.
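The XML use case above might look roughly like the following sketch: a transformer parses XML stored in one column and copies the text of matching elements into a multi-valued field. The `RowTransformer` interface here is a hypothetical, simplified stand-in for the patch's Transformer.java (it omits Context), and the column, element and field names are made up for the example; only the JDK DOM parser calls are real APIs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlColumnTransformerSketch {
    /** Hypothetical, simplified stand-in for the patch's Transformer interface. */
    public interface RowTransformer {
        Map<String, Object> transformRow(Map<String, Object> row);
    }

    /** Extracts the text of every <elementName> found in the XML held by sourceColumn. */
    public static class XmlColumnTransformer implements RowTransformer {
        private final String sourceColumn, elementName, targetField;

        public XmlColumnTransformer(String sourceColumn, String elementName, String targetField) {
            this.sourceColumn = sourceColumn;
            this.elementName = elementName;
            this.targetField = targetField;
        }

        @Override
        public Map<String, Object> transformRow(Map<String, Object> row) {
            Map<String, Object> out = new HashMap<>(row);   // never mutate the caller's row
            Object xml = row.get(sourceColumn);
            if (xml == null) return out;
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml.toString())));
                NodeList nodes = doc.getElementsByTagName(elementName);
                List<String> values = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    values.add(nodes.item(i).getTextContent());
                }
                out.put(targetField, values);   // multi-valued field for indexing
                out.remove(sourceColumn);       // drop the raw XML so it is not indexed
            } catch (Exception e) {
                throw new IllegalStateException("Could not parse XML in column " + sourceColumn, e);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        RowTransformer t = new XmlColumnTransformer("details", "tag", "tags");
        Map<String, Object> row = new HashMap<>();
        row.put("id", 7);
        row.put("details", "<item><tag>solr</tag><tag>jdbc</tag></item>");
        System.out.println(t.transformRow(row).get("tags")); // prints: [solr, jdbc]
    }
}
```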

Looking forward to your feedback and comments. Let us know what it will take to get this feature
into Solr.

 - Noble Paul & Shalin Shekhar Mangar

> DB Import RequestHandler
> ------------------------
>
>                 Key: SOLR-469
>                 URL: https://issues.apache.org/jira/browse/SOLR-469
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other dataSources into the
> Solr index. Think of it as an advanced form of the SqlUpload Plugin (SOLR-103).
> The way it works is as follows:
>     * Provide a configuration file (xml) to the Handler which takes in the necessary
>       SQL queries and mappings to a solr schema
>           - It also takes in a properties file for the data source configuration
>     * Given the configuration it can also generate the solr schema.xml
>     * It is registered as a RequestHandler which can take two commands: do-full-import,
>       do-delta-import
>           - do-full-import - dumps all the data from the Database into the index (based
>             on the SQL query in the configuration)
>           - do-delta-import - dumps all the data that has changed since the last import
>             (we assume a modified-timestamp column in tables)
>     * It provides an admin page
>           - where we can schedule it to be run automatically at regular intervals
>           - It shows the status of the Handler (idle, full-import, delta-import)
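The do-delta-import idea (select only rows whose modified-timestamp is newer than the last import) can be sketched in memory as below. The column name `last_modified`, the `DeltaImportSketch` class and the choice to checkpoint the time the import started are assumptions for illustration; in SQL terms the filter corresponds to something like `WHERE last_modified > <last import time>`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DeltaImportSketch {
    // Assumed name of the modified-timestamp column; the issue only says such a column exists
    static final String MODIFIED_COLUMN = "last_modified";

    // No import has run yet, so every row counts as changed
    private Instant lastImportTime = Instant.EPOCH;

    /** Returns rows modified after the previous import and advances the checkpoint. */
    public List<Map<String, Object>> deltaImport(List<Map<String, Object>> rows, Instant importStart) {
        List<Map<String, Object>> changed = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Instant modified = (Instant) row.get(MODIFIED_COLUMN);
            if (modified.isAfter(lastImportTime)) {
                changed.add(row);
            }
        }
        // Checkpoint the import's start time, so rows modified after it began are seen next run
        lastImportTime = importStart;
        return changed;
    }

    public static void main(String[] args) {
        DeltaImportSketch importer = new DeltaImportSketch();
        Instant t0 = Instant.parse("2008-02-06T17:00:00Z");
        List<Map<String, Object>> rows = List.of(
                Map.of("id", 1, MODIFIED_COLUMN, t0.minusSeconds(60)),
                Map.of("id", 2, MODIFIED_COLUMN, t0.plusSeconds(60)));
        System.out.println(importer.deltaImport(rows, t0).size());                  // prints: 2
        System.out.println(importer.deltaImport(rows, t0.plusSeconds(120)).size()); // prints: 1
    }
}
```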

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

