lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DataImportHandler" by NoblePaul
Date Wed, 26 Mar 2008 09:40:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  In order to use this handler, the following steps are required.
   * Define a data-config.xml and specify the location of this file in solrconfig.xml under the DataImportHandler section
   * Give connection information 
-    * driver (required): The jdbc driver classname
+    * '''`driver`''' (required): The jdbc driver classname
-    * url (required) : The jdbc connection url
+    * '''`url`''' (required) : The jdbc connection url
-    * user : User name
+    * '''`user`''' : User name
-    * password : The password
+    * '''`password`''' : The password
-    * batchSize : The batch size used in the JDBC connection
+    * '''`batchSize`''' : The batch size used in the JDBC connection
   * Open the DataImportHandler page to verify if everything is in order [http://localhost:8983/solr/dataimport]
   * Use the full-import command to do a full import from the database and add the documents to the SOLR index
   * Use the delta-import command to do a delta import (get new inserts/updates) and add them to the SOLR index
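The full-import and delta-import steps above amount to HTTP requests against the handler URL. The Python sketch below only builds the request URL and, optionally, sends it. It assumes the handler is mounted at the example URL given in the steps and that the command name is passed as a `command` request parameter; `command_url` and `run_command` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Example handler URL from the steps above (assumed deployment location).
DATAIMPORT_URL = "http://localhost:8983/solr/dataimport"


def command_url(command: str, base: str = DATAIMPORT_URL) -> str:
    """Build the URL for a DataImportHandler command such as 'full-import'.

    Assumption: the command name is sent as the `command` request parameter.
    """
    return f"{base}?{urlencode({'command': command})}"


def run_command(command: str, timeout: float = 30.0) -> str:
    """Send the command to a running Solr instance and return the raw response."""
    with urlopen(command_url(command), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URLs; call run_command(...) against a live Solr to import.
    for cmd in ("full-import", "delta-import"):
        print(command_url(cmd))
```

Building the URL separately from sending it keeps the sketch usable without a running Solr instance.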
@@ -57, +57 @@

  == Configuration in data-config.xml ==
  A SOLR document can be considered as a de-normalized schema having fields whose values come from multiple tables.
  
- The data-config.xml starts by defining a "document" element. A `document` has a 1:1 relationship with a core and the ''name'' attribute of the tag refers to the ''core name'' (ignore this for single-core deployments). A document contains one or more root entities. A root entity can contain multiple sub-entities, which in turn can contain other entities. An entity is a table/view in a relational database. Each entity can contain multiple fields. Each field corresponds to a column in the resultset returned by the ''query'' in the entity. For each field, mention the column name in the resultset. If the column name is different from the Solr field name, then another attribute ''name'' should be given. The rest of the required attributes, such as ''type'', will be read directly from the SOLR schema.xml.
+ The data-config.xml starts by defining a "document" element. A `document` represents one kind of document. A document contains one or more root entities. A root entity can contain multiple sub-entities, which in turn can contain other entities. An entity is a table/view in a relational database. Each entity can contain multiple fields. Each field corresponds to a column in the resultset returned by the ''query'' in the entity. For each field, mention the column name in the resultset. If the column name is different from the Solr field name, then another attribute ''name'' should be given. The rest of the required attributes, such as ''type'', will be inferred directly from the SOLR schema.xml (this can be overridden).
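The column-to-field rule above (use the column name unless a ''name'' attribute is given) can be sketched in Python as follows. Field definitions are modelled as plain dicts with a `column` key and an optional `name` key; the `column` key name and the `row_to_document` helper are assumptions for illustration, and ''type'' handling from schema.xml is left out.

```python
from typing import Dict, List


def row_to_document(row: Dict[str, object], fields: List[Dict[str, str]]) -> Dict[str, object]:
    """Map one resultset row to a Solr document (a dict of field -> value).

    Each field spec names a resultset `column`; the optional `name` gives the
    Solr field name when it differs from the column name.
    """
    doc: Dict[str, object] = {}
    for spec in fields:
        column = spec["column"]
        solr_field = spec.get("name", column)  # default: same name as the column
        if column in row:  # columns absent from this row are skipped
            doc[solr_field] = row[column]
    return doc


if __name__ == "__main__":
    row = {"id": 1, "item_name": "ipod", "price": 99.0}
    fields = [{"column": "id"}, {"column": "item_name", "name": "name"}, {"column": "price"}]
    print(row_to_document(row, fields))
```

Skipping columns that are missing from a row, rather than raising an error, is a design choice of this sketch only.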
  
- In order to get data from the database, our design philosophy revolves around 'templatized sql' entered by the user for each entity. This gives the user the entire power of SQL if he needs it. The root entity is the central table whose primary key can be used to join this table with other child entities.
+ In order to get data from the database, our design philosophy revolves around 'templatized SQL' entered by the user for each entity. This gives the user the full power of SQL when it is needed. The root entity is the central table whose columns can be used to join this table with other child entities.
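The join between the root entity and a child entity can be illustrated with an in-memory SQLite database. In this Python sketch the child query is a template whose `${item.id}` placeholder is filled from the current root row before the query runs, so every root row yields one document with a multi-valued field collected from the child table. The placeholder syntax, the `item`/`feature` tables and the helper functions are assumptions made for the example, not the handler's own code.

```python
import re
import sqlite3
from typing import Dict, List

# Placeholder syntax assumed for this sketch: ${entity.column}
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template: str, context: Dict[str, Dict[str, object]]) -> str:
    """Substitute ${entity.column} with the value from the current parent row.

    Values are pasted into the SQL text as-is, so this is only safe for trusted data.
    """
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)][m.group(2)]), template)


def import_documents(conn: sqlite3.Connection, root_name: str, root_query: str,
                     child_query: str, child_field: str) -> List[Dict[str, object]]:
    """Build one document per root row, filling a multi-valued field from a child entity."""
    conn.row_factory = sqlite3.Row
    docs = []
    for root_row in conn.execute(root_query).fetchall():
        doc = dict(root_row)
        # The child query is templatized on the columns of the current root row.
        child_sql = resolve(child_query, {root_name: doc})
        doc[child_field] = [r[0] for r in conn.execute(child_sql).fetchall()]
        docs.append(doc)
    return docs


def build_sample_db() -> sqlite3.Connection:
    """Hypothetical two-table schema: item (root entity) and feature (child entity)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature (item_id INTEGER, description TEXT);
        INSERT INTO item VALUES (1, 'ipod'), (2, 'monitor');
        INSERT INTO feature VALUES (1, 'music'), (1, 'video'), (2, 'lcd');
    """)
    return conn


if __name__ == "__main__":
    docs = import_documents(
        build_sample_db(),
        root_name="item",
        root_query="SELECT id, name FROM item ORDER BY id",
        child_query="SELECT description FROM feature WHERE item_id = ${item.id} ORDER BY description",
        child_field="features",
    )
    for doc in docs:
        print(doc)
```

Substituting values into the SQL text mirrors the templating idea described above; for untrusted values, bound query parameters would be the safer choice.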
+ 
+ === Schema for the xml config ===
+   The dataconfig does not have a rigid schema. The attributes in the entity/field are arbitrary and depend on the `processor` and `transformer`. For !JdbcDataSource the entity attributes are:
+ 
+  * '''`name`''' (required) : A unique name used to identify an entity
+  * '''`query`''' (required) : The SQL string used to query the database
+  * '''`dataSource`''' : The name of a data source as defined in solrconfig.xml (used if there are multiple data sources)
+  * '''`pk`''' : The primary key for the entity
+  * '''`deltaQuery`''' : Only used in delta-import
+  * '''`parentDeltaQuery`''' : Only used in delta-import
+  * '''`deletedPkQuery`''' : Only used in delta-import
+  * '''`rootEntity`''' : By default, the entities directly under the document are root entities. If it is set to false, the entities directly under this entity will be treated as root entities (and so on, recursively). For every row returned by a root entity, a document is created in Solr.
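The required attributes and the ''rootEntity'' rule above can be checked on a parsed config. This Python sketch reads a hypothetical `document` element with `xml.etree.ElementTree`, rejects entities that lack ''name'' or ''query'', and lists the entities whose rows become Solr documents. The sample entities, queries and placeholder syntax are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

REQUIRED_ATTRS = ("name", "query")

# Hypothetical config: 'category' sets rootEntity="false", so 'item' becomes the root entity.
CONFIG = """
<document>
  <entity name="category" rootEntity="false" query="select id from category">
    <entity name="item" query="select * from item where category_id='${category.id}'">
      <entity name="feature" query="select description from feature where item_id='${item.id}'"/>
    </entity>
  </entity>
</document>
"""


def validate_entities(parent: ET.Element) -> None:
    """Recursively require the 'name' and 'query' attributes on every entity."""
    for entity in parent.findall("entity"):
        missing = [a for a in REQUIRED_ATTRS if a not in entity.attrib]
        if missing:
            raise ValueError(f"entity {entity.get('name', '?')!r} is missing {missing}")
        validate_entities(entity)


def root_entities(parent: ET.Element) -> List[str]:
    """Names of the entities that produce one Solr document per row."""
    roots: List[str] = []
    for entity in parent.findall("entity"):
        if entity.get("rootEntity", "true") == "false":
            # Not a root entity: its direct children are considered instead.
            roots.extend(root_entities(entity))
        else:
            roots.append(entity.get("name"))
    return roots


if __name__ == "__main__":
    document = ET.fromstring(CONFIG.strip())
    validate_entities(document)
    print(root_entities(document))
```

Here each `category` row is only used to drive the `item` query, and each `item` row becomes one document; `feature` stays a sub-entity of `item`.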
+ 
  
  == Commands ==
   
