lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DataImportHandler" by NoblePaul
Date Wed, 26 Mar 2008 14:09:14 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  
  The configuration has a 'flexible' schema. It lets the user provide arbitrary attributes
in the 'entity' and 'field' tags. The tool reads the data and hands it over to the implementation
class as is. If a 'Transformer' needs extra information on a per-entity or per-field
basis, it can be supplied through these attributes. The values can be obtained from the Context.
  
- There is an inbuilt transformer called '!RegExpTransfromer' provided with the tool itself.
It helps in extracting values from fields (from db) using Regular Expressions.
+ There is an inbuilt transformer called '!RegexTransformer' provided with the tool itself.
It helps in extracting values from fields (from db) using Regular Expressions. The actual
class name is `org.apache.solr.handler.dataimport.RegexTransformer`, but as it belongs to
the default package, the package name can be omitted.
  
  example:
  {{{
- <entity name="foo" transformer="org.apache.solr.handler.dataimport.RegExpTransformer"
 
+ <entity name="foo" transformer="RegexTransformer"  
  query="select full_name, emailids from foo"
  ... >
     <field column="full_name"/>
-    <field column="firstName" regExp="Mr(\w*)\b.*" sourceColName="full_name"/>
+    <field column="firstName" regex="Mr(\w*)\b.*" sourceColName="full_name"/>
-    <field column="lastName" regExp="Mr.*?\b(\w*)" sourceColName="full_name"/>
+    <field column="lastName" regex="Mr.*?\b(\w*)" sourceColName="full_name"/>
     <field column="mailId" splitBy="," sourceColName="emailids"/>
  </entity>
  }}}
+ 
+ '''Attributes recognized by `RegexTransformer`'''
+  * '''`regex`''' : The regular expression to match against the source column. Either this or `splitBy` must
be present for each field; otherwise the transformer leaves that field untouched. If `replaceWith`
is absent, each ''group'' is taken as a value and a list of values is returned
+  * '''`sourceColName`''' : The column on which the regex is to be applied. If there is only
one column this can be omitted
+  * '''`splitBy`''' : Use this when the regular expression is meant to split a String into multiple
values
+  * '''`replaceWith`''' : Used along with `regex`. It is equivalent to calling `new String(<sourceColVal>).replaceAll(<regex>,
<replaceWith>)`
- Here the attributes 'regExp' and 'sourceColName' are custom attributes used by the transformer.
It reads the field 'full_name' from the resultset and transform it to two target fields 'firstName'
and 'lastName' . So even though the query returned only one column 'full_name' in the resultset
the solr document gets two extra fields 'firstName' and 'lastName' wich are 'derived' fields.
+ Here the attributes 'regex' and 'sourceColName' are custom attributes used by the transformer.
It reads the field 'full_name' from the resultset and transforms it into two target fields, 'firstName'
and 'lastName'. So even though the query returned only one column, 'full_name', in the resultset,
the Solr document gets two extra fields, 'firstName' and 'lastName', which are 'derived' fields.
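
As a rough illustration of the `regex` and `replaceWith` semantics described above, here is a minimal Java sketch that uses only `java.util.regex`. It is not the `RegexTransformer` source. The class name `RegexAttributeSketch`, its method names, the sample pattern `Mr (\w+) (\w+)` and the sample name are all hypothetical, and taking the groups of the first `find()` match is an assumption about how "each group is taken as a value" could work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Solr: mimics the documented attribute semantics.
class RegexAttributeSketch {

    // 'regex' without 'replaceWith': every capture group of the first match
    // becomes one value. Using find() here is an assumption of this sketch.
    static List<String> extractGroups(String regex, String sourceColVal) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(sourceColVal);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values; // empty when the pattern does not match
    }

    // 'regex' with 'replaceWith': the page states this is equivalent to replaceAll.
    static String replace(String regex, String replaceWith, String sourceColVal) {
        return sourceColVal.replaceAll(regex, replaceWith);
    }

    public static void main(String[] args) {
        // Sample pattern and data are made up for illustration.
        System.out.println(extractGroups("Mr (\\w+) (\\w+)", "Mr John Smith"));
        System.out.println(replace("\\s+", " ", "Mr   John  Smith"));
    }
}
```

With the sample input, the first call yields the two captured words as a list and the second collapses runs of whitespace into single spaces. Patterns such as the ones in the example above can capture an empty string when the character after `Mr` is not a word character, so it is worth trying a regex against real column values before relying on it.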
  
  The 'emailids' field in the table can hold a comma-separated value. Splitting it yields
one or more email ids, so 'mailId' is expected to be a multivalued field in Solr.
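
To make the `splitBy` behaviour concrete, the following minimal Java sketch splits one column value into several values with `String.split`, which treats its argument as a regular expression. The class name `SplitBySketch`, the method name and the sample addresses are hypothetical, and the sketch only approximates what the transformer does with the resulting list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Solr: shows how one column value
// can become the values of a multivalued field.
class SplitBySketch {

    // 'splitBy' is treated as a regular expression, as String.split does.
    static List<String> split(String splitBy, String sourceColVal) {
        return Arrays.asList(sourceColVal.split(splitBy));
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        for (String mailId : split(",", "a@example.com,b@example.com")) {
            System.out.println(mailId);
        }
    }
}
```

A value with no separator comes back as a single-element list, which matches the "one or more" wording above. Whitespace around the separator is not trimmed in this sketch.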
  
