lucene-solr-commits mailing list archives

From: Apache Wiki <wikidi...@apache.org>
Subject: [Solr Wiki] Update of "DataImportHandler" by OkkeKlein
Date: Fri, 11 Nov 2011 12:26:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "DataImportHandler" page has been changed by OkkeKlein:
http://wiki.apache.org/solr/DataImportHandler?action=diff&rev1=296&rev2=297

  = Scheduling =
  {i}
  
-  * Data``Import``Scheduler
+  * DataImportScheduler
   * Version 1.2
   * Last revision: 20.09.2010.
   * Author: Marko Bonaci
@@ -1095, +1095 @@

  
  <<BR>> <!> TODO:
  
-  * enable user to create multiple scheduled tasks (List<Data``Import``Scheduler>)
+  * enable user to create multiple scheduled tasks (List<DataImportScheduler>)
   * add ''cancel'' functionality (to be able to completely disable the ''DIHScheduler'' background thread without stopping the app/server). Currently, sync can be disabled by setting the ''syncEnabled'' param to anything other than "1" in ''dataimport.properties'' (see the sketch after this list), but the background thread still remains active and reloads the properties file on every run (so that sync can be hot-redeployed)
   * try to use Solr's classes wherever possible
   * add javadoc style comments
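 
  A minimal ''dataimport.properties'' sketch for the scheduler. Only ''syncEnabled'' (with "1" meaning active) and a per-minute schedule interval are named above; the key name ''interval'' and the comments are illustrative and may differ between versions:
   . {{{
 # 1 - sync active; anything else disables sync (the background thread keeps running)
 syncEnabled=1
 # schedule interval, in minutes (illustrative key name)
 interval=10
 }}}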
@@ -1112, +1112 @@

    * parametrized the schedule interval (in minutes)
  
   * v1.1:
-   * now using ''Solr``Resource``Loader'' to get ''solr.home'' (as opposed to ''System properties'' in v1.0)
+   * now using ''SolrResourceLoader'' to get ''solr.home'' (as opposed to ''System properties'' in v1.0)
    * forces reloading of the properties file if the response code is not 200
    * logging done using ''slf4j'' (used ''System.out'' in v1.0)
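 
  For illustration, a minimal sketch of the v1.1 changes in plain Java (assuming the Solr 1.4-era ''SolrResourceLoader'' API and ''slf4j''; the class and variable names here are hypothetical):
   . {{{
 import org.apache.solr.core.SolrResourceLoader;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 public class SolrHomeLookup {
     private static final Logger log = LoggerFactory.getLogger(SolrHomeLookup.class);
 
     public static void main(String[] args) {
         // v1.0 approach: rely on a JVM system property being set
         String viaProperty = System.getProperty("solr.solr.home");
 
         // v1.1 approach: let Solr resolve solr.home itself
         // (JNDI, then the system property, then a default)
         String viaLoader = SolrResourceLoader.locateSolrHome();
 
         // v1.1 also switched logging from System.out to slf4j
         log.info("solr.home resolved to {}", viaLoader);
     }
 }
 }}}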
  
@@ -1505, +1505 @@

    . {{{
     <dataSource type="FileDataSource" encoding="UTF-8"/>
  }}}
-  * If you don't get the expected data imported from a database, there are a few things to check:

+  * If you don't get the expected data imported from a database, there are a few things to check:
  
  1. Chaining transformers is a bit tricky. Some transformers read the data from the specified "sourceColName" (attribute) but put the transformed data back into the other specified "column" (attribute), so the next transformer in the chain will actually act on the same untransformed data! To avoid this, it's better to fix the column names in your SQL using "AS" and use no "sourceColName":
+ 
-   . {{{
+  . {{{
  <entity name="transaction"
   transformer="ClobTransformer, RegexTransformer"
   query="SELECT CO_TRANSACTION_ID as TID_COMMON, CO_FROM_SERVICE_DT as FROM_SERVICE_DT, CO_TO_SERVICE_DT as TO_SERVICE_DT, CO_PATIENT_LAST_NM as PATIENT_LAST_NM, CO_TOTAL_CLAIM_CHARGE_AMT as TOTAL_CLAIM_CHARGE_AMT FROM TABLE(pkg_solr_import.cb_get_transactions('${document.DOCUMENT_ID}'))"
- 			>
+                         >
  <field column="TID_COMMON" splitBy="#" clob="true"/>
  <field column="FROM_SERVICE_DT" splitBy="#" clob="true"/>
  <field column="TO_SERVICE_DT" splitBy="#" clob="true"/>
@@ -1522, +1523 @@

  </entity>
  }}}
  
- One common issue, due to the chaining of transformers and the use of "sourceColName", is getting values like oracle.sql.CLOB@aed3a5 in your imported data.
+ One common issue, due to the chaining of transformers and the use of "sourceColName", is getting values like oracle.sql.CLOB@aed3a5 in your imported data.
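 
  For illustration, a hypothetical field definition that produces this: both transformers read the raw value via "sourceColName", so ''splitBy'' ends up acting on the untransformed CLOB object rather than the string ''ClobTransformer'' produced:
   . {{{
 <!-- hypothetical anti-pattern: RegexTransformer re-reads the raw CLOB
      from sourceColName instead of the string written to "column" -->
 <field column="TID_COMMON" sourceColName="CO_TRANSACTION_ID" clob="true" splitBy="#"/>
 }}}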
  
  2. Pay attention to case sensitivity in the column names! I'd recommend using only upper case. If you specify field column="FROM_SERVICE_Dt" but the query names the column FROM_SERVICE_DT, you won't see any error, but you won't get any data in that field either!
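 
  A hypothetical illustration of that mismatch (the field stays silently empty because the names differ only in case):
   . {{{
 <!-- the query returns FROM_SERVICE_DT, but the field asks for FROM_SERVICE_Dt -->
 <field column="FROM_SERVICE_Dt" splitBy="#"/>
 }}}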
- 
  
  ----
  CategorySolrRequestHandler
