lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DataImportHandler" by OkkeKlein
Date Sun, 18 Dec 2011 14:56:58 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "DataImportHandler" page has been changed by OkkeKlein:
http://wiki.apache.org/solr/DataImportHandler?action=diff&rev1=303&rev2=304

   * '''`url`''' (required) : The jdbc connection url
   * '''`user`''' : User name
   * '''`password`''' : The password
-  * '''`batchSize`''' : The batchsize used in jdbc connection
+  * '''`batchSize`''' : The batchsize used in jdbc connection. Use a value of '-1' in case of   `setFetchSize() ` exception.
   * '''`convertType`''' :(true/false)Default is 'false' Automatically reads the data in the target Solr data-type
   * '''`autoCommit`''' : If set to 'false' it sets  `setAutoCommit(false)` <!> [[Solr1.4]]
   * '''`readOnly`''' : If this is set to 'true' , it sets `setReadOnly(true)`, `setAutoCommit(true)`, `setTransactionIsolation(TRANSACTION_READ_UNCOMMITTED)`,`setHoldability(CLOSE_CURSORS_AT_COMMIT)` on the connection <!> [[Solr1.4]]
@@ -137, +137 @@

  <<Anchor(commands)>> The handler exposes all its API as http requests . The following are the possible operations
  
   * '''full-import''' : Full Import operation can be started by hitting the URL `http://<host>:<port>/solr/dataimport?command=full-import`
+ 
    * This operation will be started in a new thread and the ''status'' attribute in the response should be shown ''busy'' now.
    * The operation may take some time depending on size of dataset.
    * When full-import command is executed, it stores the start time of the operation in a file located at ''conf/dataimport.properties''
@@ -148, +149 @@

     * '''commit''' : (default 'true'). Tells whether to commit after the operation.
     * '''optimize''' : (default 'true'). Tells whether to optimize after the operation.
     * '''debug''' : (default 'false'). Runs in debug mode. It is used by the interactive development mode ([[#interactive|see here]]).
+ 
      * Please note that in debug mode, documents are never committed automatically. If you want to run debug mode and commit the results too, add 'commit=true' as a request parameter.
   * '''delta-import''' : For incremental imports and change detection run the command `http://<host>:<port>/solr/dataimport?command=delta-import` . It supports the same clean, commit, optimize and debug parameters as full-import command.
   * '''status''' : To know the status of the current command, hit the URL `http://<host>:<port>/solr/dataimport` . It gives an elaborate statistics on no. of docs created, deleted, queries run, rows fetched, status etc.
@@ -328, +330 @@

   . {{{
   deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataView}"
  }}}
+ 
    . Changed to:
+ 
   {{{
   deltaQuery="SELECT MAX(did) AS did FROM ${dataimporter.request.dataView}"
  }}}
@@ -883, +887 @@

  See https://issues.apache.org/jira/browse/SOLR-2549 for a patch that extends LineEntityProcessor to support fixed-width and delimited files without needing to use a Transformer.
  
  ----
- 
  === SolrEntityProcessor ===
  <<Anchor(SolrEntityProcessor)>> <!> [[Solr3.6]]
  
+ This !EntityProcessor imports data from different Solr instances and cores. The data is retrieved based on a specified (filter) query. This !EntityProcessor is useful in cases you want to copy your Solr index and slightly want to modify the data in the target index. In some cases Solr might be the only place were all data is available. The !SolrEntityProcessor can only copy fields that are stored in the source index. The !SolrEntityProcessor supports the following attributes:
+ 
- This !EntityProcessor imports data from different Solr instances and cores. The data is retrieved based on a specified (filter) query.
- This !EntityProcessor is useful in cases you want to copy your Solr index and slightly want to modify the data in the target index. In some
- cases Solr might be the only place were all data is available. The !SolrEntityProcessor can only copy fields that are stored in the source index.
- The !SolrEntityProcessor supports the following attributes:
   * '''`url`''' : (required) The url of the source Solr instance / core
   * '''`query`''' : (required) The main query to execute on the source index.
-  * '''`fq`''' : Any filter query to execute in the source index. (Comma seperated) 
+  * '''`fq`''' : Any filter query to execute in the source index. (Comma seperated)
   * '''`rows`''' : The number of rows to return for each iteration. Defaults to 50.
   * '''`fields`''' : What fields to fetch from the source index. (Comma seperated)
   * '''`format`''' : The format (javabin|xml) to use as reponse format. Use xml if the Solr versions don't match.
   * '''`timeout`''' : The query timeout in seconds. This can be used as a fail-safe to prevent the indexing session from freezing up. By default the timeout is 5 minutes.
  
  Example:
+ 
  {{{
  <dataConfig>
    <document>
@@ -907, +909 @@

    </document>
  </dataConfig>
  }}}
- 
  == DataSource ==
  <<Anchor(datasource)>> A class can extend `org.apache.solr.handler.dataimport.DataSource` . [[http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/DataSource.java?view=markup|See source]]
  
@@ -1024, +1025 @@

   * On doing a `command=full-import` The root-entity (A) is executed first
   * Each row that emitted by the 'query' in entity 'A' is fed into its sub entities B, C
   * The queries in B and C use a column in 'A' to construct their queries using placeholders like `${A.a}`
+ 
    * B has a url  (B is an xml/http datasource)
    * C has a query
   * C has two transformers ('f' and 'g' )
@@ -1527, +1529 @@

  
  = Troubleshooting =
   * If you are having trouble indexing international characters, try setting the '''encoding''' attribute to "UTF-8" on the dataSource element (example below). This should ensure that international character data (stored in UTF8) ingested by the given source will be preserved.
+ 
    . {{{
     <dataSource type="FileDataSource" encoding="UTF-8"/>
  }}}
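
The command URLs quoted in the diff above (`full-import`, `delta-import`, and the bare handler URL for status) can be assembled programmatically. The following is a minimal Python sketch, not part of the wiki page: the helper name `dih_command_url` and the host and port values are hypothetical placeholders, and the sketch only builds the URL strings without contacting a Solr instance.

```python
from urllib.parse import urlencode


def dih_command_url(host, port, command=None, **params):
    """Build a DataImportHandler request URL (hypothetical helper).

    With no command, the bare handler URL is returned, which the page
    describes as the status request.
    """
    base = f"http://{host}:{port}/solr/dataimport"
    query = {}
    if command is not None:
        query["command"] = command
    # Render Python booleans the way the page writes them ('true'/'false').
    for name, value in params.items():
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{base}?{urlencode(query)}" if query else base


# Placeholder host and port; substitute the values of your own installation.
# Debug mode never commits automatically, so commit=true is passed explicitly.
print(dih_command_url("localhost", 8983, "full-import", debug=True, commit=True))
print(dih_command_url("localhost", 8983, "delta-import", optimize=False))
print(dih_command_url("localhost", 8983))
```

Passing `commit=True` together with `debug=True` mirrors the note in the diff that debug-mode imports are only committed when 'commit=true' is added as a request parameter.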
