lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DataImportHandler" by NoblePaul
Date Wed, 26 Mar 2008 11:00:35 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
   * '''`connectionTimeout`''' (optional): The default value is 5000ms
   * '''`readTimeout`''' (optional): The default value is 10000ms
  == Configuration in data-config.xml ==
+ 
  The entity for an xml/http data source can have the following attributes in addition to
the default attributes:
+  * '''`processor`''' (required) : The value must be `"XPathEntityProcessor"`
   * '''`url`''' (required) : The url used to invoke the REST API. (Can be templatized)
   * '''`forEach`''' (required) : The xpath expression which demarcates a record. If there
are multiple types of record, separate them with '' | ''
+  The fields can have the following attributes (over and above the default attributes):
+  * '''`xpath`''' (required) : The xpath expression of the field to be mapped as a column
in the record. It can be omitted if the column does not come from an xml attribute, that
is, if it is a synthetic field created by a transformer
+  * '''`commonField`''' : can be (true | false). If true, this field, once encountered in
a record, is copied to other records before a Solr document is created
  
+ If an API supports chunking (when the dataset is too large), multiple calls need to be made
to complete the process.
+ X!PathEntityProcessor supports this with a transformer. If the transformer returns a row which
contains a field '''`$hasMore`''' with the value `"true"`, the processor makes another request
with the same url template (the actual value is recomputed before each invocation). A transformer
can also pass a totally new url for the next call by returning a row which contains a field
'''`$nextUrl`''' whose value must be the complete url for the next call.
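
The chunking behaviour described above can be sketched with a small transformer. This is only an illustration under stated assumptions: the class name `PagingTransformer` and the source columns `nextPage` and `moreResults` are invented for the example, and the plain `transformRow(Map)` method shape is assumed rather than taken from this page. Only the `$hasMore` and `$nextUrl` field names come from the text.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical transformer sketch for a chunked REST API.
 * Assumption: each parsed row may carry a "nextPage" column holding the
 * complete URL of the next chunk, or a "moreResults" column set to "true".
 * Both column names are invented for this example.
 */
class PagingTransformer {

    // Assumed method shape: receives one row and returns the (modified) row.
    public Object transformRow(Map<String, Object> row) {
        Object nextPage = row.get("nextPage");
        if (nextPage != null && !nextPage.toString().isEmpty()) {
            // Hand the processor a complete URL for the next call.
            row.put("$nextUrl", nextPage.toString());
        } else if ("true".equals(String.valueOf(row.get("moreResults")))) {
            // Ask the processor to call the same url template again.
            row.put("$hasMore", "true");
        }
        return row;
    }
}
```

In this sketch a row without either column is returned unchanged, so no further request is signalled.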
  
+ The X!PathEntityProcessor implements a streaming parser which supports a subset of xpath
syntax. Complete xpath syntax is not supported, but most of the common use cases are covered.
-  
-  
- 
  = Extending the tool with APIs =
  The examples we explored are admittedly trivial. It is not possible to have all user needs
met by an xml configuration alone, so we expose a few interfaces which can be implemented
by the user to enhance the functionality.
  
