lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "DataImportHandler" by NoblePaul
Date Mon, 31 Mar 2008 16:12:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  
  It moves ahead, encounters `/RDF/item` and processes the rows one by one. It gets the
values for all the fields except the 3 fields in the header, but because those were marked as
common fields, the processor puts them into each record just before creating the document.
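To picture the common-field step, here is a minimal Java sketch under stated assumptions: the header values are read once into a map and copied into every item row just before the document is built. The class and method names (`CommonFieldSketch`, `mergeCommonFields`) and the field names are hypothetical; this is not the DataImportHandler's internal code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommonFieldSketch {

    // Hypothetical helper: copies the common (header) fields, captured once,
    // into every item row just before that row is turned into a document.
    static List<Map<String, Object>> mergeCommonFields(Map<String, Object> commonFields,
                                                       List<Map<String, Object>> rows) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> doc = new LinkedHashMap<>(row); // copy; the input row is untouched
            doc.putAll(commonFields);                           // header values go into each record
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Illustrative values only: one header field and two items.
        Map<String, Object> header = new HashMap<>();
        header.put("source", "Example Feed");

        List<Map<String, Object>> rows = new ArrayList<>();
        for (String title : new String[] {"First story", "Second story"}) {
            Map<String, Object> item = new HashMap<>();
            item.put("title", title);
            rows.add(item);
        }

        for (Map<String, Object> doc : mergeCommonFields(header, rows)) {
            System.out.println(doc);
        }
    }
}
```

Each printed record carries its own `title` plus the shared `source` value, which is the effect the paragraph describes.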
  
+ What about this ''transformer=!DateFormatTransformer'' attribute in the entity? See the !DateFormat
section for details.
- What about this ''transformer=!DateFormatTransformer'' attribute in the entity? This is
an inbuilt utility transformer that helps the user parse date strings in a custom format into 'Date'
objects. Note the field `<field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"
/>`. The transformer only applies to fields that have the attribute 'dateTimeFormat',
and it uses the syntax of Java's [http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html
SimpleDateFormat].
- 
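The parsing the transformer performs can be approximated with plain `java.text.SimpleDateFormat`, as in this sketch. `DateFormatSketch` and `parseColumn` are hypothetical names, and UTC plus `Locale.ROOT` are assumptions chosen so the result is reproducible. The pattern here uses `HH` because in `SimpleDateFormat` the letter `hh` is the 1-12 hour field, while a timestamp such as `16:12:41` is naturally read with the 0-23 hour field `HH`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateFormatSketch {

    // Hypothetical helper: parse one column value with a dateTimeFormat pattern,
    // in the spirit of what the text says DateFormatTransformer does.
    static Date parseColumn(String value, String dateTimeFormat) {
        // A fresh instance per call: SimpleDateFormat is not thread-safe.
        // Locale.ROOT keeps parsing independent of the JVM's default locale.
        SimpleDateFormat fmt = new SimpleDateFormat(dateTimeFormat, Locale.ROOT);
        // Assumption: interpret timestamps as UTC so the result does not depend on the default zone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Cannot parse '" + value + "' with pattern " + dateTimeFormat, e);
        }
    }

    public static void main(String[] args) {
        // 'T' is quoted, so it is matched as a literal character; HH is the 0-23 hour field.
        Date d = parseColumn("2008-03-31T16:12:41", "yyyy-MM-dd'T'HH:mm:ss");
        System.out.println(d.getTime());
    }
}
```

A value that does not match the pattern raises an `IllegalArgumentException` in this sketch; how the real transformer reports such rows is not covered here.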
  
  You can use this feature for indexing from REST APIs such as RSS/Atom feeds, XML data feeds,
other Solr servers or even well-formed XHTML documents. Our XPath support has its limitations,
but we have tried to make sure that common use cases are covered, and since it is based on a
streaming parser, it is extremely fast and consumes a constant amount of memory even for large
XML documents. It does not support namespaces, but it can handle XML with namespaces. When you provide
the xpath, just drop the namespace prefix and give the rest (e.g. if the tag is `'<dc:subject>'`
the mapping should just contain `'subject'`). Easy, isn't it? And you didn't need to write
one line of code! Enjoy :)
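The namespace advice above can be illustrated with a small streaming matcher built on the standard StAX API (`javax.xml.stream`). This is a hypothetical sketch, not the DataImportHandler's own XPath implementation: `PrefixFreePathSketch` and `valuesAt` are made-up names, it supports only plain absolute paths like `/RDF/item/subject`, it compares local names so `dc:subject` matches `subject`, and it keeps only the current element path on a stack instead of building a DOM.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PrefixFreePathSketch {

    // Hypothetical helper: returns the text of every element whose absolute path,
    // built from local names only, equals `path` (e.g. "/RDF/item/subject").
    static List<String> valuesAt(String xml, String path) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                return collect(reader, path);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    private static List<String> collect(XMLStreamReader reader, String path)
            throws XMLStreamException {
        Deque<String> stack = new ArrayDeque<>(); // local names of the currently open elements
        List<String> values = new ArrayList<>();
        StringBuilder text = null;                // non-null while inside a matching element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // getLocalName() drops the prefix, so <dc:subject> is recorded as "subject".
                stack.addLast(reader.getLocalName());
                if (pathOf(stack).equals(path)) {
                    text = new StringBuilder();
                }
            } else if ((event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) && text != null) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (text != null && pathOf(stack).equals(path)) {
                    values.add(text.toString());
                    text = null;
                }
                stack.removeLast();
            }
        }
        return values;
    }

    private static String pathOf(Deque<String> stack) {
        return "/" + String.join("/", stack);
    }

    public static void main(String[] args) {
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<item><dc:subject>search</dc:subject></item>"
                + "<item><dc:subject>indexing</dc:subject></item>"
                + "</rdf:RDF>";
        System.out.println(valuesAt(xml, "/RDF/item/subject"));
    }
}
```

Because the reader walks events one at a time, memory in this sketch grows with nesting depth and the number of collected values, not with the size of the document.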
  = Extending the tool with APIs =
@@ -450, +449 @@

  
  {{{
  <dataConfig>
+ 	<script><![CDATA[
- 	<script>
- 		<![CDATA[
  		function f1(row)	{
  		    row.put('message', 'Hello World!');
  		    return row;
  		}
- 		]]>
- 	</script>
+ 	]]></script>
  	<document>
  		<entity name="e" pk="id" transformer="script:f1" query="select * from X">
                  ....
