lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "TikaEntityProcessor" by NoblePaul
Date Fri, 11 Dec 2009 07:50:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "TikaEntityProcessor" page has been changed by NoblePaul.
http://wiki.apache.org/solr/TikaEntityProcessor?action=diff&rev1=3&rev2=4

--------------------------------------------------

  ==== fields ====
  Each field may have an optional attribute meta="true", which means the field's value is obtained from the !MetaData of the document. The column value is used as the key into the metadata. Check out the list of available keys here: [[http://svn.apache.org/viewvc/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/DublinCore.java?revision=801678&view=markup|DublinCore]], [[http://svn.apache.org/viewvc/lucene/tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/MSOffice.java?revision=801678&view=markup|MSOffice]]
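For illustration, a data-config using meta fields might look like the minimal sketch below. It assumes the usual DataImportHandler attribute names (`processor`, `url`, `dataSource`); the file path and the Solr field names (`docTitle`, `author`, `body`) are hypothetical, and the metadata keys `title` and `Author` are assumed to match the DublinCore and MSOffice constants linked above, so check them there before relying on them.

{{{
<dataConfig>
  <!-- Assumption: one of the built-in binary data sources, reading local files -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Hypothetical path; 'processor' and 'url' follow usual DataImportHandler conventions -->
    <entity name="doc" processor="TikaEntityProcessor" dataSource="bin"
            url="/data/docs/sample.pdf">
      <!-- meta="true": the column value ('title', 'Author') is used as the metadata key -->
      <field column="title" meta="true" name="docTitle"/>
      <field column="Author" meta="true" name="author"/>
      <!-- No meta attribute: 'text' is the implicit body-text field -->
      <field column="text" name="body"/>
    </entity>
  </document>
</dataConfig>
}}}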
  
- === DataSource ==
+ === DataSource ===
- use any !DataSource of type !DataSource<InputStream>. The inbuilt ones are
+ Use any !DataSource of type !DataSource<!InputStream>. The built-in ones are:
+ 
   * !BinURLDataSource: use for both HTTP and files
   * !BinContentStreamDataSource: use for uploaded content
   * !BinFileDataSource: use for reading from the file system
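As a sketch, the three built-in data sources could be declared side by side and selected per entity by name. The names `files`, `web` and `upload` and the URL `http://example.com/report.pdf` are hypothetical, and the `name`/`dataSource` pairing assumes the usual DataImportHandler convention for multiple data sources.

{{{
<dataConfig>
  <!-- Reads from the local file system -->
  <dataSource type="BinFileDataSource" name="files"/>
  <!-- Reads over HTTP as well as from files -->
  <dataSource type="BinURLDataSource" name="web"/>
  <!-- Reads content uploaded with the import request -->
  <dataSource type="BinContentStreamDataSource" name="upload"/>
  <document>
    <!-- This entity picks the URL-based source by name; the URL is hypothetical -->
    <entity name="page" processor="TikaEntityProcessor" dataSource="web"
            url="http://example.com/report.pdf">
      <field column="text" name="body"/>
    </entity>
  </document>
</dataConfig>
}}}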
  
  
  == Advanced Parsing ==
- The !TikaEntityProcessor can be nested with X!PathEntityProcessor for selctively indexing documents
+ The !TikaEntityProcessor can be nested with X!PathEntityProcessor for selectively indexing documents
  
  example:
  {{{
@@ -48, +49 @@

        <field column="title" meta="true" name="docTitle"/>
        <!--'text' is an implicit field emitted by TikaEntityProcessor. Map it appropriately.-->
        <field column="text"/>
-       <entity type="XPathEntityProcessor" forEach="/hml">
+       <entity type="XPathEntityProcessor" forEach="/html" dataField="text">
           <field xpath="//div"  column="foo"/>
           <field xpath="//h1"  column="h1" />
        </entity>
