lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ExtractingRequestHandler" by YonikSeeley
Date Tue, 14 Jul 2009 21:53:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by YonikSeeley:
http://wiki.apache.org/solr/ExtractingRequestHandler

The comment on the change is:
explain params in example

------------------------------------------------------------------------------
  Now, you should be able to execute a query and find that document (open the following link
in your browser):
  http://localhost:8983/solr/select?q=tutorial
  
- You may notice that although you can search on any of the text in the sample document, you
may not be able to see that text when the document is retrieved.  This is simply because the
"content" field generated by Tika is mapped to the Solr field called "text" (which is indexed
but not stored) via the default map rule in {{{solrconfig.xml}}} that can be changed or overridden.
 For example, to store and see all metadata and content, execute the following:
+ You may notice that although you can search on any of the text in the sample document, you
may not be able to see that text when the document is retrieved.  This is simply because the
"content" field generated by Tika is mapped to the Solr field called "text", which is indexed
but not stored. This is done via the default map rule in the {{{/update/extract}}} handler in
{{{solrconfig.xml}}} and can be easily changed or overridden. For example, to store and see
all metadata and content, execute the following:
  {{{
  curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F "myfile=@tutorial.html"
  }}}
+  * The {{{uprefix=attr_}}} param causes all generated fields that aren't defined in the
schema to be prefixed with attr_ (which is a dynamic field that is stored).
+  * The {{{map.content=attr_content}}} param overrides the default {{{map.content=text}}},
causing the content to be added to the attr_content field instead.
+ 
+  
  And then query via http://localhost:8983/solr/select?q=attr_content:tutorial
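
  The request parameters explained above can also be assembled from their parts, which may make it easier to see what each one contributes. The following is only a rough sketch: the helper name build_extract_url and the variable SOLR_BASE are hypothetical and not part of Solr; only the parameter names and the URL come from the example above.

```shell
#!/bin/sh
# Base URL of the example Solr server, as used in the examples above.
SOLR_BASE="http://localhost:8983/solr"

# Hypothetical helper (not part of Solr): prints the /update/extract URL.
#   $1 = value for literal.id (the document id)
#   $2 = value for uprefix (prefix for generated fields not defined in the schema)
#   $3 = value for map.content (field that receives Tika's "content")
build_extract_url() {
  printf '%s/update/extract?literal.id=%s&uprefix=%s&map.content=%s&commit=true\n' \
    "$SOLR_BASE" "$1" "$2" "$3"
}

# Usage (assumes a running example server and tutorial.html in the current directory):
#   curl "$(build_extract_url doc1 attr_ attr_content)" -F "myfile=@tutorial.html"
```

  Passing text instead of attr_content as the third argument would reproduce the default mapping described above, in which the content is searchable but not returned with the document.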
  
  // TODO: move this somewhere else to a more in-depth discussion of different ways to send
the data to Solr (prob with remoteStreaming discussion)
