lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll
Date Tue, 15 Sep 2009 12:39:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/ExtractingRequestHandler

------------------------------------------------------------------------------
  = Sending documents to Solr =
  
  // TODO: describe the different ways to send the documents to solr (POST body, form encoded, remoteStreaming)
-  * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text --data-binary @tutorial.html  -H 'Content-type:text/html'
+  * curl http://localhost:8983/solr/update/extract?\&defaultField=text  --data-binary @tutorial.html  -H 'Content-type:text/html'
        <!> NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file.
   * SolrJ:  Use the ContentStreamUpdateRequest (see SolrExampleTests.java for full example):{{{
      ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
@@ -202, +202 @@

  
  = What's in a Name =
  
- Grant was writing the javadocs for the code and needed an entry for the <title> tag and wrote out "Solr Content Extraction Library", since the contrib directory is named "extraction".  This then lead to an "acronym":  Solr CEL which then gets mashed to: Solr Cell!  Hence, the project name is "Solr Cell"!  It's also appropriate because a Solar Cell's job is to convert the raw energy of the Sun to electricity, and this contrib's module is responsible for converting the "raw" content of a document to something usable by Solr. http://en.wikipedia.org/wiki/Solar_cell
+ Grant was writing the javadocs for the code and needed an entry for the <title> tag and wrote out "Solr Content Extraction Library", since the contrib directory is named "extraction".  This then lead to an "acronym":  Solr CEL which then gets mashed to: Solr Cell.  Hence, the project name is "Solr Cell".  It's also appropriate because a Solar Cell's job is to convert the raw energy of the Sun to electricity, and this contrib's module is responsible for converting the "raw" content of a document to something usable by Solr. http://en.wikipedia.org/wiki/Solar_cell
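The updated curl command in the first hunk can also be issued from plain Java. What follows is a minimal sketch, not part of the wiki page and not SolrJ (the page points to `ContentStreamUpdateRequest` for that). It assumes Java 11+ (`java.net.http.HttpClient`), a Solr instance at the example URL `http://localhost:8983/solr`, and the `defaultField` parameter exactly as it appears in the new curl line. The class `ExtractUpload` and its `buildRequest` helper are hypothetical names chosen for illustration. Like curl's `--data-binary`, it sends the raw file bytes as the POST body, so, as the NOTE says, Solr still receives no file name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExtractUpload {

    // Hypothetical helper: builds a POST roughly equivalent to
    //   curl <solrBase>/update/extract?defaultField=<field> --data-binary @file -H 'Content-type:<type>'
    // The body is only the raw bytes, so no file name reaches Solr.
    static HttpRequest buildRequest(String solrBase, byte[] body,
                                    String contentType, String defaultField) {
        String query = "defaultField="
                + URLEncoder.encode(defaultField, StandardCharsets.UTF_8);
        URI uri = URI.create(solrBase + "/update/extract?" + query);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: java ExtractUpload.java <file.html>");
            return;
        }
        // Assumption: a local Solr instance at the example URL used on the wiki page.
        String solrBase = "http://localhost:8983/solr";
        byte[] body = Files.readAllBytes(Path.of(args[0]));
        HttpRequest request = buildRequest(solrBase, body, "text/html", "text");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print whatever Solr returns; the response format depends on the server setup.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Run it as `java ExtractUpload.java tutorial.html` (single-file source launch, Java 11+). Keeping request construction in a separate method lets the URL and headers be checked without a running Solr.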
  
