lucene-commits mailing list archives

From "Cassandra Targett (Confluence)" <conflue...@apache.org>
Subject [CONF] Apache Solr Reference Guide > Simple Post Tool
Date Wed, 18 Sep 2013 17:10:00 GMT
Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: Simple Post Tool (https://cwiki.apache.org/confluence/display/solr/Simple+Post+Tool)

Change Comment:
---------------------------------------------------------------------
Rename page; fix examples; add more info on system properties

Edited by Cassandra Targett:
---------------------------------------------------------------------
Solr includes a simple command line tool for POSTing raw XML to a Solr port. XML data can
be read from files specified as command line arguments, from raw command line argument strings,
or from STDIN.

The tool is called {{post.jar}}, a cross-platform Java tool for POSTing XML documents. It is
found in the 'exampledocs' directory: {{$SOLR/example/exampledocs/post.jar}}.

To run it, open a terminal window and enter:

{code:language=none|borderStyle=solid|borderColor=#666666}
java -jar post.jar <list of files with messages>
{code}

By default, this will contact the server at {{localhost:8983}}. The '-help' (or simply '-h')
option will output information on its usage (i.e., {{java -jar post.jar -help}}).

h2. Using the Simple Post Tool

Options controlled by System Properties include the Solr URL to post to, the {{Content-Type}}
of the data, whether a commit or optimize should be executed, and whether the response should
be written to {{STDOUT}}. You may override any other request parameter through the {{\-Dparams}}
property.
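
As a minimal sketch of how these properties fit on the command line, the snippet below assembles a {{post.jar}} invocation from a set of property values. The helper {{build_post_command}} and the file name {{docs.xml}} are hypothetical and exist only for illustration. The one fixed rule it encodes is that the JVM reads {{\-D}} options as system properties only when they appear before {{\-jar}}; anything after the jar name is passed to the tool as an argument.

```python
import shlex


def build_post_command(files, **props):
    """Hypothetical helper: build a post.jar command line as a list of tokens.

    Each keyword argument becomes a -D<name>=<value> system property.
    The properties are placed before -jar so the JVM, not the tool, reads them.
    """
    cmd = ["java"]
    cmd += [f"-D{name}={value}" for name, value in props.items()]
    cmd += ["-jar", "post.jar", *files]
    return cmd


# Example values only: the URL is the documented default, docs.xml is made up.
cmd = build_post_command(
    ["docs.xml"],
    url="http://localhost:8983/solr/update",
    commit="no",
    out="yes",
)
print(shlex.join(cmd))
```

Keeping the command as a list of tokens avoids shell quoting problems if it is later passed to {{subprocess.run}}; {{shlex.join}} is used here only to display it.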

This table lists the supported system properties and their defaults:

|| Parameter || Values || Default || Description ||
| \-Ddata | args, stdin, files, web | files | Use *args* to pass arguments along the command line (such as a command to delete a document). Use *files* to pass a filename or regex pattern indicating paths and filenames. Use *stdin* to use standard input. Use *web* for a very simple web crawler (arguments for this would be the URL to crawl). |
| \-Dtype | <content-type> | application/xml | Defines the content-type, if {{-Dauto}} is not used. |
| \-Durl | <solr-update-url> | http://localhost:8983/solr/update | The Solr URL to send the updates to. |
| \-Dauto | yes, no | no | If yes, the tool will guess the file type from file name suffix, and set type and url accordingly. It also sets the ID and file name automatically. |
| \-Drecursive | yes, no | no | Will recurse into sub-folders and index all files. |
| \-Dfiletypes | <type>\[,<type>,..\] | xml, json, csv, pdf, doc, docx, ppt, pptx, xls, xlsx, odt, odp, ods, rtf, htm, html | Specifies the file types to consider when indexing folders. |
| \-Dparams | "<key>=<value>\[&<key>=<value>...\]" | none | HTTP GET params to add to the request, so you don't need to write the whole URL again. Values must be URL-encoded. |
| \-Dcommit | yes, no | yes | Perform a commit after adding the documents.  |
| \-Doptimize | yes, no | no | Perform an optimize after adding the documents. |
| \-Dout | yes, no | no | Write the response to {{STDOUT}}. |
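
Because {{\-Dparams}} values must be URL-encoded, it can help to generate the string rather than write it by hand. The following is a sketch using Python's standard {{urllib.parse}} module. The field {{literal.title}} and both values are invented for the example; only {{literal.id}} appears elsewhere on this page.

```python
from urllib.parse import quote, urlencode

# Hypothetical request parameters; the values contain a space and an ampersand,
# both of which would break the query string if passed through unencoded.
params = {"literal.id": "doc 1", "literal.title": "Q&A notes"}

# quote_via=quote encodes spaces as %20 instead of the default '+'.
encoded = urlencode(params, quote_via=quote)

# Wrap the value in double quotes so the shell does not interpret '&'.
print(f'-Dparams="{encoded}"')
```

The printed property can then be placed before {{\-jar}} on the {{post.jar}} command line.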

h3. Examples

There are several ways to use {{post.jar}}. Here are a few examples:

Add all documents with file extension {{.xml}}.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -jar post.jar *.xml
{code}

Send XML arguments to delete a document from the index.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'
{code}
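
For readers who want to see what such a call amounts to over HTTP, here is a rough sketch of an equivalent request built with Python's standard library. It assumes the documented defaults ({{http://localhost:8983/solr/update}} and {{application/xml}}) and only constructs the request without sending it. It is an illustration of the request shape, not a description of the tool's internals, and it leaves out the commit that {{post.jar}} performs by default.

```python
from urllib.request import Request

# The same delete message used in the command line example.
body = b"<delete><id>42</id></delete>"

# Build (but do not send) a POST to the default update URL.
req = Request(
    "http://localhost:8983/solr/update",
    data=body,
    headers={"Content-Type": "application/xml"},
    method="POST",
)

print(req.get_method(), req.full_url)
# urllib normalizes header names, so the lookup key is "Content-type".
print(req.get_header("Content-type"))
```

Passing {{req}} to {{urllib.request.urlopen}} would send it, which requires a running Solr instance at that address.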

Index all CSV files.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Dtype=text/csv -jar post.jar *.csv
{code}

Index all JSON files.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Dtype=application/json -jar post.jar *.json
{code}

Use the [extracting request handler|solr:Uploading Data with Solr Cell using Apache Tika]
to index a PDF file.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
{code}

Automatically detect the content type based on the file extension.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Dauto=yes -jar post.jar a.pdf
{code}

Automatically detect content types in a folder, and recursively scan it for documents.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Dauto=yes -Drecursive=yes -jar post.jar afolder
{code}

Automatically detect content types in a folder, but limit it to PPT and HTML files.
{code:language=none|borderStyle=solid|borderColor=#666666}
  java -Dauto=yes -Dfiletypes=ppt,html -jar post.jar afolder
{code}
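
To preview which files a combination of {{\-Drecursive}} and {{\-Dfiletypes}} might pick up before posting anything, the selection can be approximated in a few lines. This is a sketch of the behavior as described in the table above, not the tool's actual code; the function {{select_files}} is hypothetical, and it assumes matching is done on the file name suffix without regard to case.

```python
from pathlib import Path


def select_files(folder, filetypes, recursive=False):
    """Hypothetical preview of folder indexing: return matching files, sorted.

    filetypes is a comma-separated list of suffixes, as given to -Dfiletypes.
    Sub-folders are only searched when recursive is True.
    """
    wanted = {t.strip().lower() for t in filetypes.split(",") if t.strip()}
    pattern = "**/*" if recursive else "*"
    return sorted(
        p
        for p in Path(folder).glob(pattern)
        if p.is_file() and p.suffix.lstrip(".").lower() in wanted
    )
```

For example, {{select_files("afolder", "ppt,html")}} lists only the top-level PPT and HTML files, while adding {{recursive=True}} also descends into sub-folders.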

{scrollbar}

