lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Edit Example Post.jar to read ALL file types
Date Tue, 22 Jul 2014 00:41:19 GMT
So how do you expect these to be indexed? I mean what happens
if you run across a Word document? How about an mp3? Just
blasting all files up seems chancy. And doesn't plain
'java -jar post.jar *' already do what you ask?

This seems like an XY problem: _why_ do you want
to do this? Unless the files being sent to Solr are
properly formatted, they won't be ingested. There's some special
logic that handles XML files and expects the very precise Solr
format, so Solr would have no idea what to do with the
extensions in your example.

Perhaps a better approach would be to control the indexing
from a SolrJ client. Here's a blog if you want to follow
that approach.
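
To make the client-side idea concrete, here is a minimal sketch of the
"default to text/plain" decision a custom indexing client could make
before sending anything to Solr. It is plain Java, not SolrJ and not
the SimplePostTool source: the class name MimeFallback, the method
guessType, and the small extension table are all hypothetical and only
illustrate the fallback logic.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MimeFallback {
    // Hypothetical lookup table for illustration only;
    // this is NOT the mimeMap from SimplePostTool.
    private static final Map<String, String> KNOWN_TYPES = new HashMap<>();
    static {
        KNOWN_TYPES.put("xml", "application/xml");
        KNOWN_TYPES.put("json", "application/json");
        KNOWN_TYPES.put("csv", "text/csv");
        KNOWN_TYPES.put("txt", "text/plain");
    }

    // Assumed default for any extension that is not in the table.
    static final String DEFAULT_TYPE = "text/plain";

    // Return the mapped type for the file's extension, or the default
    // when the extension is unknown or the name has no extension.
    static String guessType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return DEFAULT_TYPE;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String type = KNOWN_TYPES.get(ext);
        return type != null ? type : DEFAULT_TYPE;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"docs.xml", "report.20039", "README"}) {
            System.out.println(name + " -> " + guessType(name));
        }
    }
}
```

Note that labelling a file text/plain does not make it text. A client
like this would still need to decide what to do with a Word document
or an mp3 before blasting it at Solr.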

Best,
Erick


On Mon, Jul 21, 2014 at 7:51 AM, jrusnak <jrusnak@live.unc.edu> wrote:

> I am working with Solr 4.8.1 to set up an enterprise search system.
>
> The file system I am working with has numerous files with unique extension
> types (ex .20039 .20040 .20041 etc.)
>
> I am using the post.jar file included in the binary download (src:
> SimplePostTool.java
> <http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/util/SimplePostTool.java>)
> to post these files to the solr server and would like to edit this jar file
> to recognize /any/ file extension it comes across.
>
> Is there a way to do this with the SimplePostTool.java source? I am right
> now working to better understand the Filetype and DEFAULT_FILE_TYPE
> variables as well as the mimeMap. It is these that currently allow me to
> manually add file extensions.
>
> I would, however, like the tool to be able to read in files no matter what
> their extension is and default their mime type to text/plain.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Edit-Example-Post-jar-to-read-ALL-file-types-tp4148312.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
