lucene-solr-user mailing list archives

From Christopher Schultz
Subject Re: What is the correct URL for POSTing new data?
Date Sun, 15 Apr 2018 20:24:46 GMT

On 4/13/18 6:02 PM, Shawn Heisey wrote:
> On 4/13/2018 7:49 AM, Christopher Schultz wrote:
>> $
>>" /usr/local/solr/bin/post
>> -c new_core https://localhost:8983/solr/new_core
>> [time passes while bin/post uploads a very large file]
>> SimplePostTool version 5.0.0 Posting files to [base] url
>> https://localhost:8983/solr/new_core... Entering auto mode. File
>> endings considered are 
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp
>> POSTing file new_core.json (application/json) to [base]/json/docs
>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found)
>> for url: https://localhost:8983/solr/new_core/json/docs
> The URL path (beyond the core name) it's ending up with is
> /json/docs, when it should be /update/json/docs.

Looks like that worked. I could find that nowhere in the documentation.
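
For anyone who finds this thread later, here is a minimal sketch of the
working URL. The helper name solr_json_docs_url is made up for
illustration, and the curl line in the comment is my assumption about
an equivalent manual request, not something bin/post does itself:

```shell
#!/bin/sh
# Hypothetical helper: append the handler path that worked
# (/update/json/docs) to a core's base URL.
solr_json_docs_url() {
  base="${1%/}"   # drop a single trailing slash, if present
  printf '%s/update/json/docs\n' "$base"
}

# Possible manual upload with curl (assumption, needs a running Solr):
#   curl -H 'Content-Type: application/json' \
#        --data-binary @new_core.json \
#        "$(solr_json_docs_url https://localhost:8983/solr/new_core)"
```

The point is only that the handler path after the core name is
/update/json/docs and not /json/docs.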

> If you hadn't given the command a specific URL, it probably would
> have figured out the correct URL on its own.

No, it wouldn't have. It doesn't read any configuration files and
guesses its way through everything. Simply adding HTTPS support
required me to modify the script and manually specify the URL. That's
why I went through the trouble of explaining so in my initial post.

> The base URL for the post tool normally includes the /update path, 
> which is different than the base URL for something like 
> HttpSolrClient (in the SolrJ library).  Changing the handler path
> is done differently in SolrJ than it is with the post tool.
> I know, we've violated that principle again. :)


I don't mind all surprises. It's the ones that have zero documentation
that are the most surprising.

> The bin/post tool is a *simple* tool.  The java class that it calls
> is even named "SimplePostTool".  It is expected that most users
> will outgrow its functionality quickly and write their own indexing
> software that does whatever custom processing they require.  The
> tool doesn't get a lot of improvements because we don't intend it
> to be used as a production indexing mechanism.

I'm using it as a bulk-loading operation. I have no need in production
to completely bootstrap a document collection unless the existing one
has been trashed for some reason. Why bother writing my own client
that does the equivalent of "SELECT * FROM table" and then loops over
the ResultSet calling SolrJ's add-document method?

The SimplePostTool should be able to handle that for me, and if it
did, I'd have less code to babysit in perpetuity.

> If it does what you need, there's nothing wrong with production 
> usage, but you need to be aware that it doesn't have robust error 
> handling, which is usually pretty important for production.

I'm okay with terse error messages.

-chris