lucene-solr-user mailing list archives

From Olivier Austina <olivier.aust...@gmail.com>
Subject Re: Indexing documents/files for production use
Date Thu, 30 Oct 2014 22:10:16 GMT
Thank you Alexandre, Jürgen and Erick for your replies. It is clear to me now.

Regards
Olivier


2014-10-28 23:35 GMT+01:00 Erick Erickson <erickerickson@gmail.com>:

> And one other consideration in addition to the two excellent responses
> so far....
>
> In a SolrCloud environment, SolrJ via CloudSolrServer will automatically
> route the documents to the correct shard leader, saving some additional
> overhead. Post.jar and cURL send the docs to a node, which in turn
> forwards the docs to the correct shard leader, which lowers
> throughput.
>
> Best,
> Erick
>
> On Tue, Oct 28, 2014 at 2:32 PM, "Jürgen Wagner (DVT)"
> <juergen.wagner@devoteam.com> wrote:
> > Hello Olivier,
> >   for real production use, you won't really want to use any toys like
> > post.jar or curl. You want a decent connector to whatever data source
> > there is, that fetches data, possibly massages it a bit, and then feeds
> > it into Solr - by means of SolrJ or directly into the web service of
> > Solr via binary protocols. This way, you can properly handle incremental
> > feeding, processing of data from remote locations (with the connector
> > being closer to the data source), and also source data security. Also
> > think about what happens if you do processing of incoming documents in
> > Solr. What happens if Tika runs out of memory because of PDF problems?
> > What if this crashes your Solr node? In our Solr projects, we generally
> > do not do any sizable processing within Solr, as document processing and
> > document indexing or querying all have different scaling properties.
> >
> > "Production use" most typically is not achieved by deploying a vanilla
> > Solr, but rather having a bit more glue and wrappage, so the whole will
> > fit your requirements in terms of functionality, scaling, monitoring and
> > robustness. Some similar platforms like Elasticsearch try to alleviate
> > these pains of going to a production-style infrastructure, but that's at
> > the expense of flexibility and comes with limitations.
> >
> > For proof-of-concept or demonstrator-style applications, the plain tools
> > out of the box will be fine. For production applications, you want to
> > have more robust components.
> >
> > Best regards,
> > --Jürgen
> >
> >
> > On 28.10.2014 22:12, Olivier Austina wrote:
> >
> > Hi All,
> >
> > I am reading the Solr documentation. I have understood that post.jar
> > <http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29>
> > is not meant for production use, and that cURL
> > <https://cwiki.apache.org/confluence/display/solr/Introduction+to+Solr+Indexing>
> > is not recommended. Is SolrJ better for production?  Thank you.
> > Regards
> > Olivier
> >
> >
> >
> > --
> >
> > Mit freundlichen Grüßen/Kind regards/Cordialement vôtre/Atentamente/С уважением
> > i.A. Jürgen Wagner
> > Head of Competence Center "Intelligence"
> > & Senior Cloud Consultant
> >
> > Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
> > Phone: +49 6151 868-8725, Fax: +49 711 13353-53, Mobile: +49 171 864 1543
> > E-Mail: juergen.wagner@devoteam.com, URL: www.devoteam.de
> >
> > ________________________________
> > Managing Board: Jürgen Hatzipantelis (CEO)
> > Address of Record: 64331 Weiterstadt, Germany; Commercial Register:
> > Amtsgericht Darmstadt HRB 6450; Tax Number: DE 172 993 071
> >
> >
>
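
To illustrate Erick's point about routing, here is a toy Java sketch that counts network hops per document for a cluster-aware client versus a client that always posts to one fixed node. It is not Solr or SolrJ code. The names shardFor, hopsClusterAware and hopsViaFixedNode are made up for this example, String.hashCode stands in for whatever hash the cluster really uses, and the sketch assumes node i hosts the leader of shard i.

```java
// Toy model of document routing in a sharded cluster; not Solr's real code.
class RoutingSketch {

    // Hypothetical stand-in for the cluster's routing hash:
    // maps a document id to a shard index in [0, numShards).
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // A cluster-aware client computes the shard itself and sends the
    // document straight to that shard's leader: always one hop.
    static int hopsClusterAware(String docId, int numShards) {
        return 1;
    }

    // A client that posts everything to one fixed node (assumed to host
    // the leader of shard `entryNode`) needs a second hop whenever the
    // document belongs to a different shard.
    static int hopsViaFixedNode(String docId, int numShards, int entryNode) {
        return shardFor(docId, numShards) == entryNode ? 1 : 2;
    }

    public static void main(String[] args) {
        int numShards = 4;
        int direct = 0;
        int forwarded = 0;
        for (int i = 0; i < 1000; i++) {
            String id = "doc-" + i;
            direct += hopsClusterAware(id, numShards);
            forwarded += hopsViaFixedNode(id, numShards, 0);
        }
        System.out.println("cluster-aware client hops: " + direct);
        System.out.println("fixed-node client hops:    " + forwarded);
    }
}
```

With more shards, a larger share of documents posted to a fixed node belongs elsewhere, so the extra hop is paid more often.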

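Jürgen's connector idea (fetch, massage, then feed, with the processing kept outside Solr) could look roughly like the Java sketch below. Everything in it is an assumption for illustration: DocumentSink is a hypothetical interface that a real setup would back with an indexing client such as SolrJ, and parse is a placeholder for whatever extraction the data source needs. The point is only that a record that fails processing is skipped in the connector instead of causing trouble inside the search server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch of an external connector: transform records outside the search
// server and hand them over in batches. All names here are hypothetical.
class ConnectorSketch {

    // Hypothetical sink; a real one would wrap an indexing client.
    interface DocumentSink {
        void addBatch(List<Map<String, Object>> docs);
    }

    // Placeholder transform: turns "id|title" into a field map and
    // rejects anything that does not match that shape.
    static Map<String, Object> parse(String line) {
        String[] parts = line.split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("unparseable record: " + line);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", parts[0]);
        doc.put("title", parts[1]);
        return doc;
    }

    // Transforms each record, skips the ones that fail, and flushes to the
    // sink every `batchSize` documents. Returns the number of skipped records.
    static int feed(List<String> rawRecords,
                    Function<String, Map<String, Object>> transform,
                    DocumentSink sink,
                    int batchSize) {
        List<Map<String, Object>> batch = new ArrayList<>();
        int failed = 0;
        for (String raw : rawRecords) {
            try {
                batch.add(transform.apply(raw));
            } catch (RuntimeException e) {
                failed++;          // the bad record stays in the connector
                continue;
            }
            if (batch.size() >= batchSize) {
                sink.addBatch(new ArrayList<>(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.addBatch(new ArrayList<>(batch));  // flush the remainder
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("1|First", "broken", "2|Second", "3|Third");
        int failed = feed(raw, ConnectorSketch::parse,
                docs -> System.out.println("indexing batch of " + docs.size()), 2);
        System.out.println("records skipped: " + failed);
    }
}
```

Because the transform runs in the connector's own process, it can be scaled, restarted or given more memory without touching the nodes that serve queries.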