lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Stand-alone Index updating using EmbeddedSolrServer
Date Thu, 21 Apr 2011 23:47:51 GMT
On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond <kiko@alum.mit.edu> wrote:
> Yes, I've seen that page, but I went a bit beyond the material there, as the
> code I wrote is able to set parameters such as separators, encapsulators, and
> the index columns, whether to split parameters, and auto-commit, and it can
> do incremental or full index reloads.

Is this a CSV loader?
If so, did you know the CSV loader (and other data loaders) also have
the option to bypass HTTP and stream directly from a local file (or
other URL)?
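
As a rough sketch of the local-file option: the CSV handler accepts a
stream.file parameter naming a file on the Solr server's own filesystem,
so the client sends only a short request and Solr reads the data itself.
The Java below only builds such a request URL. The class name, base URL
and file path are made-up placeholders, and it assumes remote streaming
is enabled in solrconfig.xml; it is not taken from the utility being
discussed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a CSV update URL that tells Solr to read a local file.
final class CsvStreamUrl {

    static String build(String baseUrl, String localPath,
                        char separator, char encapsulator, boolean commit) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.file", localPath);            // file on the Solr host
        params.put("separator", String.valueOf(separator));
        params.put("encapsulator", String.valueOf(encapsulator));
        params.put("commit", String.valueOf(commit));

        StringBuilder url = new StringBuilder(baseUrl).append("/update/csv");
        char joiner = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(joiner).append(encode(e.getKey()))
               .append('=').append(encode(e.getValue()));
            joiner = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Assumed values, for illustration only.
        System.out.println(build("http://localhost:8983/solr",
                                 "/data/books.csv", ';', '"', true));
    }
}
```

The request itself is tiny no matter how large the file is, which is
the point: the file contents never travel over HTTP.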

> Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have
> to supply the document body as a String.  We want to avoid having to load
> the entire document into memory, which is why we load the files into
> ContentStream objects and pass them to the embedded Solr server (I am
> assuming ContentStream actually streams the file as its name suggests
> instead of trying to load it into memory).  The utility I wrote gets a path,
> a Regex expression for all the files to be loaded, as well as the parameters
> mentioned above and it does either a full or incremental upload of multiple
> files with a single command.
>
> We run a very high load application with Solr in the back end that requires
> that we use the embedded Solr server to eliminate the network round-trip.
> Even a small incremental gain in performance is important for us.

Eliminating the network round-trip is certainly important for good
bulk indexing performance.  Luckily, you don't have to embed to do that.
You can use multiple threads (say 16 for a 4-core server), which
essentially covers up any round-trip latency (use persistent
connections though, or use SolrJ, which does so by default), or you can
use the StreamingUpdateSolrServer, which eliminates round-trip network
delays by streaming documents over multiple already-open connections.
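
The multi-threaded approach can be sketched with nothing but the JDK.
This is only an illustration of the idea, not SolrJ code: BatchSender is
a hypothetical interface standing in for whatever client actually posts
the documents over persistent connections, and the class name, batch
size and document strings are likewise invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelIndexer {

    /** Hypothetical stand-in for a client that reuses open connections. */
    interface BatchSender {
        void send(List<String> docs) throws Exception;
    }

    /**
     * Splits docs into batches and sends them from a fixed pool of threads,
     * so one thread's round-trip wait overlaps with the others' work.
     * Returns the number of batches sent.
     */
    static int indexAll(List<String> docs, int batchSize, int threads,
                        BatchSender sender) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<Future<?>>();
        try {
            for (int i = 0; i < docs.size(); i += batchSize) {
                int end = Math.min(i + batchSize, docs.size());
                List<String> batch = new ArrayList<String>(docs.subList(i, end));
                // Returning null makes this a Callable, so send() may throw.
                pending.add(pool.submit(() -> { sender.send(batch); return null; }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger sent = new AtomicInteger();
        // 16 threads, as suggested above for a 4-core server; the sender only counts.
        int batches = indexAll(docs, 50, 16, b -> sent.addAndGet(b.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs");
    }
}
```

In a real setup the sender would wrap an HTTP client shared by the
worker threads, so that connections stay open between batches.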

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


