lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikhail Khludnev <m...@apache.org>
Subject Re: Document serializable representation
Date Thu, 30 Mar 2017 09:09:57 GMT
I believe you can have more shards for indexing and then merge (and not
literally, but just by addIndexes() or so ) them to smaller number for
search. Transferring indices is more efficient (scp -C) than separate
tokens and their attributes over the wire.

On Thu, Mar 30, 2017 at 12:02 PM, Denis Bazhenov <dotsid@gmail.com> wrote:

> We already have done this. Many years ago :)
>
> At the moment we have 7 shards. The problem with getting more shards is
> that search become less cost effective (in terms of cluster CPU time per
> request) as you split index in more shards. Considering response time is
> good enough and the fact search nodes are ~90% of all hardware budget of
> the cluster, it’s much more cost effective to split analysis from
> IndexWriter than split index in more shards. It simply would require from
> us to put disproportionately more hardware in cluster.
>
> > On Mar 30, 2017, at 18:36, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > What you would better do is to just split your index into multiple
> shards and have separate IndexWriter instances on different machines. Those
> can act on their own. This is what Elasticsearch or Solr are doing: They
> accept the document, decide which shard they should be located and transfer
> the plain fieldname:value pairs over the network. Each node then creates
> Lucene IndexableDocuments out of it and passes to their own IndexWriter.
>
> ---
> Denis Bazhenov <dotsid@gmail.com>
>
>
>
>
>
>


-- 
Sincerely yours
Mikhail Khludnev

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message