lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Setting up to index multiple datastores
Date Fri, 03 Mar 2017 01:44:50 GMT
And if you are not using SolrCloud, you can have
collection=shard=core, so the terminology gets confusing. But you can
definitely have many cores on one Solr server. You can also make them
lazy, so not all cores have to be loaded at once. That would definitely
allow you to have a core per user, with only the searched cores loaded.
And relevance might be a bit better too, as each user will get their
own term counts.
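
As a rough illustration of the lazy-core idea, here is a minimal Python
sketch that renders a core.properties file for a per-user core. The
loadOnStartup and transient properties are standard core.properties
settings; the core name "mail_alice" and the helper function are
hypothetical, and the number of transient cores kept open is governed
separately (transientCacheSize in solr.xml), which this sketch does not
cover.

```python
def lazy_core_properties(core_name: str) -> str:
    """Render core.properties text for a core that is loaded on demand."""
    props = {
        "name": core_name,
        "loadOnStartup": "false",  # do not open the core when Solr starts
        "transient": "true",       # let Solr unload it when the transient cache fills up
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


# "mail_alice" is a made-up per-user core name used only for illustration.
print(lazy_core_properties("mail_alice"), end="")
```

The resulting text would go into the core's instance directory as
core.properties, so that core discovery picks it up.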

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 2 March 2017 at 20:14, Shawn Heisey <apache@elyograg.org> wrote:
> On 3/2/2017 2:58 PM, Daniel Miller wrote:
>> One of the many features of the Dovecot IMAP server is Solr support.
>> This obviously provides full-text-searching of stored mails - and it
>> works great.  But...the focus of the Dovecot team and mailing list is
>> Dovecot configuration.  I'm asking for some guidance on how I might
>> optimize Solr.
>
> I use Solr for work.  I use Dovecot for personal domains.  I have not
> used them together.  I probably should -- my personal mailbox is many
> gigabytes and would benefit from a boost in search performance.
>
>> At the moment I have a (I think!) reasonably well-defined schema that
>> seems to perform well.  In my particular use case, I have a single
>> physical server running Linux with available VirtualBox virtual
>> servers.  I am presently running Solr within one of the virtual
>> servers, and I'm running SolrCloud even though I only have one core
>> (it just seemed to work better).
>>
>> Now because I have a single collection/core/shard - all the mail users
>> and all their mail folders are stored/indexed/searched by this single
>> Solr instance.  I'm thinking that I'd like to split the indexing on at
>> least a per-user fashion - possibly also on a per-mailbox fashion.
>> Dovecot does allow for variable substitution in the Solr URL - so I
>> should be able to generate the necessary URL requests on the Dovecot
>> side.  What I don't know is:
>>
>> 1.  Is it possible to split the "indexes" (I'm still learning Solr
>> vocabulary) without creating separate "cores" (which to me means
>> separate Java instances)?
>> 2.  Can these separate "indexes" be created on-demand - or do they
>> need to be explicitly created prior to use?
>
> Here's a paragraph that hopefully clears up most confusion about Solr
> terminology.  This is applicable to SolrCloud:
>
> Collections are made up of one or more shards.  Shards are made up of
> one or more replicas.  Each replica is a core.  One replica from each
> shard is elected as the leader of that shard, and if there are multiple
> replicas, the leader role can move between them in response to a change
> in cluster state.
>
> Further info: One Solr instance (JVM) can handle many cores.  SolrCloud
> allows multiple Solr instances to coordinate with each other (via
> ZooKeeper) and form a whole cluster.  Without SolrCloud, you have cores,
> but no collections and no replicas.  Sharding is possible without
> SolrCloud, but is handled mostly manually.  Replication is possible
> without SolrCloud, but works very differently, and has a single point of
> failure due to the fact that switching master servers isn't something
> that's done easily.  SolrCloud is a true cluster, no masters or slaves.
>
> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
> https://cwiki.apache.org/confluence/display/solr/Index+Replication
>
> SolrCloud also makes it VERY easy to create new collections (logical
> indexes) if the desired index config is already in the zookeeper
> database.  It can be done entirely with an HTTP request:
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> One thing to note:  SolrCloud begins to have performance issues when the
> number of collections in the cloud reaches the low hundreds.  It's not
> going to scale very well with a collection per user or per mailbox
> unless there aren't very many users.  There are people looking into how
> to scale better, but this hasn't really gone anywhere yet.  Here's one
> issue about it, with a lot of very dense comments:
>
> https://issues.apache.org/jira/browse/SOLR-7191
>
> Thanks,
> Shawn
>
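
To make Shawn's point about creating a collection "entirely with an HTTP
request" concrete, here is a minimal Python sketch that only builds the
Collections API CREATE URL (it does not send it). The parameters action,
name, numShards, replicationFactor and collection.configName belong to
the Collections API; the base URL, the collection name "mail_alice", the
config name "dovecot" and the helper function itself are assumptions
made for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE request URL for SolrCloud."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # The config set must already be present in ZooKeeper.
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical values: a local Solr node and a per-user collection.
print(create_collection_url("http://localhost:8983/solr", "mail_alice", "dovecot"))
```

Issuing a GET against the printed URL should create the collection,
bearing in mind the scaling caveat above about large numbers of
collections.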
