lucene-solr-user mailing list archives

From Charlie Hull <char...@flax.co.uk>
Subject Re: [scottchu] What kind of configuration to use for this size of news data?
Date Wed, 11 May 2016 12:05:44 GMT
On 11/05/2016 10:55, scott.chu wrote:
>
> I just found that the mailing list does not seem to accept colored
> fonts (I received my own message from the list and the blue text was
> gone!). I have used rows of asterisks to highlight my questions and am
> sending this again.

Answers inline below.

C
>
>
>
> ----- Original Message ----- From: scott (myself) To: solr-user To: Date:
> 2016/5/11 (Wed) 17:34 Subject: Re: [scottchu] What kind of
> configuration to use for this size of news data?
>
>
> Hi, Charlie,
>
> Thanks first of all for your concrete answer. I have further
> questions, written in blue below.
>
> scott.chu, scott.chu@udngroup.com 2016/5/11 (Wed) ----- Original
> Message ----- From: Charlie Hull To: solr-user@lucene.apache.org CC:
> Date: 2016/5/11 (Wed) 16:21 Subject: Re: [scottchu] What kind of
> configuration to use for this size of news data?
>
>
> On 11/05/2016 04:27, scott.chu wrote:
>> Fix some typos, add some words and resend same question =>
>>
>> I want to build a Solr engine for over 60 years of news articles. My
>> requirements are (I use Solr 5.4.1):
>
> Hi Scott,
>
> We've actually done something very similar for our client NLA
> Media Access in the UK, who handle licensing of most UK newspaper
> content. They have over 45m docs going back to 2006.
>>
>> 1> Currently over 10M docs.
>> 2> Currently over 60GB total data size.
>> 3> The number of docs and the data size will keep growing at a rate
>> of 1000 docs (or 8MB) per day.
>> 4> There are 5-6 different newspaper types in total.
>>
>> My questions are: 1> Is it workable enough just to use the master-slave
>> model? Or should I turn to SolrCloud? (I ask this because our
>> system management group has never managed a distributed system before
>> and they also have no knowledge of Zookeeper, shards, etc. Also they
>> don't know how to back up/restore distributed data.)
>
> Workable yes, advisable no. You should get much better reliability &
> performance with SolrCloud once it's set up. Also, if you have
> replication set up correctly the need for backup/restore will be
> significantly reduced and may be unnecessary.
>
> We used master-slave for News UK's Solr setup (articles from The
> Times and other papers) but this was before SolrCloud had properly
> arrived. We'd only use master-slave rarely now.
>
>
> *************************************************************************************************************************************************************
>
>
> If I use SolrCloud, I know I have to set up Zookeeper. I know there is
> something called a 'quorum' or 'ensemble' in Zookeeper terminology. I
> also know there is a need for (2n+1) Zookeeper nodes per n SolrCloud
> nodes. Is your setup running one SolrCloud node per machine (whether
> PM or VM)? In your experience, how many nodes, including SolrCloud's
> and Zookeeper's, do I need to set up? Is replication in SolrCloud as
> easy to set up as in the old version? (I set up replication in
> solrconfig.xml and use a solrcore.properties file to set up/switch
> roles in a Solr node, rather than defining the role directly in
> solrconfig.xml.)
> *************************************************************************************************************************************************************
>

You need at least 3 ZK nodes to form a quorum that survives the loss of
one node. How many SolrCloud nodes you need will depend on how you decide
to shard and replicate your data. There is no single answer to this - it
depends on various factors including query load, query complexity,
source data size, indexing strategy... You should read this page:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

You can run more than one Solr node per machine, but if that machine 
dies then your failover setup must be able to cope.

The *only* sensible way to figure out how many nodes you need is to try 
out a prototype system. I would guesstimate it will be fewer than 10 
nodes but don't hold me to that! Doing this will also teach you a lot 
about ZK and SolrCloud - you're not going to be able to avoid some 
learning here. Don't avoid looking at SolrCloud just because it involves 
ZK; the advantages outweigh the learning curve IMO.
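
On the (2n+1) figure raised in the question: ZooKeeper needs a strict
majority of its ensemble to be up, so an ensemble of 2n+1 ZK nodes keeps
working with up to n of them down. The count is tied to how many ZK
failures you want to tolerate, not to the number of Solr nodes. A minimal
Python sketch of that arithmetic (the function names are illustrative
only and are not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many nodes."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for nodes in (1, 2, 3, 4, 5):
        print(nodes, quorum_size(nodes), tolerated_failures(nodes))
```

Note that 4 nodes tolerate no more failures than 3, which is why odd
ensemble sizes are the usual choice.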

>> 3> If I wish to create another Solr engine with one or two
>> particular paper types, is it possible to copy their index data
>> directly from the big central Solr engine? Or do I have to rebuild
>> the index from the raw article data? (Our business may well need
>> this.)
>
> Yes, I guess so, but why copy it when you could just search it with
> a filter for the paper types?
>
> *************************************************************************************************************************************************************
> We have a special business case called 'buyout newspaper search
> service'. Customers buy an intranet license to use the search service
> for articles of some newspaper types and some range of publish dates,
> e.g. paper type 'A' for 2010-2012 and paper type 'B' for 2015. The
> buyout means we have to install the whole search service at the
> customer site, and the customer can only use the search service within
> their enterprise intranet environment. So, you see, I have to build a
> special Solr server for each such customer. Your idea of filtering is
> very much like ElasticSearch's multitenancy, and neither fits our
> buyout business model. Do you have any suggestions for building a Solr
> server under these conditions?
> *************************************************************************************************************************************************************

You could use Solr's API to extract the subset of articles for 
papers/dates for reindexing into a new Solr core.
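
As a rough illustration of that extraction step, here is a minimal Python
sketch that pages through the licensed subset using Solr's standard
select parameters (`q`, `fq`, `sort`, `rows`, `wt`, `cursorMark`). The
field names `paper_type`, `publish_date` and `id`, and the select URL you
pass in, are assumptions for the example and must be replaced with
whatever your schema uses. It also assumes paper names contain no quote
characters. Only stored fields come back from a query, so this only works
for reindexing if every field you need is stored.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical schema field names; adjust to your own schema.
PAPER_FIELD = "paper_type"
DATE_FIELD = "publish_date"
ID_FIELD = "id"  # assumed to be the uniqueKey, which cursorMark sorting needs


def licence_filter(licences):
    """Build one fq clause from (paper, first_year, last_year) tuples."""
    clauses = [
        f'({PAPER_FIELD}:"{paper}" AND {DATE_FIELD}:'
        f"[{first}-01-01T00:00:00Z TO {last + 1}-01-01T00:00:00Z}})"
        for paper, first, last in licences
    ]
    return " OR ".join(clauses)


def http_fetch(select_url, params):
    """Run one select request and return the decoded JSON response."""
    with urlopen(select_url + "?" + urlencode(params)) as resp:
        return json.load(resp)


def export_docs(select_url, licences, fetch=http_fetch, rows=500):
    """Yield every matching document, following cursorMark until it repeats."""
    params = {
        "q": "*:*",
        "fq": licence_filter(licences),
        "sort": f"{ID_FIELD} asc",
        "rows": rows,
        "wt": "json",
        "cursorMark": "*",
    }
    while True:
        page = fetch(select_url, params)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == params["cursorMark"]:
            break  # same cursor returned twice: nothing left to read
        params["cursorMark"] = next_cursor
```

For the example in the question, `export_docs(url, [("A", 2010, 2012),
("B", 2015, 2015)])` would yield the documents to post to the new core at
the customer site.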

Best

Charlie
>>
>>
>
>> I'd like to hear some good suggestions and experiences that I can use.
>>
>> Thanks in advance and best regards.
>>
>> Scott Chu @ 2016/5/11 11:26 GMT+8
>>
>
> Hope this helps!
>
> Cheers
>
> Charlie
>


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
