From: Lorenzo Fundaró <lorenzo.fundaro@dawandamail.com>
Date: Thu, 7 Jul 2016 10:12:31 +0200
To: solr-user@lucene.apache.org
Subject: Re: deploy solr on cloud providers

Thank you Tomás, I'll take a thorough look at the JIRA ticket you're
pointing out.

On 6 July 2016 at 20:49, Tomás Fernández Löbbe wrote:

> On Wed, Jul 6, 2016 at 2:30 AM, Lorenzo Fundaró <
> lorenzo.fundaro@dawandamail.com> wrote:
>
> > On 6 July 2016 at 00:00, Tomás Fernández Löbbe wrote:
> >
> > > The leader will do the replication before responding to the client, so
> > > let's say the leader gets to update its local copy but is terminated
> > > before sending the request to the replicas; the client should get
> > > either an HTTP 500 or no HTTP response. From the client code you can
> > > take action (log, retry, etc.).
> >
> > If this is true, then whenever I ask for min_rf having three nodes
> > (1 leader + 2 replicas) I should get rf = 3, but in reality I don't.
> >
> > > The "min_rf" is useful for the case where replicas may be down or not
> > > accessible. Again, you can use this for retrying or take any necessary
> > > action on the client side if the desired rf is not achieved.
> >
> > I think both paragraphs are contradictory. If the leader does the
> > replication before responding to the client, then why is there a need
> > to use min_rf? I don't think it's true that you get a 200 only when the
> > update has been passed to all replicas.
>
> The reason why "min_rf" is there is because:
> * If there are no replicas at the time of the request (e.g.
> if replicas are unreachable and disconnected from ZK)
> * Replicas could fail to ACK the update request from the leader; in that
> case the leader will mark them as unhealthy but would still HTTP 200 to
> the client.
>
> So it could happen that you think your data is being replicated to 3
> replicas, but 2 of them are currently out of service. This means that
> your doc is on a single host, and if that one dies, then you lose that
> data. In order to prevent this, you can ask Solr to tell you how many
> replicas succeeded on that update request. You can read more about this
> in https://issues.apache.org/jira/browse/SOLR-5468
>
> > The thing is that, when you have persistent storage, you shouldn't
> > worry about this, because you know that when the node comes back the
> > rest of the index will be synced. The problem is when you don't have
> > persistent storage. For my particular case I have to be extra careful
> > and always make sure that all my replicas have all the data I sent.
>
> In any case you should assume that storage on a host can be completely
> lost, no matter if you are deploying on premises or in the cloud.
> Consider that once that host comes back (could be hours later) it could
> already be out of date, and will replicate from the current leader,
> possibly dropping parts or all of its current index.
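The check Tomás describes — ask Solr for the achieved replication factor and act when it falls short — can be sketched client-side. A minimal plain-Python sketch: the `rf`/`min_rf` fields in the `responseHeader` follow my reading of SOLR-5468, and the helper names are illustrative, not part of any Solr client library:

```python
# Sketch of client-side handling of Solr's achieved-replication-factor
# response, per SOLR-5468. The response dict mirrors the JSON
# responseHeader Solr returns when min_rf is passed with an update;
# helper names here are illustrative, not a Solr client API.

def achieved_rf(response: dict) -> int:
    """Extract the achieved replication factor from an update response."""
    return int(response.get("responseHeader", {}).get("rf", 0))

def should_retry(response: dict, min_rf: int) -> bool:
    """True if fewer replicas acknowledged the update than we required."""
    return achieved_rf(response) < min_rf

# Example: leader succeeded but both replicas were down, so rf == 1
# even though the HTTP status was 200.
resp = {"responseHeader": {"status": 0, "rf": 1, "min_rf": 3}}
print(should_retry(resp, min_rf=3))  # → True: queue the doc for re-send
```

The point of the sketch: Solr does not fail the request for you when `min_rf` is not met, so the retry (or re-queue) decision has to live in the client.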
>
> Tomás
>
> > > Tomás
> > >
> > > On Tue, Jul 5, 2016 at 11:39 AM, Lorenzo Fundaró <
> > > lorenzo.fundaro@dawandamail.com> wrote:
> > >
> > > > @Tomás and @Steven
> > > >
> > > > I am a bit skeptical about these two statements:
> > > >
> > > > > If a node just disappears you should be fine in terms of data
> > > > > availability, since Solr in "SolrCloud" replicates the data as it
> > > > > comes in (before sending the HTTP response)
> > > >
> > > > and
> > > >
> > > > > You shouldn't "need" to move the storage, as SolrCloud will
> > > > > replicate all data to the new node, and anything in the
> > > > > transaction log will already be distributed through the rest of
> > > > > the machines.
> > > >
> > > > because according to the official documentation here
> > > > <https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance>
> > > > (Write Side Fault Tolerance -> recovery):
> > > >
> > > > > If a leader goes down, it may have sent requests to some replicas
> > > > > and not others. So when a new potential leader is identified, it
> > > > > runs a sync process against the other replicas. If this is
> > > > > successful, everything should be consistent, the leader registers
> > > > > as active, and normal actions proceed.
> > > >
> > > > I think there is a possibility that an update is not sent by the
> > > > leader but is kept on the local disk, and after the leader comes up
> > > > again it can sync the unsent data.
> > > >
> > > > Furthermore:
> > > >
> > > > > Achieved Replication Factor
> > > > > When using a replication factor greater than one, an update
> > > > > request may succeed on the shard leader but fail on one or more
> > > > > of the replicas. For instance, consider a collection with one
> > > > > shard and a replication factor of three.
> > > > > In this case, you have a shard leader and two additional
> > > > > replicas. If an update request succeeds on the leader but fails
> > > > > on both replicas, for whatever reason, the update request is
> > > > > still considered successful from the perspective of the client.
> > > > > The replicas that missed the update will sync with the leader
> > > > > when they recover.
> > > >
> > > > They have implemented this parameter called *min_rf* that you can
> > > > use (client-side) to make sure that your update was replicated to
> > > > at least one replica (e.g.: min_rf > 1).
> > > >
> > > > This is the reason for my concern about moving storage around:
> > > > because then I know that when the shard leader comes back,
> > > > SolrCloud will run the sync process for those documents that
> > > > couldn't be sent to the replicas.
> > > >
> > > > Am I missing something, or have I misunderstood the documentation?
> > > >
> > > > Cheers!
> > > >
> > > > On 5 July 2016 at 19:49, Davis, Daniel (NIH/NLM) [C] <
> > > > daniel.davis@nih.gov> wrote:
> > > >
> > > > > Lorenzo, this probably comes late, but my systems guys just don't
> > > > > want to give me real disk. Although RAID-5 or LVM on top of JBOD
> > > > > may be better than Amazon EBS, Amazon EBS is still much closer to
> > > > > real disk in terms of IOPS and latency than NFS ;) I even ran a
> > > > > mini test (not an official benchmark), and found the response
> > > > > time for random reads to be better.
> > > > >
> > > > > If you are a young/smallish company, this may all be in the
> > > > > cloud, but if you are in a large organization like mine, you may
> > > > > also need to allow for other architectures, such as a "virtual"
> > > > > NetApp in the cloud that communicates with a physical NetApp
> > > > > on-premises, and the throughput/latency of that.
> > > > > The most important thing is to actually measure the numbers you
> > > > > are getting, both for search and for simply raw I/O, or to get
> > > > > your systems/storage guys to measure those numbers. If you get
> > > > > your systems/storage guys to just measure storage, you will want
> > > > > to care about three things for indexing primarily:
> > > > >
> > > > > Sequential Write Throughput
> > > > > Random Read Throughput
> > > > > Random Read Response Time/Latency
> > > > >
> > > > > Hope this helps,
> > > > >
> > > > > Dan Davis, Systems/Applications Architect (Contractor),
> > > > > Office of Computer and Communications Systems,
> > > > > National Library of Medicine, NIH
> > > > >
> > > > > -----Original Message-----
> > > > > From: Lorenzo Fundaró [mailto:lorenzo.fundaro@dawandamail.com]
> > > > > Sent: Tuesday, July 05, 2016 3:20 AM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: deploy solr on cloud providers
> > > > >
> > > > > Hi Shawn. Actually what I'm trying to find out is whether this is
> > > > > the best approach for deploying Solr in the cloud. I believe
> > > > > SolrCloud solves a lot of problems in terms of high availability,
> > > > > but when it comes to storage there seems to be a limitation that
> > > > > can be worked around, of course, but it's a bit cumbersome, and I
> > > > > was wondering if there is a better option for this or if I'm
> > > > > missing something with the way I'm doing it. I wonder if there is
> > > > > some proven experience about how to solve the storage problem
> > > > > when deploying in the cloud. Any advice or pointer to some
> > > > > enlightening documentation will be appreciated. Thanks.
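The three storage metrics Dan lists above can be sanity-checked before reaching for a full benchmark rig. A rough, illustrative Python probe — not a substitute for a dedicated tool such as fio, and note the page-cache caveat in the comments:

```python
# Rough single-file I/O probe for the metrics Dan lists: sequential
# write throughput and random 4 KiB read latency. Illustrative only;
# a real measurement should use a dedicated tool (e.g. fio) and
# defeat the OS page cache, which this sketch does not.
import os
import random
import tempfile
import time

def probe(path: str, size_mb: int = 64, reads: int = 200, block: int = 4096):
    chunk = os.urandom(1024 * 1024)
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)           # sequential 1 MiB writes
        f.flush()
        os.fsync(f.fileno())         # include the flush in the timing
    write_mb_s = size_mb / (time.perf_counter() - t0)

    latencies = []
    with open(path, "rb") as f:
        for _ in range(reads):
            off = random.randrange(0, size_mb * 1024 * 1024 - block)
            t = time.perf_counter()
            f.seek(off)
            f.read(block)            # random 4 KiB read
            latencies.append(time.perf_counter() - t)
    latencies.sort()
    return write_mb_s, latencies[len(latencies) // 2]  # median latency

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        mb_s, lat = probe(path)
        print(f"seq write: {mb_s:.0f} MB/s, median 4K read: {lat * 1e6:.0f} us")
    finally:
        os.unlink(path)
    # Caveat: reads of a just-written file mostly hit the page cache,
    # so treat the read-latency number as a best-case lower bound.
```

Run against the actual volume Solr would index onto (EBS, NFS mount, local disk) to get a first-order comparison of the storage options being discussed.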
> > > > > On Jul 4, 2016 18:27, "Shawn Heisey" wrote:
> > > > >
> > > > > > On 7/4/2016 10:18 AM, Lorenzo Fundaró wrote:
> > > > > > > When deploying Solr (in SolrCloud mode) in the cloud, one has
> > > > > > > to take care of storage, and as far as I understand it can be
> > > > > > > a problem because the storage should go wherever the node is
> > > > > > > created. If we have, for example, a node on EC2 with its own
> > > > > > > persistent disk, and this node happens to be the leader and
> > > > > > > at some point crashes but couldn't replicate the data it has
> > > > > > > in the transaction log, what do we do in that case? Ideally
> > > > > > > the new node should use the leftover data that the dead node
> > > > > > > left, but this is a bit cumbersome in my opinion. What are
> > > > > > > the best practices for this?
> > > > > >
> > > > > > I can't make any sense of this. What is the *exact* problem you
> > > > > > need to solve? The details can be very important.
> > > > > >
> > > > > > We might be dealing with this:
> > > > > >
> > > > > > http://people.apache.org/~hossman/#xyproblem
> > > > > >
> > > > > > Thanks,
> > > > > > Shawn

--
Lorenzo Fundaro
Backend Engineer
E-Mail: lorenzo.fundaro@dawandamail.com

Fax + 49 - (0)30 - 25 76 08 52
Tel + 49 - (0)179 - 51 10 982

DaWanda GmbH
Windscheidstraße 18
10627 Berlin

Geschäftsführer: Claudia Helming und Niels Nüssler
AG Charlottenburg HRB 104695 B http://www.dawanda.com