kudu-user mailing list archives

From Juan Pablo Briganti <juan.briga...@globant.com>
Subject Re: Exception at inserting big amount of data
Date Wed, 27 Apr 2016 18:33:18 GMT
Thanks for your answer Todd.

For now, switching to AUTO_FLUSH_BACKGROUND resolved the error, so we are going
with that option until we find out which approach performs best for us
(auto flush, manual flush with a larger message size limit, or manual flush
with smaller batches).
Thanks for your help. We will get in touch again if we have new information or
problems.
Keep up the good work!

- Juan Pablo.

2016-04-27 4:31 GMT-03:00 Todd Lipcon <todd@cloudera.com>:

> Hi Juan,
>
> I see evidence of one issue in your log:
>
> The 'master' server has errors about missing blocks across many of the
> tablets. Is it possible that one of the drives hosting Kudu data got
> unmounted or accidentally removed? Or perhaps the set of data directories
> was changed after Kudu had been in use for a while?
>
> I think this might be unrelated to the issue you're seeing, though --
> according to the metrics, it's tablet '027fbba' which you're trying to
> write to, but that doesn't seem to have any replicas on the node 'master'.
>
> In terms of the tablet that is seeing writes, the odd thing is that the
> log and metrics indicate that the writes are proceeding quite fast:
>                 "name": "handler_latency_kudu_tserver_TabletServerService_Write",
>                 "total_count": 4891,
>                 "min": 28,
>                 "mean": 222.407,
>                 "percentile_75": 82,
>                 "percentile_95": 119,
>                 "percentile_99": 1936,
>                 "percentile_99_9": 20608,
>                 "percentile_99_99": 25728,
>                 "max": 25855,
>                 "total_sum": 1087794
>             },
>
> These latencies are reported in microseconds, so no write operation took
> longer than about 26 ms.
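To make that reading of the histogram concrete, here is a minimal Python sketch that converts the quoted figures to milliseconds and compares the slowest call with the 10-second client timeout mentioned earlier in the thread. It assumes the values are microseconds, which is what the 26 ms conclusion implies; the function names and the `timeout_ms` parameter are hypothetical and not part of any Kudu tool.

```python
import json

# Histogram entry copied from the quoted metrics output.
WRITE_LATENCY_JSON = """
{
    "name": "handler_latency_kudu_tserver_TabletServerService_Write",
    "total_count": 4891,
    "min": 28,
    "mean": 222.407,
    "percentile_75": 82,
    "percentile_95": 119,
    "percentile_99": 1936,
    "percentile_99_9": 20608,
    "percentile_99_99": 25728,
    "max": 25855,
    "total_sum": 1087794
}
"""


def us_to_ms(value_us):
    """Convert microseconds to milliseconds (assumes the metric unit is microseconds)."""
    return value_us / 1000.0


def summarize(histogram, timeout_ms):
    """Return key latencies in ms and whether the slowest call exceeded the timeout."""
    max_ms = us_to_ms(histogram["max"])
    return {
        "mean_ms": us_to_ms(histogram["total_sum"] / histogram["total_count"]),
        "p99_ms": us_to_ms(histogram["percentile_99"]),
        "max_ms": max_ms,
        "max_exceeds_timeout": max_ms > timeout_ms,
    }


# 10 000 ms is the default client timeout mentioned in the thread.
summary = summarize(json.loads(WRITE_LATENCY_JSON), timeout_ms=10_000)
print(summary)
```

Under that assumption, even the slowest write is far below the timeout, which suggests the timeout is caused by something other than slow write handling on the server.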
>
> The other red flag in the log is the following error:
> W0426 11:57:11.404397 11836 connection.cc:140] Shutting down connection
> server connection from 10.0.6.6:39313 with pending inbound data
> (4/8575988 bytes received, last active 0 ns ago, status=Network error: the
> frame had a length of 8575988, but we only support messages up to 8388608
> bytes long.)
>
> which is plausibly because the batches coming from the client are too
> large. It's possible that the Java client doesn't check its batch size
> before sending RPCs, and this is causing the server side to disconnect the
> client.
>
> Can you try either (a) run the tservers with the flag
> --rpc_max_message_size=16777216 or (b) change the size of your manual
> batches to be a bit smaller?
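As an illustration of option (b), the sketch below groups already-encoded rows into batches that stay under a byte budget, and works out how far the rejected frame overshot the limit. It is not the Kudu client API: `split_into_batches`, the use of `bytes` objects as rows, and the half-limit budget are assumptions made for the example, and a real RPC also carries framing overhead that this ignores.

```python
MAX_MESSAGE_BYTES = 8 * 1024 * 1024   # 8388608, the limit in the quoted log line
REJECTED_FRAME_BYTES = 8575988        # frame length from the same log line


def split_into_batches(encoded_rows, max_batch_bytes):
    """Group rows (bytes objects) into lists whose total size fits max_batch_bytes.

    Raises ValueError if a single row is larger than the budget on its own.
    """
    batches, current, current_size = [], [], 0
    for row in encoded_rows:
        if len(row) > max_batch_bytes:
            raise ValueError("single row exceeds the batch budget")
        # Start a new batch when adding this row would exceed the budget.
        if current and current_size + len(row) > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += len(row)
    if current:
        batches.append(current)
    return batches


# How far the rejected frame exceeded the limit.
overshoot = REJECTED_FRAME_BYTES - MAX_MESSAGE_BYTES
print(overshoot)

# Hypothetical workload: 100 000 rows of 100 bytes each, with half the
# message limit as the per-batch budget to leave headroom for overhead.
rows = [b"x" * 100] * 100_000
budget = MAX_MESSAGE_BYTES // 2
batches = split_into_batches(rows, budget)
print(len(batches), [len(b) for b in batches])
```

Each resulting batch would then be applied and flushed on its own in manual flush mode. The budget is a tuning choice; the only hard constraint taken from the log is the 8388608-byte message limit.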
>
> This is obviously an area that we need to make more diagnosable, so thanks
> for reporting the issue.
>
> -Todd
>
> On Tue, Apr 26, 2016 at 11:26 AM, Juan Pablo Briganti <
> juan.briganti@globant.com> wrote:
>
>> Hi Jean-Daniel
>>
>> Thanks for your response.
>> As you said, master node has both roles: master and tablet server.
>> I attach the log and metrics for both servers. Please disregard the
>> servers' timestamps, even if they don't match; I extracted both from a
>> completely new run.
>> If there is any problem with the log format or the uploaded files, please
>> let me know and I'll try to generate them again.
>> Let me add that if I try to insert a small amount of data (10-20
>> records), it works OK.
>>
>> Thanks again.
>>
>>
>> Hi Juan Pablo,
>>
>> The error basically means that the client didn't hear from the server
>> after sending the data, even after retrying a few times, and reached the
>> default 10-second timeout. Can you run your insert again and then capture
>> the output of this command?
>>
>> curl -s http://10.0.6.157:8050/metrics | gzip - > metrics.gz
>>
>> Then post that file somewhere we can download it. If you have more than one
>> tablet server, it might be a different node; basically, I want the one that
>> ends up listed on the right in this exception:
>>
>> Caused by: org.kududb.client.ConnectionResetException: [Peer
>> f7e2936b040d4c58b52d90ae50ad6d5a] Connection reset on [id: 0x323019c2, /
>> 10.0.6.6:58930 :> /10.0.6.157:7050]
>>
>> Also, can we see the logs from that node around 10AM on 16/04/26?
>>
>> Finally, I'm surprised you're even able to create your table if you only
>> have one tablet server and a replication factor of 2 (unless you meant to
>> say that your master node has both a master and a tablet server).
>>
>> J-D
>> --
>>
>> The information contained in this e-mail may be confidential. It has been
>> sent for the sole use of the intended recipient(s). If the reader of this
>> message is not an intended recipient, you are hereby notified that any
>> unauthorized review, use, disclosure, dissemination, distribution or
>> copying of this communication, or any of its contents,
>> is strictly prohibited. If you have received it by mistake please let us
>> know by e-mail immediately and delete it from your system. Many thanks.
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
*Juan Pablo Briganti* | Data Architect
*GLOBANT* | AR: +54 11 4109 1700 ext. 19508 | US: +1 877 215 5230 ext. 19508


