kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Exception at inserting big amount of data
Date Wed, 27 Apr 2016 07:31:42 GMT
Hi Juan,

I see evidence of one issue in your log:

The 'master' server has errors about missing blocks across many of the
tablets. Is it possible that one of the drives hosting Kudu data got
unmounted or accidentally removed? Or perhaps the set of data directories
was changed after Kudu had been in use for a while?

I think this might be unrelated to the issue you're seeing, though --
according to the metrics, it's tablet '027fbba' which you're trying to
write to, but that doesn't seem to have any replicas on the node 'master'.

In terms of the tablet that is seeing writes, the odd thing is that the log
and metrics indicate that the writes are proceeding quite fast:
                "name": "handler_latency_kudu_tserver_TabletServerService_Write",
                "total_count": 4891,
                "min": 28,
                "mean": 222.407,
                "percentile_75": 82,
                "percentile_95": 119,
                "percentile_99": 1936,
                "percentile_99_9": 20608,
                "percentile_99_99": 25728,
                "max": 25855,
                "total_sum": 1087794
            },

So no write operation took longer than about 26 ms (the max is 25855, and
these latencies are reported in microseconds).
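As a rough sanity check on that reading, the sketch below converts the
histogram values above to milliseconds and compares the slowest write with
the 10-second client timeout mentioned later in the thread. It assumes the
values are in microseconds; the dictionary and the helper name are
illustrative only, not part of any Kudu API.

```python
# Values copied from the metrics excerpt above.
# Assumption: the histogram reports latencies in microseconds.
write_latency = {
    "total_count": 4891,
    "mean": 222.407,
    "percentile_99": 1936,
    "max": 25855,
    "total_sum": 1087794,
}

CLIENT_TIMEOUT_MS = 10_000  # the default 10-second timeout quoted in the thread


def us_to_ms(value_us):
    """Convert microseconds to milliseconds."""
    return value_us / 1000.0


max_ms = us_to_ms(write_latency["max"])
# The mean should equal total_sum / total_count if the fields are consistent.
mean_us = write_latency["total_sum"] / write_latency["total_count"]

print(f"slowest write: {max_ms:.3f} ms")
print(f"recomputed mean: {mean_us:.3f} us")
print(f"slowest write is under the client timeout: {max_ms < CLIENT_TIMEOUT_MS}")
```

If the unit assumption holds, server-side write handling is several hundred
times faster than the timeout, which points away from slow writes as the
cause.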

The other red flag in the log is the following error:
W0426 11:57:11.404397 11836 connection.cc:140] Shutting down connection
server connection from 10.0.6.6:39313 with pending inbound data (4/8575988
bytes received, last active 0 ns ago, status=Network error: the frame had a
length of 8575988, but we only support messages up to 8388608 bytes long.)

which is plausibly because the batches coming from the client are too
large. It's possible that the Java client doesn't check its batch size
before sending RPCs, and this is causing the server side to disconnect the
client.
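The arithmetic behind that log line can be checked quickly. The sketch
below uses only the two numbers quoted in the warning; the helper name is
made up for illustration, and the exact comparison the server performs
(strict or inclusive) is an assumption.

```python
# Numbers taken from the log line above.
MAX_MESSAGE_BYTES = 8 * 1024 * 1024  # 8388608 bytes, i.e. 8 MiB
FRAME_BYTES = 8575988                # frame length the client tried to send


def fits_in_message_limit(frame_len, limit):
    """Return True if a frame of frame_len bytes is within the limit.

    Assumption: a frame exactly at the limit is accepted.
    """
    return frame_len <= limit


overshoot = FRAME_BYTES - MAX_MESSAGE_BYTES
print(f"limit:     {MAX_MESSAGE_BYTES} bytes")
print(f"frame:     {FRAME_BYTES} bytes")
print(f"overshoot: {overshoot} bytes ({100 * overshoot / MAX_MESSAGE_BYTES:.1f}% over)")
print(f"fits: {fits_in_message_limit(FRAME_BYTES, MAX_MESSAGE_BYTES)}")
```

The frame is only a couple of percent over the limit, so a modest
reduction in batch size should be enough to get under it.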

Can you try either (a) running the tservers with the flag
--rpc_max_message_size=16777216 (16 MiB, double the current limit) or
(b) making your manual batches a bit smaller?
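For option (b), one way to keep manual batches under a byte budget is to
split the rows before handing them to the client. This is a minimal,
client-agnostic sketch: split_into_batches and estimate_size are
hypothetical names, not Kudu client calls, and the size estimate is
whatever approximation the application can provide. The RPC on the wire
carries overhead beyond the raw row bytes, so the budget should leave some
headroom below the server limit.

```python
def split_into_batches(rows, max_batch_bytes, estimate_size):
    """Yield lists of rows whose estimated total size fits max_batch_bytes.

    rows            -- any iterable of row objects
    max_batch_bytes -- byte budget per batch (keep it below the server limit)
    estimate_size   -- callable returning an approximate size in bytes per row
    """
    batch, batch_bytes = [], 0
    for row in rows:
        row_bytes = estimate_size(row)
        if row_bytes > max_batch_bytes:
            raise ValueError("a single row exceeds the batch budget")
        # Start a new batch when adding this row would exceed the budget.
        if batch and batch_bytes + row_bytes > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch


# Example: ten 400-byte rows with a 1000-byte budget, using len() as the estimate.
example_rows = [b"x" * 400 for _ in range(10)]
example_batches = list(split_into_batches(example_rows, 1000, len))
print([len(b) for b in example_batches])
```

Each yielded batch would then be applied and flushed on its own before the
next one is sent.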

This is obviously an area that we need to make more diagnosable, so thanks
for reporting the issue.

-Todd





On Tue, Apr 26, 2016 at 11:26 AM, Juan Pablo Briganti <
juan.briganti@globant.com> wrote:

> Hi Jean-Daniel
>
> Thanks for your response.
> As you said, the master node has both roles: master and tablet server.
> I attach the log and metrics for both servers. Please ignore the server
> timestamps if they don't match; I extracted both from a completely
> new run.
> If there is any problem with the log format or the uploaded files, please
> let me know and I'll try to generate them again.
> Let me add that if I try to insert a small amount of data (10-20
> records), it works OK.
>
> Thanks again.
>
>
> Hi Juan Pablo,
>
> The error basically means that the client didn't hear from the server
> after sending the data, even after retrying a few times, and reached the
> default 10 seconds timeout. Can you run your insert again and then capture
> the output of this command?
>
> curl -s http://10.0.6.157:8050/metrics | gzip - > metrics.gz
>
> Then post that file somewhere we can download it. If you have more than one
> tablet server, it might be a different node. Basically, I want the one that
> ends up listed on the right in this exception:
>
> Caused by: org.kududb.client.ConnectionResetException: [Peer
> f7e2936b040d4c58b52d90ae50ad6d5a] Connection reset on [id: 0x323019c2, /
> 10.0.6.6:58930 :> /10.0.6.157:7050]
>
> Also, can we see the logs from that node around 10AM on 16/04/26?
>
> Finally, I'm surprised you're even able to create your table if you only
> have one tablet server and a replication of 2 (unless you meant to say that
> your master node has both a master and a tablet server).
>
> J-D
> --
>
> The information contained in this e-mail may be confidential. It has been
> sent for the sole use of the intended recipient(s). If the reader of this
> message is not an intended recipient, you are hereby notified that any
> unauthorized review, use, disclosure, dissemination, distribution or
> copying of this communication, or any of its contents,
> is strictly prohibited. If you have received it by mistake please let us
> know by e-mail immediately and delete it from your system. Many thanks.
>


-- 
Todd Lipcon
Software Engineer, Cloudera
