cassandra-dev mailing list archives

From wateray <wate...@163.com>
Subject Streamsession's timeout is not reasonable
Date Tue, 01 Dec 2015 08:50:30 GMT


Following up on my previous message (quoted below): after several hours the rebuild failed,
and we found that the failure was caused by a timeout.
The incoming socket on the transfer side hit its read timeout (streaming_socket_timeout_in_ms,
which defaults to one hour), and then the whole StreamSession failed.


As the rebuild proceeds, the transfer rate slows down, and the file being transferred cannot
be completed within the timeout. The transfer side did not receive a single byte (it was
expecting a RECEIVED message), so the incoming socket raised a timeout.


Because the incoming and outgoing sockets both belong to the same StreamSession, we cannot
decide on a timeout by checking the incoming socket alone while the outgoing socket is still
streaming (a file transfer can keep going for a long time, especially for a large file at low
speed). In other words, we should not raise a timeout while a file is still being transferred.
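
One way to express this idea is to track activity for the session as a whole rather than per
socket. The following is a minimal Java sketch under that assumption. `SessionActivityTracker`
and its methods are hypothetical names used only for illustration; they are not part of
Cassandra's streaming code.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not Cassandra code): records the last time ANY byte moved
 * in either direction of a stream session, so an idle incoming socket alone does
 * not count as a session timeout while the outgoing side is still sending.
 */
final class SessionActivityTracker {
    private final long timeoutMillis;
    private final AtomicLong lastActivityMillis;

    SessionActivityTracker(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = new AtomicLong(nowMillis);
    }

    /** Call on every read from the incoming socket and every write to the outgoing one. */
    void recordActivity(long nowMillis) {
        // Keep the most recent timestamp even if calls arrive out of order.
        lastActivityMillis.accumulateAndGet(nowMillis, Math::max);
    }

    /** The session is timed out only if neither direction made progress recently. */
    boolean isTimedOut(long nowMillis) {
        return nowMillis - lastActivityMillis.get() > timeoutMillis;
    }
}
```

With a one-hour timeout, a session whose outgoing side keeps calling `recordActivity` is never
reported as timed out, even if the incoming socket has been silent for longer than an hour.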


My question again:
  If I run rebuild again, will it rebuild all of the token ranges that belong to the node, or
only the remaining token ranges not covered by the last rebuild (since the last rebuild
already fetched some data)?

Please excuse me for my poor English.



===========================================================================
At 2015-11-21 01:07:05, "wateray" <wateray@163.com> wrote:
>We want to deploy one more data center for data safety.
>While rebuilding one node's data from the old DC, the rebuild failed after some hours due to
>a network fault.
>I can certainly restart the rebuild, but I am concerned about what a restart does:
>will it rebuild all of the token ranges that belong to the node, or only the remaining token
>ranges not covered by the last rebuild (since the last rebuild already fetched some data)?
>
>Looking at the source, I see this code.
>
>Class RangeStreamer, method getRangeFetchMap:
>
>private static Multimap<InetAddress, Range<Token>> getRangeFetchMap(Multimap<Range<Token>, InetAddress> rangesWithSources, Collection<ISourceFilter> sourceFilters, String keyspace)
>    {
>        Multimap<InetAddress, Range<Token>> rangeFetchMapMap = HashMultimap.create();
>        for (Range<Token> range : rangesWithSources.keySet())
>        {
>            boolean foundSource = false;
>
>            outer:
>            for (InetAddress address : rangesWithSources.get(range))
>            {
>                if (address.equals(FBUtilities.getBroadcastAddress()))
>                {
>                    // If localhost is a source, we have found one, but we don't add it to the map to avoid streaming locally
>                    foundSource = true;
>                    continue;
>                }
>
>                for (ISourceFilter filter : sourceFilters)
>                {
>                    if (!filter.shouldInclude(address))
>                        continue outer;
>                }
>
>                rangeFetchMapMap.put(address, range);
>                foundSource = true;
>                break; // ensure we only stream from one other node for each range
>            }
>
>            if (!foundSource)
>                throw new IllegalStateException("unable to find sufficient sources for streaming range " + range + " in keyspace " + keyspace);
>        }
>
>        return rangeFetchMapMap;
>    }
>
>In the bold lines (the localhost check), when the address is found to be localhost, the loop
>continues on to the other sources and then puts one of them into rangeFetchMapMap.
>I think the continue keyword should be break if the intent is to rebuild only the data the
>node does not already have. Is that right?
>
>
>Best regards!
>
>
>
>
>
> 
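
The `continue` versus `break` question in the quoted message can be illustrated with a
simplified, self-contained Java sketch. It is an assumption-laden reduction of the quoted
method, not Cassandra code: `FetchMapSketch`, `fetchMap` and the `breakOnLocal` flag are
hypothetical, plain strings stand in for `Range<Token>` and `InetAddress`, the result maps
range to source (the reverse of the original Multimap), and the source filters and the
`foundSource` check are omitted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FetchMapSketch {
    /**
     * Picks one remote source per range.
     * breakOnLocal = false mirrors the quoted code (continue past the local address).
     * breakOnLocal = true is the proposed change (stop looking once the local address is seen).
     */
    static Map<String, String> fetchMap(Map<String, List<String>> rangesWithSources,
                                        String localAddress,
                                        boolean breakOnLocal) {
        Map<String, String> rangeToSource = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : rangesWithSources.entrySet()) {
            for (String address : entry.getValue()) {
                if (address.equals(localAddress)) {
                    if (breakOnLocal)
                        break;      // proposed: do not fetch this range at all
                    continue;       // quoted code: keep looking for a remote source
                }
                rangeToSource.put(entry.getKey(), address);
                break;              // only one remote source per range
            }
        }
        return rangeToSource;
    }
}
```

In this sketch, when the local address appears first among a range's sources, the `continue`
variant still selects a remote node for that range, while the `break` variant leaves the
range out of the map. Note that the outcome of the `break` variant depends on the order in
which the sources are iterated.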