crail-dev mailing list archives

From David Crespi <david.cre...@storedgesystems.com>
Subject RE: Setting up storage class 1 and 2
Date Mon, 01 Jul 2019 22:57:42 GMT
A standard pull from the repo, one that didn't have the patches from your private repo.

I can put the patches back in both the client and server containers if you really think it would make a difference.



Are you guys running multiple types together?  I'm running an RDMA storage class 1,
an NVMf storage class 1, and an NVMf storage class 2 together.  I get errors when the
RDMA is introduced into the mix.  I have a small amount of memory (4GB) assigned
to the RDMA tier, and I'm looking for it to fall over into the NVMf class 1 tier.  It appears
to want to do that, but gets screwed up… it looks like it's trying to create another set
of QPs for an RDMA connection.  It even blew up SPDK trying to accomplish that.

Do you guys have some documentation that shows what's been tested (mixes/variations) so far?
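For context, a mixed-tier setup like the one described above is declared in crail-site.conf. The sketch below is an assumption pieced together from the property names visible in the log further down this thread (crail.storage.types, crail.storage.classes, crail.storage.rootclass); it is not a tested configuration, and the NVMf tier classname and class numbering are guesses:

```properties
# Sketch of a crail-site.conf mixing an RDMA tier with NVMf tiers.
# Tier list and class count here are assumptions, not a verified config.
crail.storage.types    org.apache.crail.storage.rdma.RdmaStorageTier,org.apache.crail.storage.nvmf.NvmfStorageTier
crail.storage.classes  2
crail.storage.rootclass 0
# 4GB RDMA tier, matching the limit seen in the log below
crail.storage.rdma.storagelimit 4294967296
```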



Regards,



           David





________________________________
From: Jonas Pfefferle <pepperjo@japf.ch>
Sent: Monday, July 1, 2019 12:51:09 AM
To: dev@crail.apache.org; David Crespi
Subject: Re: Setting up storage class 1 and 2

Hi David,


Can you clarify which unpatched version you are talking about? Are you
talking about the NVMf thread fix, where I sent you a link to a branch in my
repository, or the fix we provided earlier for the Spark hang in the Crail
master?

Generally, if you update, update all: clients and datanode/namenode.

Regards,
Jonas

  On Fri, 28 Jun 2019 17:59:32 +0000
  David Crespi <david.crespi@storedgesystems.com> wrote:
> Jonas,
>FYI - I went back to using the unpatched version of crail on the
>clients and it appears to work
> okay now with the shuffle and RDMA, with only the RDMA containers
>running on the server.
>
> Regards,
>
>           David
>
>
> ________________________________
>From: David Crespi
> Sent: Friday, June 28, 2019 7:49:51 AM
> To: Jonas Pfefferle; dev@crail.apache.org
> Subject: RE: Setting up storage class 1 and 2
>
>
> Oh, and while I’m thinking about it Jonas, when I added the patches
>you provided the other day, I only
>
> added them to the spark containers (clients) not to my crail
>containers running on my storage server.
>
> Should the patches have been added to all of the containers?
>
>
> Regards,
>
>
>           David
>
>
> ________________________________
>From: Jonas Pfefferle <pepperjo@japf.ch>
> Sent: Friday, June 28, 2019 12:54:27 AM
> To: dev@crail.apache.org; David Crespi
> Subject: Re: Setting up storage class 1 and 2
>
> Hi David,
>
>
> At the moment, it is possible to add an NVMf datanode even if only the RDMA
> storage type is specified in the config. As you have seen, this will go
> wrong as soon as a client tries to connect to the datanode. Make sure to
> start the RDMA datanode with the appropriate classname, see:
> https://incubator-crail.readthedocs.io/en/latest/run.html
> The correct classname is
>org.apache.crail.storage.rdma.RdmaStorageTier.
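[Per the run documentation linked above, starting one datanode per tier with its tier classname looks roughly like the sketch below; treat the exact flags as an assumption that may vary by Crail version:]

```shell
# Start each storage tier as its own datanode process, selecting the
# tier implementation with -t (sketch; options may differ per version).
$CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.rdma.RdmaStorageTier
$CRAIL_HOME/bin/crail datanode -t org.apache.crail.storage.nvmf.NvmfStorageTier
```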
>
> Regards,
> Jonas
>
>  On Thu, 27 Jun 2019 23:09:26 +0000
>  David Crespi <david.crespi@storedgesystems.com> wrote:
>> Hi,
>> I’m trying to integrate the storage classes and I’m hitting another
>>issue when running terasort and just
>> using the crail-shuffle with HDFS as the tmp storage.  The program
>>just sits, after the following
>> message:
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>> During this run, I’ve removed the two crail nvmf (class 1 and 2)
>>containers from the server, and I’m only running
>> the namenode and a rdma storage class 1 datanode.  My spark
>>configuration is also now only looking at
>> the rdma class.  It looks as though it’s picking up the NVMf IP and
>>port in the INFO messages seen below.
>> I must be configuring something wrong, but I’ve not been able to
>>track it down.  Any thoughts?
>>
>>
>> ************************************
>>         TeraSort
>> ************************************
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/jnvmf-1.6-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/crail/jars/disni-2.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>>[jar:file:/usr/spark-2.4.2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 19/06/27 15:59:07 WARN NativeCodeLoader: Unable to load
>>native-hadoop library for your platform... using builtin-java classes
>>where applicable
>> 19/06/27 15:59:07 INFO SparkContext: Running Spark version 2.4.2
>> 19/06/27 15:59:07 INFO SparkContext: Submitted application: TeraSort
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls to:
>>hduser
>> 19/06/27 15:59:07 INFO SecurityManager: Changing view acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: Changing modify acls groups
>>to:
>> 19/06/27 15:59:07 INFO SecurityManager: SecurityManager:
>>authentication disabled; ui acls disabled; users  with view
>>permissions: Set(hduser); groups with view permissions: Set(); users
>> with modify permissions: Set(hduser); groups with modify
>>permissions: Set()
>> 19/06/27 15:59:08 DEBUG InternalLoggerFactory: Using SLF4J as the
>>default logging framework
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.initialSize: 1024
>> 19/06/27 15:59:08 DEBUG InternalThreadLocalMap:
>>-Dio.netty.threadLocalMap.stringBuilder.maxSize: 4096
>> 19/06/27 15:59:08 DEBUG MultithreadEventLoopGroup:
>>-Dio.netty.eventLoopThreads: 112
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: -Dio.netty.noUnsafe:
>>false
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: Java version: 8
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.theUnsafe: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>sun.misc.Unsafe.copyMemory: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Buffer.address:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: direct buffer
>>constructor: available
>> 19/06/27 15:59:08 DEBUG PlatformDependent0: java.nio.Bits.unaligned:
>>available, true
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable
>>prior to Java9
>> 19/06/27 15:59:08 DEBUG PlatformDependent0:
>>java.nio.DirectByteBuffer.<init>(long, int): available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: sun.misc.Unsafe:
>>available
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.tmpdir: /tmp
>>(java.io.tmpdir)
>> 19/06/27 15:59:08 DEBUG PlatformDependent: -Dio.netty.bitMode: 64
>>(sun.arch.data.model)
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.noPreferDirect: false
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.maxDirectMemory: 1029177344 bytes
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>-Dio.netty.uninitializedArrayAllocationThreshold: -1
>> 19/06/27 15:59:08 DEBUG CleanerJava6: java.nio.ByteBuffer.cleaner():
>>available
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.noKeySetOptimization: false
>> 19/06/27 15:59:08 DEBUG NioEventLoop:
>>-Dio.netty.selectorAutoRebuildThreshold: 512
>> 19/06/27 15:59:08 DEBUG PlatformDependent:
>>org.jctools-core.MpscChunkedArrayQueue: available
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.level: simple
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetector:
>>-Dio.netty.leakDetection.targetRecords: 4
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numHeapArenas: 9
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.numDirectArenas: 10
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.pageSize: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxOrder: 11
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.chunkSize: 16777216
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.tinyCacheSize: 512
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.smallCacheSize: 256
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.normalCacheSize: 64
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.maxCachedBufferCapacity: 32768
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.cacheTrimInterval: 8192
>> 19/06/27 15:59:08 DEBUG PooledByteBufAllocator:
>>-Dio.netty.allocator.useCacheForAllThreads: true
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.processId: 2236
>>(auto-detected)
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv4Stack: false
>> 19/06/27 15:59:08 DEBUG NetUtil: -Djava.net.preferIPv6Addresses:
>>false
>> 19/06/27 15:59:08 DEBUG NetUtil: Loopback interface: lo (lo,
>>127.0.0.1)
>> 19/06/27 15:59:08 DEBUG NetUtil: /proc/sys/net/core/somaxconn: 128
>> 19/06/27 15:59:08 DEBUG DefaultChannelId: -Dio.netty.machineId:
>>02:42:ac:ff:fe:1b:00:02 (auto-detected)
>> 19/06/27 15:59:08 DEBUG ByteBufUtil: -Dio.netty.allocator.type:
>>pooled
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.threadLocalDirectBufferSize: 65536
>> 19/06/27 15:59:08 DEBUG ByteBufUtil:
>>-Dio.netty.maxThreadLocalCharBufferSize: 16384
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 36915
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'sparkDriver' on port 36915.
>> 19/06/27 15:59:08 DEBUG SparkEnv: Using serializer: class
>>org.apache.spark.serializer.KryoSerializer
>> 19/06/27 15:59:08 INFO SparkEnv: Registering MapOutputTracker
>> 19/06/27 15:59:08 DEBUG MapOutputTrackerMasterEndpoint: init
>> 19/06/27 15:59:08 INFO CrailShuffleManager: crail shuffle started
>> 19/06/27 15:59:08 INFO SparkEnv: Registering BlockManagerMaster
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Using
>>org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>information
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint:
>>BlockManagerMasterEndpoint up
>> 19/06/27 15:59:08 INFO DiskBlockManager: Created local directory at
>>/tmp/blockmgr-15237510-f459-40e3-8390-10f4742930a5
>> 19/06/27 15:59:08 DEBUG DiskBlockManager: Adding shutdown hook
>> 19/06/27 15:59:08 INFO MemoryStore: MemoryStore started with
>>capacity 366.3 MB
>> 19/06/27 15:59:08 INFO SparkEnv: Registering OutputCommitCoordinator
>> 19/06/27 15:59:08 DEBUG
>>OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: init
>> 19/06/27 15:59:08 DEBUG SecurityManager: Created SSL options for ui:
>>SSLOptions{enabled=false, port=None, keyStore=None,
>>keyStorePassword=None, trustStore=None, trustStorePassword=None,
>>protocol=None, enabledAlgorithms=Set()}
>> 19/06/27 15:59:08 INFO Utils: Successfully started service 'SparkUI'
>>on port 4040.
>> 19/06/27 15:59:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and
>>started at http://192.168.1.161:4040
>> 19/06/27 15:59:08 INFO SparkContext: Added JAR
>>file:/spark-terasort/target/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>at
>>spark://master:36915/jars/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar
>>with timestamp 1561676348562
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint:
>>Connecting to master spark://master:7077...
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Creating new
>>connection to master/192.168.3.13:7077
>> 19/06/27 15:59:08 DEBUG AbstractByteBuf:
>>-Dio.netty.buffer.bytebuf.checkAccessible: true
>> 19/06/27 15:59:08 DEBUG ResourceLeakDetectorFactory: Loaded default
>>ResourceLeakDetector: io.netty.util.ResourceLeakDetector@5b1bb5d2
>> 19/06/27 15:59:08 DEBUG TransportClientFactory: Connection to
>>master/192.168.3.13:7077 successful, running bootstraps...
>> 19/06/27 15:59:08 INFO TransportClientFactory: Successfully created
>>connection to master/192.168.3.13:7077 after 41 ms (0 ms spent in
>>bootstraps)
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxCapacityPerThread: 32768
>> 19/06/27 15:59:08 DEBUG Recycler:
>>-Dio.netty.recycler.maxSharedCapacityFactor: 2
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.linkCapacity:
>>16
>> 19/06/27 15:59:08 DEBUG Recycler: -Dio.netty.recycler.ratio: 8
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Connected to
>>Spark cluster with app ID app-20190627155908-0005
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/0 on
>>worker-20190627152154-192.168.3.11-8882 (192.168.3.11:8882) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/0 on hostPort 192.168.3.11:8882 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/1 on
>>worker-20190627152150-192.168.3.12-8881 (192.168.3.12:8881) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/1 on hostPort 192.168.3.12:8881 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 DEBUG TransportServer: Shuffle server started on
>>port: 39189
>> 19/06/27 15:59:08 INFO Utils: Successfully started service
>>'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>39189.
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/2 on
>>worker-20190627152203-192.168.3.9-8884 (192.168.3.9:8884) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO NettyBlockTransferService: Server created on
>>master:39189
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/2 on hostPort 192.168.3.9:8884 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/3 on
>>worker-20190627152158-192.168.3.10-8883 (192.168.3.10:8883) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/3 on hostPort 192.168.3.10:8883 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>added: app-20190627155908-0005/4 on
>>worker-20190627152207-192.168.3.8-8885 (192.168.3.8:8885) with 2
>>core(s)
>> 19/06/27 15:59:08 INFO BlockManager: Using
>>org.apache.spark.storage.RandomBlockReplicationPolicy for block
>>replication policy
>> 19/06/27 15:59:08 INFO StandaloneSchedulerBackend: Granted executor
>>ID app-20190627155908-0005/4 on hostPort 192.168.3.8:8885 with 2
>>core(s), 1024.0 MB RAM
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/0 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/3 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/4 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/1 is now RUNNING
>> 19/06/27 15:59:08 INFO StandaloneAppClient$ClientEndpoint: Executor
>>updated: app-20190627155908-0005/2 is now RUNNING
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registering BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 DEBUG DefaultTopologyMapper: Got a request for
>>master
>> 19/06/27 15:59:08 INFO BlockManagerMasterEndpoint: Registering block
>>manager master:39189 with 366.3 MB RAM, BlockManagerId(driver,
>>master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManagerMaster: Registered BlockManager
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:08 INFO BlockManager: Initialized BlockManager:
>>BlockManagerId(driver, master, 39189, None)
>> 19/06/27 15:59:09 INFO StandaloneSchedulerBackend: SchedulerBackend
>>is ready for scheduling beginning after reached
>>minRegisteredResourcesRatio: 0.0
>> 19/06/27 15:59:09 DEBUG SparkContext: Adding shutdown hook
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.use.legacy.blockreader.local = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.read.shortcircuit = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal:
>>dfs.client.domain.socket.data.traffic = false
>> 19/06/27 15:59:09 DEBUG BlockReaderLocal: dfs.domain.socket.path =
>> 19/06/27 15:59:09 DEBUG RetryUtils: multipleLinearRandomRetry = null
>> 19/06/27 15:59:09 DEBUG Server: rpcKind=RPC_PROTOCOL_BUFFER,
>>rpcRequestWrapperClass=class
>>org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper,
>>rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@23f3dbf0
>> 19/06/27 15:59:09 DEBUG Client: getting client out of cache:
>>org.apache.hadoop.ipc.Client@3ed03652
>> 19/06/27 15:59:09 DEBUG PerformanceAdvisory: Both short-circuit
>>local reads and UNIX domain socket are disabled.
>> 19/06/27 15:59:09 DEBUG DataTransferSaslUtil: DataTransferProtocol
>>not using SaslPropertiesResolver, no QOP found in configuration for
>>dfs.data.transfer.protection
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0 stored as
>>values in memory (estimated size 288.9 KB, free 366.0 MB)
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0 locally
>>took  115 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block broadcast_0
>>without replication took  117 ms
>> 19/06/27 15:59:10 INFO MemoryStore: Block broadcast_0_piece0 stored
>>as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
>> 19/06/27 15:59:10 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>memory on master:39189 (size: 23.8 KB, free: 366.3 MB)
>> 19/06/27 15:59:10 DEBUG BlockManagerMaster: Updated info of block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Told master about block
>>broadcast_0_piece0
>> 19/06/27 15:59:10 DEBUG BlockManager: Put block broadcast_0_piece0
>>locally took  6 ms
>> 19/06/27 15:59:10 DEBUG BlockManager: Putting block
>>broadcast_0_piece0 without replication took  6 ms
>> 19/06/27 15:59:10 INFO SparkContext: Created broadcast 0 from
>>newAPIHadoopFile at TeraSort.scala:60
>> 19/06/27 15:59:10 DEBUG Client: The ping interval is 60000 ms.
>> 19/06/27 15:59:10 DEBUG Client: Connecting to
>>NameNode-1/192.168.3.7:54310
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: starting, having
>>connections 1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #0
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #0
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getFileInfo took
>>31ms
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #1
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #1
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: getListing took 5ms
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Time taken to get
>>FileStatuses: 134
>> 19/06/27 15:59:10 INFO FileInputFormat: Total input paths to process
>>: 2
>> 19/06/27 15:59:10 DEBUG FileInputFormat: Total # of splits generated
>>by getSplits: 2, TimeTaken: 139
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Creating committer
>>org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
>>output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
>> 19/06/27 15:59:10 DEBUG FileCommitProtocol: Using (String, String,
>>Boolean) constructor
>> 19/06/27 15:59:10 INFO FileOutputCommitter: File Output Committer
>>Algorithm version is 1
>> 19/06/27 15:59:10 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
>>masked=rwxr-xr-x
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser sending #2
>> 19/06/27 15:59:10 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser got value #2
>> 19/06/27 15:59:10 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
>> 19/06/27 15:59:10 DEBUG ClosureCleaner: Cleaning lambda:
>>$anonfun$write$1
>> 19/06/27 15:59:10 DEBUG ClosureCleaner:  +++ Lambda closure
>>($anonfun$write$1) is now cleaned +++
>> 19/06/27 15:59:10 INFO SparkContext: Starting job: runJob at
>>SparkHadoopWriter.scala:78
>> 19/06/27 15:59:10 INFO CrailDispatcher: CrailStore starting version
>>400
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteonclose
>>false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.deleteOnStart
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.preallocate 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.writeAhead 0
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.debug false
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.serializer
>>org.apache.spark.serializer.CrailSparkSerializer
>> 19/06/27 15:59:10 INFO CrailDispatcher: spark.crail.shuffle.affinity
>>true
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.outstanding 1
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.shuffle.storageclass 0
>> 19/06/27 15:59:10 INFO CrailDispatcher:
>>spark.crail.broadcast.storageclass 0
>> 19/06/27 15:59:10 INFO crail: creating singleton crail file system
>> 19/06/27 15:59:10 INFO crail: crail.version 3101
>> 19/06/27 15:59:10 INFO crail: crail.directorydepth 16
>> 19/06/27 15:59:10 INFO crail: crail.tokenexpiration 10
>> 19/06/27 15:59:10 INFO crail: crail.blocksize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.cachelimit 0
>> 19/06/27 15:59:10 INFO crail: crail.cachepath /dev/hugepages/cache
>> 19/06/27 15:59:10 INFO crail: crail.user crail
>> 19/06/27 15:59:10 INFO crail: crail.shadowreplication 1
>> 19/06/27 15:59:10 INFO crail: crail.debug true
>> 19/06/27 15:59:10 INFO crail: crail.statistics true
>> 19/06/27 15:59:10 INFO crail: crail.rpctimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.datatimeout 1000
>> 19/06/27 15:59:10 INFO crail: crail.buffersize 1048576
>> 19/06/27 15:59:10 INFO crail: crail.slicesize 65536
>> 19/06/27 15:59:10 INFO crail: crail.singleton true
>> 19/06/27 15:59:10 INFO crail: crail.regionsize 1073741824
>> 19/06/27 15:59:10 INFO crail: crail.directoryrecord 512
>> 19/06/27 15:59:10 INFO crail: crail.directoryrandomize true
>> 19/06/27 15:59:10 INFO crail: crail.cacheimpl
>>org.apache.crail.memory.MappedBufferCache
>> 19/06/27 15:59:10 INFO crail: crail.locationmap
>> 19/06/27 15:59:10 INFO crail: crail.namenode.address
>>crail://192.168.1.164:9060
>> 19/06/27 15:59:10 INFO crail: crail.namenode.blockselection
>>roundrobin
>> 19/06/27 15:59:10 INFO crail: crail.namenode.fileblocks 16
>> 19/06/27 15:59:10 INFO crail: crail.namenode.rpctype
>>org.apache.crail.namenode.rpc.tcp.TcpNameNode
>> 19/06/27 15:59:10 INFO crail: crail.namenode.log
>> 19/06/27 15:59:10 INFO crail: crail.storage.types
>>org.apache.crail.storage.rdma.RdmaStorageTier
>> 19/06/27 15:59:10 INFO crail: crail.storage.classes 1
>> 19/06/27 15:59:10 INFO crail: crail.storage.rootclass 0
>> 19/06/27 15:59:10 INFO crail: crail.storage.keepalive 2
>> 19/06/27 15:59:10 INFO crail: buffer cache, allocationCount 0,
>>bufferCount 1024
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.interface eth0
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.port 50020
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.storagelimit
>>4294967296
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.allocationsize
>>1073741824
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.datapath
>>/dev/hugepages/rdma
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.localmap true
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.queuesize 32
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.type passive
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.backlog 100
>> 19/06/27 15:59:10 INFO crail: crail.storage.rdma.connecttimeout 1000
>> 19/06/27 15:59:10 INFO narpc: new NaRPC server group v1.0,
>>queueDepth 32, messageSize 512, nodealy true
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.queueDepth 32
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.messageSize 512
>> 19/06/27 15:59:10 INFO crail: crail.namenode.tcp.cores 1
>> 19/06/27 15:59:10 INFO crail: connected to namenode(s)
>>/192.168.1.164:9060
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: lookupDirectory: path /spark
>> 19/06/27 15:59:10 INFO CrailDispatcher: creating main dir /spark
>> 19/06/27 15:59:10 INFO crail: createNode: name /spark, type
>>DIRECTORY, storageAffinity 0, locationAffinity 0
>> 19/06/27 15:59:10 INFO crail: CoreOutputStream, open, path /, fd 0,
>>streamId 1, isDir true, writeHint 0
>> 19/06/27 15:59:10 INFO crail: passive data client
>> 19/06/27 15:59:10 INFO disni: creating  RdmaProvider of type 'nat'
>> 19/06/27 15:59:10 INFO disni: jverbs jni version 32
>> 19/06/27 15:59:10 INFO disni: sock_addr_in size mismatch, jverbs
>>size 28, native size 16
>> 19/06/27 15:59:10 INFO disni: IbvRecvWR size match, jverbs size 32,
>>native size 32
>> 19/06/27 15:59:10 INFO disni: IbvSendWR size mismatch, jverbs size
>>72, native size 128
>> 19/06/27 15:59:10 INFO disni: IbvWC size match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: IbvSge size match, jverbs size 16,
>>native size 16
>> 19/06/27 15:59:10 INFO disni: Remote addr offset match, jverbs size
>>40, native size 40
>> 19/06/27 15:59:10 INFO disni: Rkey offset match, jverbs size 48,
>>native size 48
>> 19/06/27 15:59:10 INFO disni: createEventChannel, objId
>>139811924587312
>> 19/06/27 15:59:10 INFO disni: passive endpoint group, maxWR 32,
>>maxSge 4, cqSize 64
>> 19/06/27 15:59:10 INFO disni: launching cm processor, cmChannel 0
>> 19/06/27 15:59:10 INFO disni: createId, id 139811924676432
>> 19/06/27 15:59:10 INFO disni: new client endpoint, id 0, idPriv 0
>> 19/06/27 15:59:10 INFO disni: resolveAddr, addres
>>/192.168.3.100:4420
>> 19/06/27 15:59:10 INFO disni: resolveRoute, id 0
>> 19/06/27 15:59:10 INFO disni: allocPd, objId 139811924679808
>> 19/06/27 15:59:10 INFO disni: setting up protection domain, context
>>467, pd 1
>> 19/06/27 15:59:10 INFO disni: setting up cq processor
>> 19/06/27 15:59:10 INFO disni: new endpoint CQ processor
>> 19/06/27 15:59:10 INFO disni: createCompChannel, context
>>139810647883744
>> 19/06/27 15:59:10 INFO disni: createCQ, objId 139811924680688, ncqe
>>64
>> 19/06/27 15:59:10 INFO disni: createQP, objId 139811924691192,
>>send_wr size 32, recv_wr_size 32
>> 19/06/27 15:59:10 INFO disni: connect, id 0
>> 19/06/27 15:59:10 INFO disni: got event type + UNKNOWN, srcAddress
>>/192.168.3.13:43273, dstAddress /192.168.3.100:4420
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.11:35854) with ID 0
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.12:44312) with ID 1
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.8:34774) with ID 4
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.9:58808) with ID 2
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.11
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.11:41919 with 366.3 MB RAM, BlockManagerId(0,
>>192.168.3.11, 41919, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.12
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.12:46697 with 366.3 MB RAM, BlockManagerId(1,
>>192.168.3.12, 46697, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.8
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.8:37281 with 366.3 MB RAM, BlockManagerId(4,
>>192.168.3.8, 37281, None)
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.9
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.9:43857 with 366.3 MB RAM, BlockManagerId(2,
>>192.168.3.9, 43857, None)
>> 19/06/27 15:59:11 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:
>>Registered executor NettyRpcEndpointRef(spark-client://Executor)
>>(192.168.3.10:40100) with ID 3
>> 19/06/27 15:59:11 DEBUG DefaultTopologyMapper: Got a request for
>>192.168.3.10
>> 19/06/27 15:59:11 INFO BlockManagerMasterEndpoint: Registering block
>>manager 192.168.3.10:38527 with 366.3 MB RAM, BlockManagerId(3,
>>192.168.3.10, 38527, None)
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: closed
>> 19/06/27 15:59:20 DEBUG Client: IPC Client (1998371610) connection
>>to NameNode-1/192.168.3.7:54310 from hduser: stopped, remaining
>>connections 0
>>
>>
>> Regards,
>>
>>           David
>>
>

