hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HDFS-10502) Enabled memory locking and now HDFS won't start up
Date Wed, 08 Jun 2016 16:55:21 GMT

     [ https://issues.apache.org/jira/browse/HDFS-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth resolved HDFS-10502.
----------------------------------
    Resolution: Invalid

Hello [~machey].  I recommend taking these questions to the user@hadoop.apache.org mailing
list.  We use JIRA for tracking confirmed bugs and feature requests.  We use user@hadoop.apache.org
for usage advice and troubleshooting.

Regarding whether this is a recommended approach, I think it depends on a few other
factors.  Is the intent to use these cached files from Hadoop workloads, such as MapReduce
jobs or Hive queries?  If not, I wonder whether your use case might be better served by
something more directly focused on general caching, such as Redis or memcached.  If your
use case does involve Hadoop integration, then Centralized Cache Management is certainly
worth exploring.
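
If you do go the Centralized Cache Management route, the pool and directive can be created
either with the hdfs cacheadmin CLI (-addPool / -addDirective) or programmatically.  Here is
a minimal sketch of the programmatic path, assuming a hypothetical pool named "object-cache"
and a hypothetical target directory /cache/objects, and assuming dfs.datanode.max.locked.memory
and the ulimit -l setting are already in place:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CacheSetup {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from the core-site.xml on the classpath.
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Hypothetical pool name; in practice you would also set an owner, group,
    // mode, and a byte limit on the pool.
    dfs.addCachePool(new CachePoolInfo("object-cache"));

    // Ask the NameNode to cache everything under the hypothetical /cache/objects
    // path on one replica.  DataNodes then mmap and mlock those blocks, subject
    // to dfs.datanode.max.locked.memory and the OS ulimit -l.
    long directiveId = dfs.addCacheDirective(
        new CacheDirectiveInfo.Builder()
            .setPool("object-cache")
            .setPath(new Path("/cache/objects"))
            .setReplication((short) 1)
            .build());
    System.out.println("Added cache directive " + directiveId);
  }
}

Afterwards, hdfs cacheadmin -listDirectives -stats will show how many of those bytes the
DataNodes have actually managed to cache, which is a quick way to confirm the feature is
working before you measure read latency.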

Regarding the timeouts, I can tell from the exception that this is the heartbeat RPC sent
from the DataNode to the NameNode.  I recommend investigating connectivity between the DataNode
and the NameNode and examining the logs from both sides to try to determine if something is
going wrong in the handling of the heartbeat message.  On one hand, a heartbeat timeout is
not an error condition that is specific to Centralized Cache Management.  It could happen
whether or not you're using that feature.  On the other hand, the heartbeat message does contain
some optional information about the state of cache capacity and current usage at the DataNode.
 That information would trigger special handling logic at the NameNode side, so I suppose
there is a chance that something in that logic is hanging up the heartbeat handling.  Investigating
the logs might reveal more.
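
As a quick first check on the connectivity side, it can help to confirm that the NameNode RPC
endpoint answers a trivial call from the DataNode host, independent of the heartbeat path.  A
rough sketch (the class name is made up; it just uses whatever fs.defaultFS that host is
configured with, which judging from your log is localhost:8020 here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class NameNodeRpcCheck {
  public static void main(String[] args) throws Exception {
    // Uses fs.defaultFS from the local core-site.xml, i.e. the same
    // localhost:8020 endpoint shown in the exception below.
    Configuration conf = new Configuration();
    long start = System.currentTimeMillis();
    FileSystem fs = FileSystem.get(conf);
    FsStatus status = fs.getStatus();  // one simple RPC round trip to the NameNode
    System.out.println("NameNode answered in "
        + (System.currentTimeMillis() - start) + " ms, capacity="
        + status.getCapacity() + ", used=" + status.getUsed());
  }
}

If that call also stalls, the problem is general NameNode responsiveness rather than anything
cache-specific.  If it returns quickly, the NameNode-side handling of the cache report in the
heartbeat becomes the more interesting suspect, and the NameNode log around the timestamps of
the DataNode timeouts would be the place to look.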

user@hadoop.apache.org would be a good forum for further discussion of both of these topics.

> Enabled memory locking and now HDFS won't start up
> --------------------------------------------------
>
>                 Key: HDFS-10502
>                 URL: https://issues.apache.org/jira/browse/HDFS-10502
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.2
>         Environment: RHEL 6.8
>            Reporter: Chris Machemer
>
> My goal is to speed up reads.  I have about 500k small files (2k to 15k) and I'm trying
> to use HDFS as a cache for serialized instances of Java objects.
> I've written the code to construct and serialize all the objects out to HDFS, and am
> now hoping to improve read performance, because accessing the objects from disk-based
> storage is proving to be too slow for my application's SLAs.
> So my first question is: is using memory locking and hdfs cacheadmin pools and directives
> the right way to go to cache my objects into memory, or should I create RAM disks and do
> memory-based storage instead?
> If hdfs cacheadmin is the way to go (it's the path I'm going down so far), then I need
> to figure out whether what's happening is a bug or whether I've configured something wrong,
> because when I start up HDFS with a gig of memory locked (both in limits.d for ulimit -l
> and in hdfs-site.xml), the server starts up, presumably tries to cache things into memory,
> and I get hours and hours of timeouts in the logs like this:
> 2016-06-08 07:42:50,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
> java.net.SocketTimeoutException: Call From stgb-fe1.litle.com/10.1.9.66 to localhost:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
> 	at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1479)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:133)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:133)
> 	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:520)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:387)
> 	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1084)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)



