hadoop-general mailing list archives

From financeturd financeturd <financet...@yahoo.com>
Subject Moving files from JBoss server to HDFS
Date Sun, 13 May 2012 08:18:01 GMT

We have a large number of custom-generated files (not just web logs) that we
need to move from our JBoss servers to HDFS.  Our first implementation ran a
cron job every 5 minutes to move our files from the "output" directory to HDFS.
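
For illustration, here is a minimal sketch of the kind of script such a cron
job could run. It is not our actual implementation: the function name
move_files, the directory arguments, and the injectable run parameter are
hypothetical, and it assumes the hadoop command-line client is installed and
configured on the JBoss server.

```python
import os
import subprocess


def move_files(output_dir, hdfs_dir, run=subprocess.call):
    """Upload each regular file in output_dir to hdfs_dir.

    A file is deleted locally only after its upload succeeds.
    Returns the names of the files that were uploaded.
    """
    moved = []
    for name in sorted(os.listdir(output_dir)):
        local_path = os.path.join(output_dir, name)
        if not os.path.isfile(local_path):
            continue  # skip subdirectories
        # `hadoop fs -put <local> <dst>` copies a local file into HDFS.
        exit_code = run(["hadoop", "fs", "-put", local_path, hdfs_dir])
        if exit_code == 0:
            os.remove(local_path)  # remove only after a successful upload
            moved.append(name)
    return moved
```

A file whose upload fails stays in the output directory, so the next cron run
retries it. A real job would also need to avoid picking up files that are
still being written.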

Is this recommended?  We are being told by our IT team that our JBoss servers
should not have access to HDFS for security reasons.  The files must be "sucked"
to HDFS by other servers that do not accept traffic from the outside.  In
essence, they are asking for a layer of indirection.  Instead of:
{JBoss server} --> {HDFS}
it's being requested that it look like:
{Separate server} <-- {JBoss server}
and then
{Separate server} --> HDFS
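
The two-step flow above could be sketched roughly as follows, running on the
separate server. This is only an assumption about how the pull might be done:
it uses rsync over SSH for the first step and the hadoop client for the
second, and the function name pull_then_push and all of its parameters are
hypothetical.

```python
import os
import subprocess


def pull_then_push(jboss_host, remote_dir, staging_dir, hdfs_dir,
                   run=subprocess.call):
    """Pull files from the JBoss host into staging_dir, then upload to HDFS.

    Returns the names of the files uploaded to HDFS.
    """
    # Step 1: {Separate server} <-- {JBoss server}
    # The trailing slash makes rsync copy the directory's contents;
    # --remove-source-files deletes each source file once it is transferred.
    source = "%s:%s/" % (jboss_host, remote_dir.rstrip("/"))
    pull = ["rsync", "-a", "--remove-source-files", source, staging_dir]
    if run(pull) != 0:
        return []  # skip the upload step if the pull failed

    # Step 2: {Separate server} --> HDFS
    uploaded = []
    for name in sorted(os.listdir(staging_dir)):
        local_path = os.path.join(staging_dir, name)
        if not os.path.isfile(local_path):
            continue  # skip subdirectories
        if run(["hadoop", "fs", "-put", local_path, hdfs_dir]) == 0:
            os.remove(local_path)  # clear staging only after a good upload
            uploaded.append(name)
    return uploaded
```

With this layout the JBoss servers need no HDFS client or credentials; only
the separate server opens connections, first to JBoss and then to HDFS.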

While I understand the request in principle, having processes on our JBoss
servers write files to HDFS doesn't seem any less secure than having those
servers access a central database, which they already do.

Can anyone comment on what a recommended approach would be?  Should our JBoss
servers push their data to HDFS, or should the data be pulled by another server
and then placed into HDFS?

Thank you!