accumulo-user mailing list archives

From pdread <>
Subject Re: bulk ingest without mapred
Date Tue, 08 Apr 2014 16:36:00 GMT
My hdfs-site.xml has the data nodes (space?) defined as


So I created the files/directories under /data/accu1/hdfs/tmp/bulk, and there
they were.
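(The property block from hdfs-site.xml did not survive the archive. For illustration only, a data-directory entry in hdfs-site.xml typically looks like the sketch below; the path is an assumption based on the directory mentioned above, not the original message's value.)

```
<!-- hypothetical example; the actual property from the original message was lost -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/accu1/hdfs</value>
</property>
```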

After more exploring I found the Hadoop code that is causing the problem,
DFSClient.getFileInfo() is returning null.

  public FileStatus getFileInfo(String src) throws IOException {
    FileStatus fileStatus;

    try {
      if (fileStatusCache != null) {
        fileStatus = fileStatusCache.get(src);
        if (fileStatus != FileStatusCache.nullFileStatus) {
          return fileStatus;
        }
      }
      fileStatus = namenodeProtocolProxy == null ?
          versionBasedGetFileInfo(src) : methodBasedGetFileInfo(src);
      if (fileStatusCache != null) {
        fileStatusCache.set(src, fileStatus);
      }
      return fileStatus;
    } catch (RemoteException re) {
      throw re.unwrapRemoteException(AccessControlException.class);
    }
  }

So I guess the question now is why this is the case. I noticed that no logging
was done to the Hadoop logs, specifically the namenode and datanode logs. The
DFSClient code refers to RPC calls, which would suggest it's connecting into
the Hadoop system and not looking at the disk directly. Since I used FileSystem
to do the file manipulation, is there additional bookkeeping that needs to be
done to let the "hadoop" system know there are files out there? In other words,
even though I used Hadoop to create the files, does "hadoop" proper know about
them? If not, then what bookkeeping has to be done to get them into the system?
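For what it's worth, here is a minimal sketch (the class name is my own; the path is taken from above, and everything else is an assumption, not from the original message) of a check for whether the namenode actually knows about the files. If FileSystem.get() resolves to the local filesystem rather than DistributedFileSystem, the files were written to local disk and the namenode's getFileInfo() will return null for them:

```java
// Hypothetical check, assuming Hadoop's client API is on the classpath.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkDirCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // If this prints LocalFileSystem rather than DistributedFileSystem,
    // the files landed on local disk and the namenode never saw them.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("FileSystem impl: " + fs.getClass().getName());
    System.out.println("Default URI: " + FileSystem.getDefaultUri(conf));

    Path bulk = new Path("/data/accu1/hdfs/tmp/bulk"); // path from the message
    if (fs.exists(bulk)) {
      for (FileStatus st : fs.listStatus(bulk)) {
        System.out.println(st.getPath() + " len=" + st.getLen());
      }
    } else {
      System.out.println(bulk + " does not exist in " + fs.getUri());
    }
  }
}
```

The point of the check: FileSystem.get() picks its implementation from the configured default URI, so creating files through it only registers them with the namenode when that URI is an hdfs:// one.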

Just a guess here, but since the files are clearly there and clearly
available, there must be something else at play.
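A quick way to tell the two cases apart from the command line (a sketch; the path is taken from above, and this assumes the hadoop CLI is configured on the node):

```shell
# What the namenode knows about (goes through the same RPC path as DFSClient):
hadoop fs -ls /data/accu1/hdfs/tmp/bulk
# What is actually on this node's local disk:
ls -l /data/accu1/hdfs/tmp/bulk
# If the first command fails while the second succeeds, the files were
# created on the local filesystem rather than in HDFS.
```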


