incubator-accumulo-user mailing list archives

From Jesse McConnell <jesse.mcconn...@gmail.com>
Subject Re: TableOperations import directory
Date Mon, 17 Oct 2011 23:34:25 GMT
On Mon, Oct 17, 2011 at 18:29, Eric Newton <eric.newton@gmail.com> wrote:
> It's possible that the bulkImport client code may be using the hadoop config
> from the java classpath, which is how it used to work.  I'll investigate it
> tomorrow.

I chased that down in the debugging. Unless you're thinking that the
hadoop being used in the call to getGlobs on that file pattern may be
hitting a different fs... that might explain it.
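(For context on why a glob might hit a different fs: in Hadoop, the filesystem a path resolves to is driven by the path's URI scheme, with an unqualified path falling back to the default filesystem from whatever hadoop config is on the client's classpath. The stand-in below is a minimal stdlib sketch of that resolution rule, not Hadoop's actual code; the hostnames are hypothetical.)

```java
import java.net.URI;

public class FsResolution {
    // Hypothetical stand-in for Hadoop's Path-to-FileSystem resolution:
    // a path with no scheme falls back to the client's configured default
    // filesystem, which may not be the dfs you intended.
    static String resolveFs(String path, String defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            return defaultFs; // classpath config wins
        }
        return uri.getScheme() + "://"
                + (uri.getAuthority() == null ? "" : uri.getAuthority());
    }

    public static void main(String[] args) {
        // Fully qualified path: the scheme pins the filesystem.
        System.out.println(resolveFs("hdfs://nn-b:9000/bulk/out", "hdfs://nn-a:9000"));
        // Bare path: silently resolved against the classpath's default fs.
        System.out.println(resolveFs("/bulk/out", "file:///"));
    }
}
```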

cheers,
jesse

> See ACCUMULO-43.
> -Eric
>
> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <john.w.vines@ugov.gov> wrote:
>>
>> ----- Original Message -----
>> | From: "Jesse McConnell" <jesse.mcconnell@gmail.com>
>> | To: accumulo-user@incubator.apache.org
>> | Sent: Monday, October 17, 2011 6:15:43 PM
>> | Subject: Re: TableOperations import directory
>> | We are trying to run a unit-test-type scenario where we have an m-r
>> | process that generates input to the bulk import process in a local
>> | hadoop fs, and we then copy the resulting output to a directory on
>> | the dfs that can be used as input to the
>> | TableOperations.importDirectory() call.
>> |
>> | Is this a problem? The examples seem to work when we run them with
>> | the -examples artifact in the lib dir, but when that file is removed
>> | and we try to run it the same way as the unit test above, it doesn't
>> | work.
>> |
>> | Is there some sort of requirement that the data going to the import
>> | directory of the bulk load process _has_ to be on the dfs?
>> |
>> | In other words, is it a bad assumption that I could take data from
>> | hadoop dfs X, copy it over to hadoop dfs Y, and then import it with
>> | the importDirectory command?
>> |
>> | Does the job metadata or the job configuration play any role in the
>> | bulk import process?
>> |
>> | cheers,
>> | jesse
>> |
>> | --
>> | jesse mcconnell
>> | jesse.mcconnell@gmail.com
>> |
>>
>> Everything in your process sounds correct. There is no data besides the
>> files used for the bulk import process, so generating on one hdfs and
>> transferring them over should pose no problems.
>>
>> Can you elaborate on the differences with regard to the examples artifact?
>> The examples should have no effect on any regular aspects of the system, so
>> if you could elaborate on how it doesn't work, that would be a good start.
>> Like, does it error, silently fail, etc.? One good place to look is the
>> monitor page, as that will surface any errors/warnings you get from the
>> tservers; that way, if there's an error rising up from the rfiles you're
>> using, you should be able to see it so we can correct it.
>>
>> Also, glancing at your other emails: are you putting the files in the
>> Accumulo directory in hdfs when you move them to the true hdfs instance? If
>> so, I highly suggest you don't. Let bulk import put them in the right place
>> in the accumulo directory; it will keep things simpler.
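>> For what it's worth, the client-side call looks roughly like the sketch
>> below. This is a sketch only: the instance name, zookeeper host, table
>> name, credentials, and paths are all hypothetical placeholders, and the
>> exact importDirectory signature has varied between Accumulo versions.
>>
>> ```java
>> // Sketch only: assumes accumulo-core and hadoop jars on the classpath
>> // and a running cluster; all names below are placeholders.
>> import org.apache.accumulo.core.client.Connector;
>> import org.apache.accumulo.core.client.ZooKeeperInstance;
>>
>> public class BulkImportSketch {
>>     public static void main(String[] args) throws Exception {
>>         Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
>>                 .getConnector("user", "password".getBytes());
>>
>>         // Stage the rfiles in a plain working directory on the
>>         // *destination* dfs, NOT under the accumulo directory;
>>         // bulk import moves them into place itself.
>>         String dir = "/tmp/bulk/files";           // rfiles copied over
>>         String failureDir = "/tmp/bulk/failures"; // must exist and be empty
>>
>>         conn.tableOperations().importDirectory("mytable", dir, failureDir, false);
>>     }
>> }
>> ```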
>
>
