incubator-accumulo-user mailing list archives

From Jesse McConnell <jesse.mcconn...@gmail.com>
Subject Re: TableOperations import directory
Date Tue, 18 Oct 2011 16:45:31 GMT
Largely yes. I am sorting through the 'good' rf file from the bulk
ingest example and trying to sort out what might be the issue with
ours.

Are there any special debug flags we can throw on that might give us
information on what is being rejected?
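
(On the client side I assume we can turn up log4j ourselves, along the
lines of the sketch below, but I'm not sure that covers what the
tservers reject server-side:)

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Raise client-side logging for the accumulo client code; the logger
// name is a guess at the relevant package.
Logger.getLogger("org.apache.accumulo").setLevel(Level.DEBUG);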

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 08:36, Eric Newton <eric.newton@gmail.com> wrote:
> If you use:
> ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
> /accumulo/tables/id/bulk_uuid/000000_000000.rf
> Do you see your data?  Does it have visibility markings that would filter
> the data out?  Are the timestamps reasonable?
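>
> You could also scan the table with explicit authorizations to rule out
> visibility filtering; a rough sketch (instance, table, user, and auth
> names are placeholders):
>
> import java.util.Map.Entry;
> import org.apache.accumulo.core.client.Connector;
> import org.apache.accumulo.core.client.Scanner;
> import org.apache.accumulo.core.client.ZooKeeperInstance;
> import org.apache.accumulo.core.data.Key;
> import org.apache.accumulo.core.data.Value;
> import org.apache.accumulo.core.security.Authorizations;
>
> public class CheckVis {
>   public static void main(String[] args) throws Exception {
>     Connector conn = new ZooKeeperInstance("myinstance", "zoo:2181")
>         .getConnector("root", "secret".getBytes());
>     // Scan with authorizations matching the visibility markings in the
>     // rfile; entries whose visibility isn't satisfied won't appear.
>     Scanner scanner = conn.createScanner("mytable",
>         new Authorizations("vis1"));
>     for (Entry<Key,Value> e : scanner)
>       System.out.println(e.getKey() + " -> " + e.getValue());
>   }
> }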
>
> On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell <jesse.mcconnell@gmail.com>
> wrote:
>>
>> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
>> <jesse.mcconnell@gmail.com> wrote:
>> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <eric.newton@gmail.com>
>> > wrote:
>> >> It's possible that the bulkImport client code may be using the hadoop
>> >> config
>> >> from the java classpath, which is how it used to work.  I'll
>> >> investigate it
>> >> tomorrow.
>> >
>> > I chased that down in the debugging; unless you're thinking that the
>> > hadoop being used in the call to getGlobs on that file pattern may be
>> > hitting a different fs...that might explain it.
>>
>> no no...because the file from the import directory is actually being
>> copied, so I don't think it's a different FS issue.
>>
>>
>> > cheers,
>> > jesse
>> >
>> >> See ACCUMULO-43.
>> >> -Eric
>> >>
>> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <john.w.vines@ugov.gov>
>> >> wrote:
>> >>>
>> >>> ----- Original Message -----
>> >>> | From: "Jesse McConnell" <jesse.mcconnell@gmail.com>
>> >>> | To: accumulo-user@incubator.apache.org
>> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
>> >>> | Subject: Re: TableOperations import directory
>> >>> | We are trying to run a unit-test-type scenario where we have an
>> >>> | m-r process that generates input to the bulk import process in a
>> >>> | local hadoop fs, and then copy the resulting output to a directory
>> >>> | on the dfs that can then be used as input to the
>> >>> | TableOperations.importDirectory() call.
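>> >>> |
>> >>> | For reference, the call we make looks roughly like this (instance,
>> >>> | table, and path names are placeholders; I'm assuming the 1.4-era
>> >>> | four-argument signature, and that the failure directory must
>> >>> | already exist and be empty):
>> >>> |
>> >>> | import org.apache.accumulo.core.client.Connector;
>> >>> | import org.apache.accumulo.core.client.ZooKeeperInstance;
>> >>> |
>> >>> | public class BulkLoad {
>> >>> |   public static void main(String[] args) throws Exception {
>> >>> |     Connector conn = new ZooKeeperInstance("myinstance", "zoo:2181")
>> >>> |         .getConnector("root", "secret".getBytes());
>> >>> |     // dir holds the m-r generated rfiles; the failure directory
>> >>> |     // collects files the tservers could not load.
>> >>> |     conn.tableOperations().importDirectory("mytable",
>> >>> |         "/tmp/bulk/files", "/tmp/bulk/failures", false);
>> >>> |   }
>> >>> | }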
>> >>> |
>> >>> | Is this a problem? The examples seem to work when we run them with
>> >>> | the -examples artifact in the lib dir, but when that file is
>> >>> | removed and we try to run it in the same sort of way as the unit
>> >>> | test above, it doesn't work.
>> >>> |
>> >>> | Is there some sort of requirement that the data being generated
>> >>> | for the import, going into the import directory of the bulk load
>> >>> | process, _has_ to be on the dfs?
>> >>> |
>> >>> | In other words, is it a bad assumption that I could take data from
>> >>> | hadoop dfs X, copy it over to hadoop dfs Y, and then import it with
>> >>> | the importDirectory command?
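>> >>> |
>> >>> | In code form, the copy I mean is just something like this (host
>> >>> | and path names are made up):
>> >>> |
>> >>> | import java.net.URI;
>> >>> | import org.apache.hadoop.conf.Configuration;
>> >>> | import org.apache.hadoop.fs.FileSystem;
>> >>> | import org.apache.hadoop.fs.FileUtil;
>> >>> | import org.apache.hadoop.fs.Path;
>> >>> |
>> >>> | public class CopyRFiles {
>> >>> |   public static void main(String[] args) throws Exception {
>> >>> |     Configuration conf = new Configuration();
>> >>> |     FileSystem srcFs = FileSystem.get(URI.create("hdfs://x:9000"), conf);
>> >>> |     FileSystem dstFs = FileSystem.get(URI.create("hdfs://y:9000"), conf);
>> >>> |     // Copy the generated rfiles verbatim; nothing in an rfile
>> >>> |     // should tie it to the filesystem it was written on.
>> >>> |     FileUtil.copy(srcFs, new Path("/tmp/bulk/files"),
>> >>> |         dstFs, new Path("/tmp/import/files"), false, conf);
>> >>> |   }
>> >>> | }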
>> >>> |
>> >>> | Does the job metadata or the job configuration play any role in the
>> >>> | bulk import process?
>> >>> |
>> >>> | cheers,
>> >>> | jesse
>> >>> |
>> >>> | --
>> >>> | jesse mcconnell
>> >>> | jesse.mcconnell@gmail.com
>> >>> |
>> >>>
>> >>> Everything in your process sounds correct. There is no data involved
>> >>> in the bulk import process besides the files themselves, so
>> >>> generating them on one hdfs and transferring them over should pose no
>> >>> problems.
>> >>>
>> >>> Can you elaborate on the differences with regard to the examples
>> >>> artifact? The examples should have no effect on any regular aspects
>> >>> of the system, so if you could elaborate on how it doesn't work, that
>> >>> would be a good start: does it error, silently fail, etc.? One good
>> >>> place to look is the monitor page, as that will spit out any
>> >>> errors/warnings from the tservers; if there's an error rising up from
>> >>> the rfiles you're using, you should be able to see it there so we can
>> >>> correct it.
>> >>>
>> >>> Also, glancing at your other emails: are you putting the files in
>> >>> the Accumulo directory in hdfs when you move them to the true hdfs
>> >>> instance? I highly suggest you don't do that if you are. Let bulk
>> >>> import put them in the right place in the accumulo directory itself;
>> >>> it will just keep things simpler.
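>> >>>
>> >>> Staging somewhere neutral is as simple as (paths are just examples;
>> >>> the failure directory also needs to exist and be empty):
>> >>>
>> >>> import org.apache.hadoop.conf.Configuration;
>> >>> import org.apache.hadoop.fs.FileSystem;
>> >>> import org.apache.hadoop.fs.Path;
>> >>>
>> >>> public class Stage {
>> >>>   public static void main(String[] args) throws Exception {
>> >>>     FileSystem fs = FileSystem.get(new Configuration());
>> >>>     // Keep staging out of /accumulo; bulk import moves the files
>> >>>     // into the table's directory itself.
>> >>>     fs.mkdirs(new Path("/tmp/bulk/files"));
>> >>>     fs.mkdirs(new Path("/tmp/bulk/failures"));
>> >>>   }
>> >>> }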
>> >>
>> >>
>> >
>
>
