incubator-accumulo-user mailing list archives

From: Eric Newton <eric.new...@gmail.com>
Subject: Re: TableOperations import directory
Date: Tue, 18 Oct 2011 13:36:00 GMT
If you use:

./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
/accumulo/tables/id/bulk_uuid/000000_000000.rf

Do you see your data?  Does it have visibility markings that would filter
the data out?  Are the timestamps reasonable?
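
If the data is there but carries visibility labels, one quick client-side check
is to scan the table with the authorizations the user actually holds. A minimal
sketch (the instance name, zookeepers, credentials, table, and labels below are
placeholders, not from this thread):

import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class CheckVisibility {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
        .getConnector("user", "password".getBytes());
    // Entries whose column visibility is not satisfied by these auths are
    // silently filtered out of the scan.
    Scanner scanner = conn.createScanner("mytable", new Authorizations("vis1", "vis2"));
    for (Entry<Key,Value> entry : scanner) {
      // Key.toString() includes the column visibility and the timestamp, so
      // this also shows whether the timestamps look reasonable.
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}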

On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell
<jesse.mcconnell@gmail.com>wrote:

> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
> <jesse.mcconnell@gmail.com> wrote:
> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <eric.newton@gmail.com>
> > wrote:
> >> It's possible that the bulkImport client code may be using the hadoop
> >> config from the java classpath, which is how it used to work.  I'll
> >> investigate it tomorrow.
> >
> > I chased that down in the debugging, unless you're thinking that the
> > hadoop being used in the call to getGlobs on that file pattern may be
> > hitting a different fs...that might explain it.
>
> no no...the data from the import directory is actually being copied over,
> so I don't think it's a different FS issue.
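
One way to confirm which filesystem the client JVM is resolving (a sketch, not
something from the thread; the glob pattern and path are placeholders) is to
print the default FS from the Hadoop config on the classpath and run the same
kind of glob:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WhichFs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml on the classpath
    FileSystem fs = FileSystem.get(conf);
    System.out.println("default fs = " + fs.getUri());

    // Glob the bulk files the same way an import client would.
    FileStatus[] matches = fs.globStatus(new Path("/tmp/bulk/files/*.rf"));
    if (matches == null || matches.length == 0) {
      System.out.println("no files matched on this filesystem");
    } else {
      for (FileStatus status : matches) {
        System.out.println(status.getPath());
      }
    }
  }
}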
>
>
> > cheers,
> > jesse
> >
> >> See ACCUMULO-43.
> >> -Eric
> >>
> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <john.w.vines@ugov.gov>
> >> wrote:
> >>>
> >>> ----- Original Message -----
> >>> | From: "Jesse McConnell" <jesse.mcconnell@gmail.com>
> >>> | To: accumulo-user@incubator.apache.org
> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
> >>> | Subject: Re: TableOperations import directory
> >>> | We are trying to run a unit-test-type scenario where we have an m-r
> >>> | process that generates input to the bulk import process in a local
> >>> | hadoop fs, then copy the resulting output to a directory on the
> >>> | dfs that can then be used as input to the
> >>> | TableOperations.importDirectory() call.
> >>> |
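
For reference, the call being described looks roughly like this. It is only a
sketch, assuming the importDirectory(table, dir, failureDir, setTime) form (the
exact signature varies by Accumulo version); the instance, credentials, and
paths are placeholders:

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;

public class BulkImportExample {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
        .getConnector("user", "password".getBytes());
    // /tmp/bulk/files holds the RFiles produced by the m-r job;
    // /tmp/bulk/failures must exist and be empty -- files that cannot be
    // imported are moved there for inspection.
    conn.tableOperations().importDirectory("mytable", "/tmp/bulk/files",
        "/tmp/bulk/failures", false);
  }
}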
> >>> | Is this a problem? I ask because the examples seem to work when we run
> >>> | them with the -examples artifact in the lib dir, but when that file is
> >>> | removed and we try to run it in the same sort of way as the unit test
> >>> | above, it doesn't work.
> >>> |
> >>> | Is there some sort of requirement that the data being generated for
> >>> | the import (i.e. what goes into the import directory of the bulk load
> >>> | process) _has_ to be on the dfs?
> >>> |
> >>> | In other words, is it a bad assumption that I could take data from
> >>> | hadoop dfs X and copy it over to hadoop dfs Y and then import it with
> >>> | the importDirectory command?
> >>> |
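
Copying between the two filesystems can be done with hadoop distcp, or
programmatically; a rough sketch using the stock Hadoop API (the URIs and paths
are placeholders):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyBulkFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem src = FileSystem.get(URI.create("hdfs://test-namenode:9000"), conf);
    FileSystem dst = FileSystem.get(URI.create("hdfs://prod-namenode:9000"), conf);
    // Copy the whole m-r output directory to the staging area; keep the source.
    FileUtil.copy(src, new Path("/user/test/mr-output"),
        dst, new Path("/tmp/bulk/files"), false, conf);
  }
}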
> >>> | Does the job metadata or the job configuration play any role in the
> >>> | bulk import process?
> >>> |
> >>> | cheers,
> >>> | jesse
> >>> |
> >>> | --
> >>> | jesse mcconnell
> >>> | jesse.mcconnell@gmail.com
> >>> |
> >>>
> >>> Everything in your process sounds correct. There is no data involved in
> >>> the bulk import process besides the files themselves, so generating them
> >>> on one hdfs and transferring them over should pose no problems.
> >>>
> >>> Can you elaborate on the differences with regard to the examples
> >>> artifact? The examples should have no effect on any regular aspects of
> >>> the system, so if you could describe how it doesn't work, that would be
> >>> a good start. Does it error out, fail silently, etc.? One good place to
> >>> look is the monitor page, as that will surface any errors/warnings from
> >>> the tservers; that way, if there's an error rising up from the rfiles
> >>> you're using, you should be able to see it so we can correct it.
> >>>
> >>> Also, glancing at your other emails: are you putting the files in the
> >>> Accumulo directory in hdfs when you move them to the real hdfs instance?
> >>> If you are, I highly suggest you don't. Let bulk import put them in the
> >>> right place in the accumulo directory; it will just keep things simpler.
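
In other words (paths illustrative only), a layout along these lines keeps the
staging area separate from Accumulo's own tree:

  /tmp/bulk/files     - staging directory the RFiles are copied into, and the
                        directory handed to importDirectory()
  /tmp/bulk/failures  - empty failure directory for the import
  /accumulo/tables/   - leave this tree alone; bulk import moves the files
                        here itself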
> >>
> >>
> >
>
