incubator-accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: TableOperations import directory
Date Tue, 18 Oct 2011 16:51:17 GMT
I'm not sure I know what you mean by rejected.  You can compare the
visibility marking in the Key against the authorizations of the user running
the scan.  If a key has a visibility marking of "(A|B)&C", and you don't have
A,C or B,C or A,B,C as your authorizations, Accumulo will not return
results.
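
To make that concrete, here is a minimal sketch, assuming a placeholder
instance, table, and credentials (none of these names come from this thread):
an entry written with visibility "(A|B)&C" only comes back from a scan whose
authorizations satisfy the expression.

    import java.util.Map.Entry;
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Scanner;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.accumulo.core.security.ColumnVisibility;
    import org.apache.hadoop.io.Text;

    public class VisibilityCheck {
      public static void main(String[] args) throws Exception {
        // Placeholder instance name, zookeepers, and credentials.
        Connector conn = new ZooKeeperInstance("test", "localhost:2181")
            .getConnector("root", "secret".getBytes());

        // Write one entry marked (A|B)&C.
        BatchWriter bw = conn.createBatchWriter("vistest", 1000000L, 60000L, 2);
        Mutation m = new Mutation(new Text("row1"));
        m.put(new Text("cf"), new Text("cq"),
            new ColumnVisibility("(A|B)&C"), new Value("value".getBytes()));
        bw.addMutation(m);
        bw.close();

        // Auths {A,C} satisfy (A|B)&C, so the entry comes back; a scan
        // with only {A} (or only {B}) would return nothing, with no error.
        Scanner scanner = conn.createScanner("vistest",
            new Authorizations("A", "C"));
        for (Entry<Key,Value> entry : scanner)
          System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }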

The bulk ingest example sets the authorizations for the root user.
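
The granting step itself is a single call on SecurityOperations; roughly
(a sketch, using the same placeholder connector as above):

    // Grant the user the labels A, B, and C; a scan can only request
    // authorizations that the user has actually been granted.
    conn.securityOperations().changeUserAuthorizations("root",
        new Authorizations("A", "B", "C"));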

-Eric

On Tue, Oct 18, 2011 at 12:45 PM, Jesse McConnell <jesse.mcconnell@gmail.com> wrote:

> Largely yes. I am sorting through the 'good' rf file from the bulk
> ingest example and trying to figure out what might be the issue with
> ours.
>
> Are there any special debug flags we can turn on that might give us
> information on what is being rejected?
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Tue, Oct 18, 2011 at 08:36, Eric Newton <eric.newton@gmail.com> wrote:
> > If you use:
> > ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
> > /accumulo/tables/id/bulk_uuid/000000_000000.rf
> > Do you see your data?  Does it have visibility markings that would filter
> > the data out?  Are the timestamps reasonable?
> >
> > On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell <jesse.mcconnell@gmail.com> wrote:
> >>
> >> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
> >> <jesse.mcconnell@gmail.com> wrote:
> >> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <eric.newton@gmail.com>
> >> > wrote:
> >> >> It's possible that the bulkImport client code may be using the hadoop
> >> >> config
> >> >> from the java classpath, which is how it used to work.  I'll
> >> >> investigate it
> >> >> tomorrow.
> >> >
> >> > I chased that down in the debugging, unless you're thinking that the
> >> > hadoop being used in the call to getGlobs on that file pattern may be
> >> > hitting a different fs... that might explain it.
> >>
> >> no, no... the files from the import directory are actually being
> >> copied, so I don't think it's a different-FS issue.
> >>
> >>
> >> > cheers,
> >> > jesse
> >> >
> >> >> See ACCUMULO-43.
> >> >> -Eric
> >> >>
> >> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines <john.w.vines@ugov.gov> wrote:
> >> >>>
> >> >>> ----- Original Message -----
> >> >>> | From: "Jesse McConnell" <jesse.mcconnell@gmail.com>
> >> >>> | To: accumulo-user@incubator.apache.org
> >> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
> >> >>> | Subject: Re: TableOperations import directory
> >> >>> | We are trying to run a unit-test-type scenario where we have an m-r
> >> >>> | process that generates input to the bulk import process in a local
> >> >>> | hadoop fs, and then copies the resulting output to a directory on
> >> >>> | the dfs that can then be used as input to the
> >> >>> | TableOperations.importDirectory() call.
> >> >>> |
> >> >>> | Is this a problem? The examples seem to work when we run them
> >> >>> | with the -examples artifact in the lib dir, but when that file is
> >> >>> | removed and we try to run it in the same sort of way as the unit
> >> >>> | test above, it doesn't work.
> >> >>> |
> >> >>> | Is there some sort of requirement that the data being generated for
> >> >>> | the import, on its way to the import directory of the bulk load
> >> >>> | process, _has_ to be on the dfs?
> >> >>> |
> >> >>> | In other words, is it a bad assumption that I could take data from
> >> >>> | hadoop dfs X and copy it over to hadoop dfs Y and then import it
> >> >>> | with the importDirectory command?
> >> >>> |
> >> >>> | Does the job metadata or the job configuration play any role in the
> >> >>> | bulk import process?
> >> >>> |
> >> >>> | cheers,
> >> >>> | jesse
> >> >>> |
> >> >>> | --
> >> >>> | jesse mcconnell
> >> >>> | jesse.mcconnell@gmail.com
> >> >>> |
> >> >>>
> >> >>> Everything in your process sounds correct. There is no data besides
> >> >>> the files used for the bulk import process, so generating them on one
> >> >>> hdfs and transferring them over should pose no problems.
> >> >>>
> >> >>> Can you elaborate on the differences with regard to the examples
> >> >>> artifact? The examples should have no effect on any regular aspects
> >> >>> of the system, so if you could elaborate on how it doesn't work, that
> >> >>> would be a good start. Does it error out, fail silently, etc.? One
> >> >>> good place to look is the monitor page, as it will show any
> >> >>> errors/warnings coming from the tservers; that way, if there's an
> >> >>> error rising up from the rfiles you're using, you should be able to
> >> >>> see it so we can correct it.
> >> >>>
> >> >>> Also, glancing at your other emails, are you putting the files in the
> >> >>> Accumulo directory in hdfs when you move them to the true hdfs
> >> >>> instance? I highly suggest you don't do that if you are. Let bulk
> >> >>> import put them in the right place in the accumulo directory. It will
> >> >>> just keep things simpler.
> >> >>
> >> >>
> >> >
> >
> >
>
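
To pull the thread together, the flow Jesse describes (generate RFiles on one
filesystem, copy them to the HDFS instance Accumulo runs on, staged outside
the /accumulo directory per John's advice, then call importDirectory) reduces
to roughly the following sketch. Paths, instance name, table name, and
credentials are illustrative placeholders, not values from this thread.

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class BulkImportSketch {
      public static void main(String[] args) throws Exception {
        // Copy the generated RFiles from the "unit test" filesystem (here,
        // the local fs) to a staging directory on the cluster's HDFS. Keep
        // the staging directory outside /accumulo; bulk import moves the
        // files into place itself.
        Configuration conf = new Configuration();
        FileSystem srcFs = FileSystem.getLocal(conf);
        FileSystem dstFs = FileSystem.get(conf); // fs.default.name -> HDFS
        FileUtil.copy(srcFs, new Path("/tmp/mr-output"),
                      dstFs, new Path("/tmp/bulk"), false, conf);

        // Failure directory for files that could not be imported; it
        // typically must already exist and be empty.
        dstFs.mkdirs(new Path("/tmp/bulk_fail"));

        Connector conn = new ZooKeeperInstance("test", "localhost:2181")
            .getConnector("root", "secret".getBytes());

        // importDirectory(tableName, dir, failureDir, setTime)
        conn.tableOperations().importDirectory("mytable", "/tmp/bulk",
            "/tmp/bulk_fail", false);
      }
    }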
