incubator-accumulo-user mailing list archives

From Jesse McConnell <jesse.mcconn...@gmail.com>
Subject Re: TableOperations import directory
Date Tue, 18 Oct 2011 17:04:00 GMT
By "rejected" I mean the individual lines of goop in the rfile.

I have a half-meg rfile that looks like it has reasonable goop
inside of it, but the import apparently fails silently for some reason.
So my thought is that entries are getting rejected when whatever is
processing that file looks at the lines and tries to put them where
they ultimately go...

I see no entries in the web UI and I see no results in the
scan... presumably it's going _somewhere_.
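
One way to make a silent failure like this visible is to check the failures
directory passed to importDirectory(): rfiles the servers cannot load end up
there. A minimal sketch, with made-up paths and a hypothetical class name:

  // Hypothetical check of the bulk-import failures directory; the path
  // below is a placeholder, not one from this thread.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CheckBulkFailures {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      Path failures = new Path("/tmp/bulk/failures");  // dir given to importDirectory()
      for (FileStatus st : fs.listStatus(failures)) {
        // any rfile listed here was rejected as a whole by the bulk import
        System.out.println("rejected: " + st.getPath() + " (" + st.getLen() + " bytes)");
      }
    }
  }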

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 11:51, Eric Newton <eric.newton@gmail.com> wrote:
> I'm not sure I know what you mean by rejected.  You can compare the
> visibility marking in the Key against the authorizations of the user running
> the scan.  If I have a visibility marking of "(A|B)&C", and you don't have
> A,C or B,C or A,B,C as your authorizations, Accumulo will not return
> results.
> The bulk ingest example sets the authorizations for the root user.
>
> -Eric
>
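
For reference, a minimal sketch of granting and scanning with authorizations,
assuming the 1.4-era Java client API; the instance name, zookeepers,
credentials, and table are placeholders, not values from this thread:

  import java.util.Map.Entry;

  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.Instance;
  import org.apache.accumulo.core.client.Scanner;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.security.Authorizations;

  public class AuthScanSketch {
    public static void main(String[] args) throws Exception {
      // placeholder instance/zookeeper/user values
      Instance inst = new ZooKeeperInstance("myInstance", "localhost:2181");
      Connector conn = inst.getConnector("root", "secret".getBytes());

      // give the user authorizations that satisfy the data's visibilities,
      // e.g. A and C satisfy the "(A|B)&C" expression discussed above
      conn.securityOperations().changeUserAuthorizations("root",
          new Authorizations("A", "C"));

      // scan with those authorizations; entries whose visibility is not
      // satisfied are silently filtered rather than reported as errors
      Scanner scanner = conn.createScanner("mytable", new Authorizations("A", "C"));
      for (Entry<Key,Value> e : scanner)
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }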
> On Tue, Oct 18, 2011 at 12:45 PM, Jesse McConnell
> <jesse.mcconnell@gmail.com> wrote:
>>
>> Largely yes, I am sorting through the 'good' rf file from the bulk
>> ingest example and am trying to sort out what might be the issue with
>> ours...
>>
>> Are there any special debug flags we can throw on that might give us
>> information on what is being rejected?
>>
>> cheers,
>> jesse
>>
>> --
>> jesse mcconnell
>> jesse.mcconnell@gmail.com
>>
>>
>>
>> On Tue, Oct 18, 2011 at 08:36, Eric Newton <eric.newton@gmail.com> wrote:
>> > If you use:
>> > ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
>> > /accumulo/tables/id/bulk_uuid/000000_000000.rf
>> > Do you see your data?  Does it have visibility markings that would
>> > filter the data out?  Are the timestamps reasonable?
>> >
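
As a reference point for what PrintInfo -d reports per entry, here is a
hypothetical sketch of how a key with an explicit visibility expression and
timestamp is typically built before being written into an rfile; the row,
family, and qualifier values are invented. Entries whose visibility the
scanning user's authorizations do not satisfy are silently dropped at scan
time, which can look like a failed import:

  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.security.ColumnVisibility;
  import org.apache.hadoop.io.Text;

  public class KeySketch {
    public static void main(String[] args) {
      // ColumnVisibility parses/validates the expression; the key carries it
      // along with an explicit timestamp, both visible in PrintInfo -d output
      ColumnVisibility cv = new ColumnVisibility("(A|B)&C");
      long ts = System.currentTimeMillis();
      Key k = new Key(new Text("row1"), new Text("fam"), new Text("qual"),
          new Text(cv.getExpression()), ts);
      Value v = new Value("payload".getBytes());
      System.out.println(k + " -> " + v);
    }
  }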
>> > On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell
>> > <jesse.mcconnell@gmail.com> wrote:
>> >>
>> >> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
>> >> <jesse.mcconnell@gmail.com> wrote:
>> >> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <eric.newton@gmail.com>
>> >> > wrote:
>> >> >> It's possible that the bulkImport client code may be using the hadoop
>> >> >> config from the java classpath, which is how it used to work.  I'll
>> >> >> investigate it tomorrow.
>> >> >
>> >> > I chased that down in the debugging, unless you're thinking that the
>> >> > hadoop being used in the call to getGlobs on that file pattern may be
>> >> > hitting a different fs... that might explain it.
>> >>
>> >> no no... because the files from the import directory are actually being
>> >> copied, so I don't think it's a different-FS issue.
>> >>
>> >>
>> >> > cheers,
>> >> > jesse
>> >> >
>> >> >> See ACCUMULO-43.
>> >> >> -Eric
>> >> >>
>> >> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines
>> >> >> <john.w.vines@ugov.gov> wrote:
>> >> >>>
>> >> >>> ----- Original Message -----
>> >> >>> | From: "Jesse McConnell" <jesse.mcconnell@gmail.com>
>> >> >>> | To: accumulo-user@incubator.apache.org
>> >> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
>> >> >>> | Subject: Re: TableOperations import directory
>> >> >>> | We are trying to run a unit-test type scenario where we have an m-r
>> >> >>> | process that generates input to the bulk import process in a local
>> >> >>> | hadoop fs, and then copy the resulting output to a directory on the
>> >> >>> | dfs that can then be used as input to the
>> >> >>> | TableOperations.importDirectory() call.
>> >> >>> |
>> >> >>> | Is this a problem? Because the examples seem to work when we run
>> >> >>> | them with the -examples artifact in the lib dir, but when that file
>> >> >>> | is removed and we try to run it in the same sort of way as the unit
>> >> >>> | test above, it doesn't work.
>> >> >>> |
>> >> >>> | Is there some sort of requirement that the data being generated for
>> >> >>> | the import directory of the bulk load process _have_ to be on the
>> >> >>> | dfs?
>> >> >>> |
>> >> >>> | In other words, is it a bad assumption that I could take data from
>> >> >>> | hadoop dfs X and copy it over to hadoop dfs Y and then import it
>> >> >>> | with the importDirectory command?
>> >> >>> |
>> >> >>> | Does the job metadata or the job configuration play any role in the
>> >> >>> | bulk import process?
>> >> >>> |
>> >> >>> | cheers,
>> >> >>> | jesse
>> >> >>> |
>> >> >>> | --
>> >> >>> | jesse mcconnell
>> >> >>> | jesse.mcconnell@gmail.com
>> >> >>> |
>> >> >>>
>> >> >>> Everything in your process sounds correct. There is no data besides
>> >> >>> the files used for the bulk import process. So generating on one hdfs
>> >> >>> and transferring it over should pose no problems.
>> >> >>>
>> >> >>> Can you elaborate on the differences in regard to the examples
>> >> >>> artifact? The examples should have no effect on any regular aspects
>> >> >>> of the system, so if you could elaborate on how it doesn't work, that
>> >> >>> would be a good start. Does it error, silently fail, etc.? One good
>> >> >>> place to look is the monitor page, as that will spit up any
>> >> >>> errors/warnings you get from the tservers; that way, if there's an
>> >> >>> error rising up from the rfiles you're using, you should be able to
>> >> >>> see it so we can correct it.
>> >> >>>
>> >> >>> Also, glancing at your other emails, are you putting them in the
>> >> >>> Accumulo directory in hdfs when you move them to the true hdfs
>> >> >>> instance? I highly suggest you don't do that if you are. Let bulk
>> >> >>> import put them in the right place in the accumulo directory. It
>> >> >>> will just keep things simpler.
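
Putting the advice in this thread together, here is a hedged sketch of the
whole flow, assuming the 1.4-era importDirectory(table, dir, failureDir,
setTime) signature; every URI, path, credential, and name is a placeholder.
The rfiles are copied from the source filesystem to a staging directory on
the destination hdfs outside the /accumulo directory, and bulk import is then
pointed at that staging directory plus an empty failures directory:

  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.ZooKeeperInstance;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  public class BulkImportSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();

      // source: the local/unit-test fs the m-r job wrote to (placeholder URIs)
      FileSystem srcFs = FileSystem.get(java.net.URI.create("file:///"), conf);
      // destination: the dfs that accumulo actually runs against
      FileSystem dstFs = FileSystem.get(java.net.URI.create("hdfs://namenode:9000/"), conf);

      // stage the rfiles outside /accumulo; bulk import moves them into the
      // accumulo directory itself
      Path src = new Path("/tmp/mr-output");
      Path staging = new Path("/user/jesse/bulk/files");
      Path failures = new Path("/user/jesse/bulk/failures");
      FileUtil.copy(srcFs, src, dstFs, staging, false, conf);
      dstFs.mkdirs(failures);  // must exist (and be empty) before the import

      Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
          .getConnector("root", "secret".getBytes());
      conn.tableOperations().importDirectory("mytable",
          staging.toString(), failures.toString(), false);

      // anything the servers could not load ends up under the failures dir
      System.out.println("failed rfiles: " + dstFs.listStatus(failures).length);
    }
  }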
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
