incubator-accumulo-user mailing list archives

From Jesse McConnell <jesse.mcconn...@gmail.com>
Subject Re: TableOperations import directory
Date Tue, 18 Oct 2011 17:20:30 GMT
So this is a sample line from your bulk ingest file:

row_00000000 foo:0 [] 1318956047982 false -> value_00000000

and this is a sample from mine:

!:6 ColumnFamilyName:C%b5;*/.:%dc;%fe;+%1e;%1a;%8c;%00;%00;%01;: [] 1318957045232 false ->

and another from the same file:

H%00;~]5iSHT%a0;Z%19;%00;%00;%00;%00; DifferentColumnFamily:10:7#do [] 1318957045532 false -> 244

Anything jump out as bogus with this?
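
(For reference, entries like these come from the Key/Value pairs a bulk
ingest m-r job writes out. A minimal sketch of a reducer that would
produce them, loosely modeled on the bulk ingest example -- the "foo"
family comes from the sample above; the class name and paths are
hypothetical:)

import java.io.IOException;
import java.util.Arrays;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IngestReducer extends Reducer<Text,Text,Key,Value> {
  @Override
  protected void reduce(Text row, Iterable<Text> values, Context ctx)
      throws IOException, InterruptedException {
    for (Text v : values) {
      // rfiles require keys in sorted order; rely on the m-r sort for that
      Key key = new Key(row, new Text("foo"), new Text("0"));
      ctx.write(key, new Value(Arrays.copyOf(v.getBytes(), v.getLength())));
    }
  }
}

// driver side (hypothetical output path):
//   job.setOutputFormatClass(AccumuloFileOutputFormat.class);
//   AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk_work"));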

cheers,
jesse


--
jesse mcconnell
jesse.mcconnell@gmail.com



On Tue, Oct 18, 2011 at 12:04, Jesse McConnell
<jesse.mcconnell@gmail.com> wrote:
> By rejected I mean the individual lines of goop in the rfile.
>
> I have a half-meg rfile that looks like it has reasonable goop
> inside of it, but import apparently fails silently for some reason.
> So my thought is that entries are getting rejected when whatever is
> processing that file looks at the lines and tries to put them where
> they ultimately go...
>
> I see no entries in the web UI and I see no results in the
> scan...presumably it's going _somewhere_
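>
> (One place to check: files that fail to import end up in the failure
> directory passed to importDirectory, so listing it is a quick test.
> A sketch, with a hypothetical path:)
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class CheckFailures {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(new Configuration());
>     // same failure dir handed to importDirectory (hypothetical here)
>     for (FileStatus f : fs.listStatus(new Path("/tmp/bulk_failures")))
>       System.out.println(f.getPath() + "  " + f.getLen());
>   }
> }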
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconnell@gmail.com
>
>
>
> On Tue, Oct 18, 2011 at 11:51, Eric Newton <eric.newton@gmail.com> wrote:
>> I'm not sure I know what you mean by rejected.  You can compare the
>> visibility marking in the Key against the authorizations of the user running
>> the scan.  If I have a visibility marking of "(A|B)&C", and you don't have
>> A,C or B,C or A,B,C as your authorizations, Accumulo will not return
>> results.
>> The bulk ingest example sets the authorizations for the root user.
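>>
>> (A minimal sketch of that check, with hypothetical instance, user, and
>> table names -- grant auths that satisfy the visibility expression, then
>> scan with them:)
>>
>> import java.util.Map.Entry;
>> import org.apache.accumulo.core.client.Connector;
>> import org.apache.accumulo.core.client.Scanner;
>> import org.apache.accumulo.core.client.ZooKeeperInstance;
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.accumulo.core.security.Authorizations;
>>
>> public class AuthCheck {
>>   public static void main(String[] args) throws Exception {
>>     Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
>>         .getConnector("root", "secret".getBytes());
>>     // "(A|B)&C" is satisfied by A,C (or B,C): grant those, scan with them
>>     conn.securityOperations().changeUserAuthorizations("root",
>>         new Authorizations("A", "C"));
>>     Scanner s = conn.createScanner("test_ingest", new Authorizations("A", "C"));
>>     for (Entry<Key,Value> e : s)
>>       System.out.println(e.getKey() + " -> " + e.getValue());
>>   }
>> }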
>>
>> -Eric
>>
>> On Tue, Oct 18, 2011 at 12:45 PM, Jesse McConnell
>> <jesse.mcconnell@gmail.com> wrote:
>>>
>>> Largely yes, I am sorting through the 'good' rfile from the bulk
>>> ingest example and am trying to sort out what might be the issue with
>>> ours...
>>>
>>> Are there any special debug flags we can throw on that might give us
>>> information on what is being rejected?
>>>
>>> cheers,
>>> jesse
>>>
>>> --
>>> jesse mcconnell
>>> jesse.mcconnell@gmail.com
>>>
>>>
>>>
>>> On Tue, Oct 18, 2011 at 08:36, Eric Newton <eric.newton@gmail.com> wrote:
>>> > If you use:
>>> > ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d
>>> > /accumulo/tables/id/bulk_uuid/000000_000000.rf
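>>> > (Each line of that dump reads, roughly:
>>> >   row colfam:colqual [visibility] timestamp delete-flag -> value
>>> > so the "false" in the samples is the delete marker and "[]" is an
>>> > empty visibility.)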
>>> > Do you see your data?  Does it have visibility markings that would
>>> > filter the data out?  Are the timestamps reasonable?
>>> >
>>> > On Mon, Oct 17, 2011 at 7:35 PM, Jesse McConnell
>>> > <jesse.mcconnell@gmail.com>
>>> > wrote:
>>> >>
>>> >> On Mon, Oct 17, 2011 at 18:34, Jesse McConnell
>>> >> <jesse.mcconnell@gmail.com> wrote:
>>> >> > On Mon, Oct 17, 2011 at 18:29, Eric Newton <eric.newton@gmail.com>
>>> >> > wrote:
>>> >> >> It's possible that the bulkImport client code may be using the
>>> >> >> hadoop config from the java classpath, which is how it used to
>>> >> >> work.  I'll investigate it tomorrow.
>>> >> >
>>> >> > I chased that down in the debugging, unless you're thinking that
>>> >> > the hadoop being used in the call to getGlobs on that file pattern
>>> >> > may be hitting a different fs...that might explain it.
>>> >>
>>> >> no no...the files from the import directory are actually being
>>> >> copied, so I don't think it's a different FS issue.
>>> >>
>>> >>
>>> >> > cheers,
>>> >> > jesse
>>> >> >
>>> >> >> See ACCUMULO-43.
>>> >> >> -Eric
>>> >> >>
>>> >> >> On Mon, Oct 17, 2011 at 7:17 PM, John W Vines
>>> >> >> <john.w.vines@ugov.gov>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> ----- Original Message -----
>>> >> >>> | From: "Jesse McConnell" <jesse.mcconnell@gmail.com>
>>> >> >>> | To: accumulo-user@incubator.apache.org
>>> >> >>> | Sent: Monday, October 17, 2011 6:15:43 PM
>>> >> >>> | Subject: Re: TableOperations import directory
>>> >> >>> | We are trying to run a unit-test-type scenario where we have an
>>> >> >>> | m-r process that generates input to the bulk import process in a
>>> >> >>> | local hadoop fs, and then copy the resulting output to a directory
>>> >> >>> | on the dfs that can then be used as input to the
>>> >> >>> | TableOperations.importDirectory() call.
>>> >> >>> |
>>> >> >>> | Is this a problem? The examples seem to work when we run them
>>> >> >>> | with the -examples artifact in the lib dir, but when that file is
>>> >> >>> | removed and we try to run it in the same sort of way as the unit
>>> >> >>> | test above, it doesn't work.
>>> >> >>> |
>>> >> >>> | Is there some sort of requirement that the data being generated
>>> >> >>> | for the import (the data going into the import directory of the
>>> >> >>> | bulk load process) _has_ to be on the dfs?
>>> >> >>> |
>>> >> >>> | In other words, is it a bad assumption that I could take data
>>> >> >>> | from hadoop dfs X, copy it over to hadoop dfs Y, and then import
>>> >> >>> | it with the importDirectory command?
>>> >> >>> |
>>> >> >>> | Does the job metadata or the job configuration play any role in
>>> >> >>> | the bulk import process?
>>> >> >>> |
>>> >> >>> | cheers,
>>> >> >>> | jesse
>>> >> >>> |
>>> >> >>> | --
>>> >> >>> | jesse mcconnell
>>> >> >>> | jesse.mcconnell@gmail.com
>>> >> >>> |
>>> >> >>>
>>> >> >>> Everything in your process sounds correct. There is no data besides
>>> >> >>> the file used for the bulk import process. So generating on one hdfs
>>> >> >>> and transferring it over should pose no problems.
>>> >> >>>
>>> >> >>> Can you elaborate on the differences in regard to the examples
>>> >> >>> artifact? The examples should have no effect on any regular aspects
>>> >> >>> of the system. So if you could elaborate on how it doesn't work,
>>> >> >>> that would be a good start. Does it error, silently fail, etc.? One
>>> >> >>> good place to look is the monitor page, as that will spit out any
>>> >> >>> errors/warnings you get from the tservers; that way, if there's an
>>> >> >>> error rising up from the rfiles you're using, you should be able to
>>> >> >>> see it so we can correct it.
>>> >> >>>
>>> >> >>> Also, glancing at your other emails: are you putting the files in
>>> >> >>> the Accumulo directory in hdfs when you move them to the true hdfs
>>> >> >>> instance? If you are, I highly suggest you don't. Let bulk import
>>> >> >>> put them in the right place in the accumulo directory; it will just
>>> >> >>> keep things simpler.
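>>> >> >>>
>>> >> >>> (A minimal sketch of that flow -- names and paths are hypothetical,
>>> >> >>> and importDirectory's exact signature varies a bit across versions:)
>>> >> >>>
>>> >> >>> import org.apache.accumulo.core.client.Connector;
>>> >> >>> import org.apache.accumulo.core.client.ZooKeeperInstance;
>>> >> >>>
>>> >> >>> public class BulkLoad {
>>> >> >>>   public static void main(String[] args) throws Exception {
>>> >> >>>     Connector conn = new ZooKeeperInstance("instance", "zk1:2181")
>>> >> >>>         .getConnector("root", "secret".getBytes());
>>> >> >>>     // stage rfiles outside /accumulo; bulk import moves them into
>>> >> >>>     // place itself, and rejected files land in the failure dir
>>> >> >>>     conn.tableOperations().importDirectory(
>>> >> >>>         "test_ingest", "/tmp/bulk_work", "/tmp/bulk_failures", false);
>>> >> >>>   }
>>> >> >>> }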
>>> >> >>
>>> >> >>
>>> >> >
>>> >
>>> >
>>
>>
>
