accumulo-user mailing list archives

From Steven Troxell <steven.trox...@gmail.com>
Subject Re: Question about special characters in row key
Date Tue, 24 Apr 2012 03:28:55 GMT
So I've dug through how they are ingesting, and found this method:

    /**
     * Return a scanner pointing at the specified row.
     *
     * @param row
     *            The row we are searching for
     * @return A scanner pointing at the specified row.
     * @throws AccumuloException
     * @throws AccumuloSecurityException
     * @throws TableNotFoundException
     */
    public Scanner getRow(Text row) throws AccumuloException,
            AccumuloSecurityException, TableNotFoundException {
        // Create a scanner
        Scanner scanner = connector
                .createScanner(tableName, userAuthorizations);

        // Find the specified row.
        scanner.setRange(new Range(row));
        return scanner;
    }


It is generally called along the lines of  scanner = getRow(new
Text("whatever"))  and then iterated upon.  Is this enough context to
confirm you may be on the right track here?  To set an end key, I would
think the last line in that method should be more like
scanner.setRange(new Range(row), new Range(row))

Am I correct in my thinking here?
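
To make the distinction I'm asking about concrete, here is a minimal sketch
of a start-only scan versus a scan with both a start and an end key. It uses
a plain java.util.TreeMap as a stand-in for a sorted table, so it is NOT the
Accumulo API, and the class name, method names, and the three sample row keys
are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted key/value table; NOT the Accumulo API.
class RangeSketch {

    // Start key only: every row from startRow (inclusive) to the end of the table.
    static List<String> scanFrom(TreeMap<String, String> table, String startRow) {
        return new ArrayList<>(table.tailMap(startRow).values());
    }

    // Start and end key: only rows in [startRow, endRow], both ends inclusive.
    static List<String> scanBetween(TreeMap<String, String> table,
                                    String startRow, String endRow) {
        return new ArrayList<>(table.subMap(startRow, true, endRow, true).values());
    }

    // Made-up sample rows; the leading quote is part of each row key.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("\"Journal 1 (1940)\"", "1940");
        table.put("\"Journal 1 (1941)\"", "1941");
        table.put("\"Journal 1 (1942)\"", "1942");
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        String row = "\"Journal 1 (1940)\"";
        System.out.println(scanFrom(table, row));         // start key only
        System.out.println(scanBetween(table, row, row)); // start and end key
    }
}
```

With only a start key, the scan keeps returning later rows until the caller
stops iterating; with the same key as both start and end, it is limited to
the one row. That is the behavior I would expect from a correctly bounded
scan, assuming the real Range behaves like this stand-in.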


Regarding the shell, I tried both of your suggestions, with no success.  I'm
not sure I see where you were going with the truncation; my suspicion is
that it's the quote, which is the first character, not the ( causing the problem.
In any case:
        scan -b "Journal 1   fails for lack of a closing quote, and when I
close the quote, I again get the entire set of results.

Scanning with \x22 leads to a usage error.


On Mon, Apr 23, 2012 at 10:45 AM, John Vines <john.w.vines@ugov.gov> wrote:

> Sounds like your software isn't setting end keys. If you create a range
> with just a start, it will go on ad infinitum until you no longer iterate.
> This is similar to doing a scan using -b without -e.
>
> As for why you can't replicate it in your normal scan, it could either be
> the key not being what you think it is, or just a problem with the way
> shell handles non alphanumeric characters. One option would be to truncate
> your scan's start to "Journal 1 and see what you hit first. If you see
> yourself starting way beyond your "Journal 1 (1940... then we may not be
> handling quotes well in the shell or your key is not right. At this point,
> try substituting \x22 for the quotation mark and scanning again.
>
> If that still doesn't work, then you may want to dig through your middle
> project's ingest process to see how it's forming the keys for you.
>
> John
>
> On Mon, Apr 23, 2012 at 10:20 AM, Steven Troxell <steven.troxell@gmail.com
> > wrote:
>
>> Hi everyone,
>>
>> I'm attempting to use a beta project designed to integrate an RDF engine
>> with Accumulo.  There seems to be a bug somewhere in the code that queries
>> Accumulo incorrectly, so the results are not limited to what the following
>> SPARQL query asks for:
>>
>> SELECT ?yr
>> WHERE {
>>   ?journal rdf:type bench:Journal .
>>   ?journal dc:title "Journal 1 (1940)"^^xsd:string .
>>   ?journal dcterms:issued ?yr
>> }
>>
>> I get results back ranging from 1940-1966, while the HBase integration
>> with this particular software correctly returns just 1940.  It's fairly
>> complicated to explain the entire process of how Accumulo scans are spawned
>> from the above query, but I believe I've narrowed down a possible source of
>> error that I'd like further leads on:
>>
>>
>> I suspect the developers may not be handling the  quotations correctly in
>> scanning accumulo.  I say this because this is a sample row from the
>> accumulo shell:
>>
>> "Journal 1 (1940)"^^http://www.w3.org/2001/XMLSchema#string o:
>> http://localhost/publications/journals/Journal1/1940
>> http://purl.org/dc/elements/1.1/title [ROLE1]
>>
>> From the shell, I have yet to figure out how to successfully scan for the
>> row key.  A straight scan -b "Journal 1 (1940)"^^
>> http://www.w3.org/2001/XMLSchema#string fails with a usage error, and wrapping the
>> rowkey in single quotes seems to return all results, which is what I
>> suspect is happening in the actual software I'm using, as it explains the
>> behavior I'm seeing.
>>
>> I'm guessing, but not entirely sure, that the developers may have misused the
>> programmatic scans as well on account of not handling the quotations
>> correctly?  Is this reasonable, and can anyone provide further insight?
>>
>> Thanks,
>> Steve
>>
>
>
