accumulo-user mailing list archives

From John Vines <john.w.vi...@ugov.gov>
Subject Re: Question about special characters in row key
Date Tue, 24 Apr 2012 12:22:45 GMT
For your range, you would use the same row twice, which would create a
range which will span ONLY that row id.
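
A minimal sketch of that difference, assuming a java.util.TreeMap as a
stand-in for the sorted table rather than the Accumulo API itself. The class
name, method names, and sample rows below are hypothetical and exist only for
illustration:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: a TreeMap stands in for a table sorted by row key.
// This is not Accumulo code; it only mirrors the start/end key semantics.
class RowRangeSketch {

    // Made-up sample rows whose keys begin with a quotation mark.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("\"Journal 1 (1940)\"", "1940");
        table.put("\"Journal 1 (1941)\"", "1941");
        table.put("\"Journal 1 (1942)\"", "1942");
        return table;
    }

    // Start key only: everything from startRow to the end of the table.
    static NavigableMap<String, String> scanFrom(
            NavigableMap<String, String> table, String startRow) {
        return table.tailMap(startRow, true);
    }

    // Same row used as start and end, both inclusive: only that row.
    static NavigableMap<String, String> scanRow(
            NavigableMap<String, String> table, String row) {
        return table.subMap(row, true, row, true);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        String row = "\"Journal 1 (1940)\"";
        System.out.println(scanFrom(table, row).values()); // [1940, 1941, 1942]
        System.out.println(scanRow(table, row).values());  // [1940]
    }
}
```

In Accumulo terms, "the same row twice" would presumably be written as
new Range(row, row), with the row as both the start and the end of the range.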

As for the quotation mark issue, it looks like Eric just created a ticket
for it. Good find!

John

On Mon, Apr 23, 2012 at 11:29 PM, Steven Troxell
<steven.troxell@gmail.com> wrote:

> So I've dug through how they are ingesting, and found this method:
>
> /**
>      * Return a scanner pointing at the specified row.
>      *
>      * @param row
>      *            The row we are searching for
>      * @return A scanner pointing at the specified row.
>      * @throws AccumuloException
>      * @throws AccumuloSecurityException
>      * @throws TableNotFoundException
>      */
>     public Scanner getRow(Text row) throws AccumuloException,
>             AccumuloSecurityException, TableNotFoundException {
>         // Create a scanner
>         Scanner scanner = connector
>                 .createScanner(tableName, userAuthorizations);
>
>         // Find the specified row.
>         scanner.setRange(new Range(row));
>         return scanner;
>     }
>
>
> It is generally called along the lines of scanner = getRow(new
> Text("whatever")) and then iterated upon.  Is this enough context to
> confirm you may be on the right track here?  To set an end key, I would
> think the last line in that method should be more like
> scanner.setRange(new Range(row), new Range(row))
>
> Am I correct in my thinking here?
>
> Regarding the shell, I tried both of your suggestions, with no success.  I'm
> not sure I see where you were going with the truncation; my suspicion is
> that it's the quote, which is the first character, not the (, causing the
> problem.  In any case:
>         scan -b "Journal 1   fails for lack of a closing quote, and when I
> close the quote, I again get the entire set of results.
>
> Scanning with \x22 leads to a usage error.
>
>
>
> On Mon, Apr 23, 2012 at 10:45 AM, John Vines <john.w.vines@ugov.gov> wrote:
>
>> Sounds like your software isn't setting end keys. If you create a range
>> with just a start, it will go on ad infinitum until you no longer iterate.
>> This is similar to doing a scan using -b without -e.
>>
>> As for why you can't replicate it in your normal scan, it could either be
>> the key not being what you think it is, or just a problem with the way
>> the shell handles non-alphanumeric characters. One option would be to truncate
>> your scan's start to "Journal 1 and see what you hit first. If you see
>> yourself starting way beyond your "Journal 1 (1940... then we may not be
>> handling quotes well in the shell or your key is not right. At this point,
>> try substituting \x22 for the quotation mark and scanning again.
>>
>> If that still doesn't work, then you may want to dig through your middle
>> project's ingest process to see how it's forming the keys for you.
>>
>> John
>>
>> On Mon, Apr 23, 2012 at 10:20 AM, Steven Troxell <
>> steven.troxell@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm attempting to use a beta project designed to integrate an RDF engine
>>> with Accumulo.  There seems to be a bug somewhere in the code that causes
>>> it to query Accumulo incorrectly, so the results are not limited to what
>>> the following SPARQL query asks for:
>>>
>>> SELECT ?yr
>>> WHERE {
>>>   ?journal rdf:type bench:Journal .
>>>   ?journal dc:title "Journal 1 (1940)"^^xsd:string .
>>>   ?journal dcterms:issued ?yr
>>> }
>>>
>>> I get results back ranging from 1940-1966, while the HBase integration
>>> with this particular software correctly returns just 1940.  It's fairly
>>> complicated to explain the entire process of how Accumulo scans are spawned
>>> from the above query, but I believe I've narrowed down a possible source of
>>> error on which I'd like further leads:
>>>
>>>
>>> I suspect the developers may not be handling the quotation marks correctly
>>> when scanning Accumulo.  I say this because this is a sample row from the
>>> Accumulo shell:
>>>
>>> "Journal 1 (1940)"^^http://www.w3.org/2001/XMLSchema#string o:
>>> http://localhost/publications/journals/Journal1/1940
>>> http://purl.org/dc/elements/1.1/title [ROLE1]
>>>
>>> From the shell, I have yet to figure out how to successfully scan for
>>> the row key.  A straight scan -b "Journal 1 (1940)"^^
>>> http://www.w3.org/2001/XMLSchema#string fails with a usage error, while
>>> wrapping the row key in single quotes seems to return all results, which
>>> is what I suspect is happening in the actual software I'm using, as it
>>> explains the behavior I'm seeing.
>>>
>>> I'm guessing, but am not entirely sure, that the developers may have misused
>>> the programmatic scans as well on account of not handling the quotation
>>> marks correctly.  Is this reasonable, and can anyone provide further insight?
>>>
>>> Thanks,
>>> Steve
>>>
>>
>>
>
