accumulo-user mailing list archives

From Steven Troxell <steven.trox...@gmail.com>
Subject Re: Question about special characters in row key
Date Wed, 25 Apr 2012 03:40:50 GMT
John,

I got a chance to check this tonight. While my syntax was slightly off
(scanner.setRange(new Range(row), new Range(row)) should have been
scanner.setRange(new Range(row, row))), the change significantly narrowed
down the result set. Instead of getting excess results from 1940-1966,
I now get excess results only from 1940-1955. I suspect this is because
there are additional places sprinkled throughout the code where they are
setting ranges incorrectly, which I'm in the process of tracking down.
Thanks for your help on this!

-Steve

On Tue, Apr 24, 2012 at 10:09 AM, Steven Troxell
<steven.troxell@gmail.com> wrote:

> Eric,
>
> Thanks for investigating this,  and I do believe, after reviewing your
> ticket, that this was influencing my problem with the shell.
>
> John,
> Sounds reasonable, though I'm curious whether it would make sense to have
> a separate method that wraps/overloads setRange for cases where the
> start and end keys are the same, so the user doesn't have to duplicate
> the key as a parameter.
>
> In the meantime, I'll report back on whether this fixes it once I get a
> chance to test it, but it may be a while. Their Accumulo support is weak,
> so I'm not sure they'd be willing to make the change, and I suspect I'll
> have to investigate their build process.
>
>
> On Tue, Apr 24, 2012 at 8:47 AM, Eric Newton <eric.newton@gmail.com> wrote:
>
>> I don't know if this is your problem, but it is *a* problem I found
>> trying to demonstrate a scan using the shell and special characters:
>>
>> https://issues.apache.org/jira/browse/ACCUMULO-557
>>
>> -Eric
>>
>> On Mon, Apr 23, 2012 at 11:28 PM, Steven Troxell <
>> steven.troxell@gmail.com> wrote:
>>
>>> So I've dug through how they are ingesting, and found this method:
>>>
>>> /**
>>>      * Return a scanner pointing at the specified row.
>>>      *
>>>      * @param row
>>>      *            The row we are searching for
>>>      * @return A scanner pointing at the specified row.
>>>      * @throws AccumuloException
>>>      * @throws AccumuloSecurityException
>>>      * @throws TableNotFoundException
>>>      */
>>>     public Scanner getRow(Text row) throws AccumuloException,
>>>             AccumuloSecurityException, TableNotFoundException {
>>>         // Create a scanner
>>>         Scanner scanner = connector
>>>                 .createScanner(tableName, userAuthorizations);
>>>
>>>         // Find the specified row.
>>>         scanner.setRange(new Range(row));
>>>         return scanner;
>>>     }
>>>
>>>
>>> It is generally called along the lines of scanner = getRow(new
>>> Text("whatever")) and then iterated upon. Is this enough context to
>>> confirm you may be on the right track here? To set an end key, I would
>>> think the last line in that method should be more like
>>> scanner.setRange(new Range(row), new Range(row))
>>>
>>> Am I correct in my thinking here?
>>>
>>>
>>> Regarding the shell, I tried both of your suggestions, without success.
>>> I'm not sure I see where you were going with the truncation; my suspicion
>>> is that it's the quote, which is the first character, not the ( that is
>>> causing the problem. In any case:
>>>         scan -b "Journal 1   fails for lack of a closing quote, and when
>>> I close the quote, I again get the entire set of results.
>>>
>>> Scanning with \x22 leads to a usage error.
>>>
>>>
>>>
>>> On Mon, Apr 23, 2012 at 10:45 AM, John Vines <john.w.vines@ugov.gov> wrote:
>>>
>>>> Sounds like your software isn't setting end keys. If you create a range
>>>> with just a start, it will go on ad infinitum until you no longer iterate.
>>>> This is similar to doing a scan using -b without -e.
>>>>
>>>> As for why you can't replicate it in your normal scan, it could either
>>>> be the key not being what you think it is, or just a problem with the way
>>>> the shell handles non-alphanumeric characters. One option would be to
>>>> truncate your scan's start to "Journal 1 and see what you hit first. If
>>>> you see yourself starting way beyond your "Journal 1 (1940... then we may
>>>> not be handling quotes well in the shell, or your key is not right. At
>>>> this point, try substituting \x22 for the quotation mark and scanning
>>>> again.
>>>>
>>>> If that still doesn't work, then you may want to dig through your
>>>> middle project's ingest process to see how it's forming the keys for you.
>>>>
>>>> John
>>>>
>>>> On Mon, Apr 23, 2012 at 10:20 AM, Steven Troxell <
>>>> steven.troxell@gmail.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm attempting to use a beta project designed to integrate an RDF
>>>>> engine with Accumulo. There seems to be a bug somewhere in the code:
>>>>> it fails to query Accumulo correctly, so the results are not limited
>>>>> to what the following SPARQL query asks for:
>>>>>
>>>>> SELECT ?yr
>>>>> WHERE {
>>>>>   ?journal rdf:type bench:Journal .
>>>>>   ?journal dc:title "Journal 1 (1940)"^^xsd:string .
>>>>>   ?journal dcterms:issued ?yr
>>>>> }
>>>>>
>>>>> I get results back ranging from 1940-1966, while the HBase integration
>>>>> with this particular software correctly returns just 1940. It's fairly
>>>>> complicated to explain the entire process of how Accumulo scans are
>>>>> spawned from the above query, but I believe I've narrowed down a
>>>>> possible source of error, on which I'd like further leads:
>>>>>
>>>>>
>>>>> I suspect the developers may not be handling the quotation marks
>>>>> correctly when scanning Accumulo. I say this because this is a sample
>>>>> row from the Accumulo shell:
>>>>>
>>>>> "Journal 1 (1940)"^^http://www.w3.org/2001/XMLSchema#string o:
>>>>> http://localhost/publications/journals/Journal1/1940
>>>>> http://purl.org/dc/elements/1.1/title [ROLE1]
>>>>>
>>>>> From the shell, I have yet to figure out how to successfully scan for
>>>>> the row key. A straight
>>>>> scan -b "Journal 1 (1940)"^^http://www.w3.org/2001/XMLSchema#string
>>>>> fails with a usage error, and wrapping the row key in single quotes
>>>>> seems to return all results, which is what I suspect is happening in
>>>>> the actual software I'm using, as it explains the behavior I'm seeing.
>>>>>
>>>>> I'm guessing, but am not entirely sure, that the developers may have
>>>>> misused the programmatic scans as well by not handling the quotation
>>>>> marks correctly. Is this reasonable, and can anyone provide further
>>>>> insight?
>>>>>
>>>>> Thanks,
>>>>> Steve
>>>>>
>>>>
>>>>
>>>
>>
>
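
For readers following the thread, the difference John describes between a
scan with only a start key and a scan with both a start and an end key can be
sketched with a plain sorted map. This is only an analogy under stated
assumptions: it uses java.util.TreeMap rather than the Accumulo API, the class
and method names (RangeSketch, sampleTable, scanFrom, scanRow) are
hypothetical, and the row keys are made-up sample data shaped like the ones in
this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Analogy only: a TreeMap stands in for a table sorted by row key.
// None of these names come from Accumulo; they are illustrative.
public class RangeSketch {

    // Hypothetical sample rows, sorted lexicographically by key.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("\"Article 1 (1940)\"", "article");
        table.put("\"Journal 1 (1940)\"", "journal-1940");
        table.put("\"Journal 1 (1941)\"", "journal-1941");
        table.put("\"Journal 1 (1942)\"", "journal-1942");
        return table;
    }

    // Start key only: returns the start row and every row sorted after it,
    // comparable to scanning with -b but without -e.
    static List<String> scanFrom(NavigableMap<String, String> table, String startRow) {
        return new ArrayList<>(table.tailMap(startRow, true).keySet());
    }

    // Start and end key both set to the same row (inclusive on both sides):
    // returns at most that one row.
    static List<String> scanRow(NavigableMap<String, String> table, String row) {
        return new ArrayList<>(table.subMap(row, true, row, true).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        String row = "\"Journal 1 (1940)\"";
        System.out.println("start only:    " + scanFrom(table, row)); // 3 rows
        System.out.println("start and end: " + scanRow(table, row));  // 1 row
    }
}
```

With the sample data above, the start-only scan keeps returning the 1941 and
1942 rows after the requested one, which matches the symptom of extra years
showing up in the query results, while the bounded scan stops at the
requested row.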
