From: John Vines
Date: Tue, 24 Apr 2012 08:22:45 -0400
Subject: Re: Question about special characters in row key
To: user@accumulo.apache.org

For your range, you would use the same row twice, which creates a range
that spans ONLY that row ID.

As for the quotation mark issue, it looks like Eric just created a
ticket for it. Good find!

John

On Mon, Apr 23, 2012 at 11:29 PM, Steven Troxell
<steven.troxell@gmail.com> wrote:

> So I've dug through how they are ingesting, and found this method:
>
>     /**
>      * Return a scanner pointing at the specified row.
>      *
>      * @param row
>      *            The row we are searching for
>      * @return A scanner pointing at the specified row.
>      * @throws AccumuloException
>      * @throws AccumuloSecurityException
>      * @throws TableNotFoundException
>      */
>     public Scanner getRow(Text row) throws AccumuloException,
>             AccumuloSecurityException, TableNotFoundException {
>         // Create a scanner
>         Scanner scanner = connector
>                 .createScanner(tableName, userAuthorizations);
>
>         // Find the specified row.
>         scanner.setRange(new Range(row));
>         return scanner;
>     }
>
> It is generally called along the lines of
> scanner = getRow(new Text("whatever")) and then iterated upon. Is this
> enough context to confirm you may be on the right track here? To set
> an end key, I would think the last line in that method should be more
> like scanner.setRange(new Range(row), new Range(row))
>
> Am I correct in my thinking here?
>
> Regarding the shell, I tried both of your suggestions, to no success.
> I'm not sure I see where you were going with the truncation; my
> suspicion is that it's the quote, which is the first character, not
> the (, causing the problem. In any case:
>
>     scan -b "Journal 1
>
> fails for lack of a closing quote, and when I close the quote, I again
> get the entire set of results.
>
> Scanning with \x22 leads to a usage error.
>
> On Mon, Apr 23, 2012 at 10:45 AM, John Vines
> <john.w.vines@ugov.gov> wrote:
>
>> Sounds like your software isn't setting end keys. If you create a
>> range with just a start, it will go on ad infinitum until you no
>> longer iterate. This is similar to doing a scan using -b without -e.
>>
>> As for why you can't replicate it in your normal scan, it could
>> either be the key not being what you think it is, or just a problem
>> with the way the shell handles non-alphanumeric characters. One
>> option would be to truncate your scan's start to "Journal 1 and see
>> what you hit first. If you see yourself starting way beyond your
>> "Journal 1 (1940... then we may not be handling quotes well in the
>> shell or your key is not right. At this point, try substituting \x22
>> for the quotation mark and scanning again.
>>
>> If that still doesn't work, then you may want to dig through your
>> middle project's ingest process to see how it's forming the keys for
>> you.
>>
>> John
>>
>> On Mon, Apr 23, 2012 at 10:20 AM, Steven Troxell
>> <steven.troxell@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I'm attempting to use a beta project designed to integrate an RDF
>>> engine with Accumulo. There seems to be a bug somewhere in the code
>>> that queries Accumulo: it fails to limit the results for the
>>> following SPARQL query:
>>>
>>>     SELECT ?yr
>>>     WHERE {
>>>       ?journal rdf:type bench:Journal .
>>>       ?journal dc:title "Journal 1 (1940)"^^xsd:string .
>>>       ?journal dcterms:issued ?yr
>>>     }
>>>
>>> I get results back ranging from 1940-1966, while the HBase
>>> integration with this particular software correctly returns just
>>> 1940. It's fairly complicated to explain the entire process of how
>>> Accumulo scans are spawned from the above query, but I believe I've
>>> narrowed down a possible source of error, on which I'd like further
>>> leads:
>>>
>>> I suspect the developers may not be handling the quotations
>>> correctly when scanning Accumulo. I say this because this is a
>>> sample row from the Accumulo shell:
>>>
>>>     "Journal 1 (1940)"^^http://www.w3.org/2001/XMLSchema#string o:
>>>     http://localhost/publications/journals/Journal1/1940
>>>     http://purl.org/dc/elements/1.1/title [ROLE1]
>>>
>>> From the shell, I have yet to figure out how to successfully scan
>>> for the row key. A straight
>>>
>>>     scan -b "Journal 1 (1940)"^^http://www.w3.org/2001/XMLSchema#string
>>>
>>> fails with a usage error, and wrapping the row key in single quotes
>>> seems to return all results, which is what I suspect is happening in
>>> the actual software I'm using, as it explains the behavior I'm
>>> seeing.
>>>
>>> I'm guessing, but am not entirely sure, that the developers may have
>>> misused the programmatic scans as well, on account of not handling
>>> the quotations correctly. Is this reasonable, and can anyone provide
>>> further insight?
>>>
>>> Thanks,
>>> Steve
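The difference between a start-only range and a single-row range, as
discussed in this thread, can be sketched in plain Java. This is only an
analogy and not the Accumulo API: a java.util.TreeMap stands in for the
sorted table, and the class RangeSketch, its methods, and the 1941 and
1942 rows are made up for illustration. In Accumulo itself, the
single-row case would correspond to John's suggestion of passing the
same row as both start and end, presumably new Range(row, row).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a sorted key/value table; NOT the Accumulo API. */
public class RangeSketch {

    /** Builds a tiny sorted "table" whose row keys begin with a literal quotation mark. */
    public static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        // The 1940 row mirrors the sample row in the thread; 1941 and 1942 are invented.
        for (int year = 1940; year <= 1942; year++) {
            String row = "\"Journal 1 (" + year + ")\"^^http://www.w3.org/2001/XMLSchema#string";
            table.put(row, "http://localhost/publications/journals/Journal1/" + year);
        }
        return table;
    }

    /** Start key only: returns every row from startRow through the end of the table. */
    public static List<String> scanFrom(NavigableMap<String, String> table, String startRow) {
        return new ArrayList<>(table.tailMap(startRow, true).values());
    }

    /** Same row as inclusive start and inclusive end: returns only that row. */
    public static List<String> scanRow(NavigableMap<String, String> table, String row) {
        return new ArrayList<>(table.subMap(row, true, row, true).values());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        String row = "\"Journal 1 (1940)\"^^http://www.w3.org/2001/XMLSchema#string";
        System.out.println(scanFrom(table, row).size()); // prints 3
        System.out.println(scanRow(table, row).size());  // prints 1
    }
}
```

Note that the leading quotation mark is an ordinary character of the row
key here; only the Java string literal needs it escaped as \".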