db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Using derby to parse an SQL statement
Date Fri, 21 Nov 2008 14:41:57 GMT
Hi Christian,

Thanks for the additional context on the problem you are tackling. Some 
comments inline...

Christian Riedel wrote:
> Hi Rick,
> as requested I want to report our findings when working with the parser:
> Background: we want to use the parser to analyze SQL statements that 
> are generated by different tools, for example an ETL tool. It seems 
> that, depending on the DBMS the tools are configured to work with, the 
> generated SQL is not completely standard conform, which causes a 
> parser exception.
> As example the statement
>     select null "a_null_column", id "an_id_column" from a_table
> runs fine on an MSSQL DB, but causes problems on a DB2.
> Feeding the parser with this statement brings up a parsing exception. 
> Supposedly because it is not SQL standard conform.
> Our question is: Does the derby parser "only" understand standard SQL 
> or can it be instructed to accept DB2, Oracle, MSSQL etc. specific 
> syntax as well?
The Derby parser only understands Derby's dialect of SQL, which is 
supposed to be a subset of standard SQL. In the past several years, the 
community has interpreted that to mean the 2003 version of the SQL 
standard. Of course, Derby is not completely compliant, but we try hard. 
Derby also understands some bits of the DB2 dialect. Derby hasn't been 
taught any Oracle-specific or MSSQL-specific syntax that I'm aware of.

You might be able to hack up the Derby parser so that it understands 
bits of these other dialects.

Alternatively, using the current parser as any example, you might be 
able to write some other parsers which digest DB2, Oracle, and MSSQL 
syntax and turn out Derby ASTs. I think that you could hack up 
CompilerContextImpl.getParser() to return different parsers depending on 
the setting of an environment variable.

This sounds like a lot of work to me. In addition, I don't know enough 
about the peculiarities of those other dialects to be able to say 
whether Derby's set of AST nodes is rich enough to express DB2, Oracle, 
and MSSQL's languages.

So the short answer to your question is this: using Derby's parser, all 
that you can expect to parse is a fairly large subset of standard SQL.

Hope this helps,

> Other than this we have not had any problems with the parser. Using 
> the Derby AST brings the information we wanted.
> I hope you can help us with this problem as well.
> Regards
> Christian
> Rick Hillegas schrieb:
>> Hi Christian,
>> I have checked the patch into the development trunk. At this point, 
>> you will automatically get the fix if you build an up-to-date copy of 
>> the trunk. Please let us know what your experience is. You will 
>> notice from the discussion on the JIRA that some syntax may not 
>> work--this is because the separation between the parsing and 
>> name-resolution phases is not as clean as it should be. The JIRA 
>> would be a good place for you to record your observations.
>> Thanks,
>> -Rick
>> Christian Riedel wrote:
>>> Hi Rick,
>>> at first thank you very much for your efforts so far. At a first 
>>> glance your changes to the code seem to be exactly what we want. I 
>>> will try if it works asap.
>>> To test it with the derby libs I'd have to work on the current trunk 
>>> and apply the patch an then compile derby manually, right?
>>> I'll keep you updated
>>> Thanks for your help
>>> Christian
>>> Rick Hillegas schrieb:
>>>> Hi Christian,
>>>> I have created a JIRA to track this issue: 
>>>> https://issues.apache.org/jira/browse/DERBY-3946
>>>> I have attached to the JIRA a small patch which exposes the AST 
>>>> produced by the parser. I have also attached a simple program, 
>>>> ASTParser, which shows how to retrieve the AST from Derby. I am 
>>>> inclined to check this patch in to the trunk. Please let me know if 
>>>> you find this useful and if you would like me to port this patch to 
>>>> another Derby branch.
>>>> Hope this helps,
>>>> -Rick
>>>> Christian Riedel wrote:
>>>>> Hi Rick,
>>>>> first of all thanks for your answer ... now the relations have 
>>>>> become a lot clearer ...
>>>>> Your are right, there is a lot of things to be done that we 
>>>>> probably don't want to go through. You asked why we cannot take 
>>>>> the whole derby engine and use it ... well there is nor real 
>>>>> reason not to do so. The only "problem" I see is, that derby is a 
>>>>> dbms - if I am not mistaken - and we only have an SQL statement 
>>>>> that we extract from a text file and want tot analyze it to 
>>>>> extract some metadata from it.
>>>>> So if we take the derby engine as it is, how can I prevent that we 
>>>>> have to set up a "dummy" DB in order to be able to actually use 
>>>>> thje parsing feature ....
>>>>> I hope you see my point.
>>>>> We could live with setting up a dummy DB ... and I do think that 
>>>>> the derby AST offers all information we need. It's just that I 
>>>>> don't see how we can set this thing up. So having a dummy DB is 
>>>>> necessary to be able to intercept the parsing process to get hold 
>>>>> of the AST? Can we actually access the AST if we choose to set up 
>>>>> a dummy DB? I think that would be something we could live with ;-)
>>>>> Thanks for your support
>>>>> Christian
>>>>> Rick Hillegas schrieb:
>>>>>> Hi Christian,
>>>>>> I think you will have difficulty isolating the Parser from the 
>>>>>> rest of the SQL interpreter. In theory, you should be able to 
>>>>>> isolate the compiler from the execution engine and the storage 
>>>>>> layer--but that is an untested theory.
>>>>>> The Parser wants to turn out abstract syntax trees (AST). 
>>>>>> Ideally, the Parser would just need to ask a NodeFactory for AST

>>>>>> nodes and you could supply your own NodeFactory. But I think that

>>>>>> there is a fair amount of coupling between the Parser and Derby's

>>>>>> concrete implementation of NodeFactory. I think that you could 
>>>>>> uncouple the two, but you may not want to spend your time on that.
>>>>>> So the Parser is going to force you to pull in the AST nodes. 
>>>>>> Once you do that, you will end up with the whole compiler. In 
>>>>>> particular, the AST nodes (and the Parser itself) expect that you

>>>>>> will supply an implementation of LanguageConnectionContext, the 
>>>>>> master state variable for the whole SQL interpreter. Untangling 
>>>>>> that requirement is another chunk of work you may not want to do.
>>>>>> Then there is the Monitor. It has been a while since I was in 
>>>>>> that code but I seem to recall that fairly early on the Monitor 
>>>>>> wants to fault in a storage layer. In theory you ought to be able

>>>>>> to supply the Monitor a list of modules that doesn't include a 
>>>>>> storage layer. But since no-one runs in this configuration, there

>>>>>> are probably a lot of undocumented surprises that you may not 
>>>>>> want to fix either.
>>>>>> Can I ask you what breaks if you just pull in the whole Derby 
>>>>>> engine? Are you concerned that you will fault in too much code 
>>>>>> that you barely use? Are you concerned that you'll end up with a

>>>>>> dummy database that you don't need? Are Derby's AST nodes not a 
>>>>>> usable representation of statement syntax?
>>>>>> Thanks,
>>>>>> -Rick
>>>>>> Christian Riedel wrote:
>>>>>>> Hi there,
>>>>>>> we are working on a small project where we need to analyze an

>>>>>>> SQL statement that can be of any kind: very simple, with inner

>>>>>>> selects, complex join etc.
>>>>>>> We figured it inappropriate to start to write our own parser

>>>>>>> when there are other projects, like derby, out there that can
>>>>>>> it much better than we would possibly do ... so this was our
>>>>>>> Can we use derby to create an instance of Parser 
>>>>>>> (org.apache.derby.iapi.sql.compile.Parser.class) and let our
>>>>>>> statement be parsed by calling the parse() method on this 
>>>>>>> instance? What we want to have is a syntax tree of the statement

>>>>>>> that allows us to see which tables and which fields are accessed

>>>>>>> / included in the statement (including any possibly done 
>>>>>>> "renames" รก la SELECT street AS "ADDRESS" FROM USER_DATA ).
>>>>>>> The problem is, that we are stuck ... we spent several days now

>>>>>>> to try to find the proper way to create an instance of the 
>>>>>>> Parser. Is it possible at all without having to set up a running

>>>>>>> derby system?
>>>>>>> Is the Monitor class the right entry point? How can we create
>>>>>>> CompilerContext so that a Parser instance can be created?
>>>>>>> This sure is off-topic but we don't see any way through all 
>>>>>>> this. Can you help us?
>>>>>>> Thanks in advance
>>>>>>> Christian

View raw message