db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?
Date Mon, 04 Nov 2013 19:08:19 GMT

    [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813121#comment-13813121
] 

Rick Hillegas commented on DERBY-590:
-------------------------------------

Thanks for the second rev of the patch, Andrew. I have a couple comments under the following
topics:

o Build
o Localization
o Security
o Misc

I applied the patch, copied the Lucene jar files into tools/java, and successfully ran LuceneSupportTest.
In addition, I ran some more experiments with a script attached at the end of this comment.

Thanks,
-Rick

-------------- BUILD ----------------

This optional tool relies on the following Lucene jar files. The user documentation should
mention that these are necessary in order to use this tool:

A       tools/java/lucene-analyzers-common-4.5.0.jar
A       tools/java/lucene-queryparser-4.5.0.jar
A       tools/java/lucene-core-4.5.0.jar

I see that you intend to check the Lucene jar files into the Derby source tree so that this
code is always built and tested--even though I also see that you can build Derby cleanly without
the Lucene jar files. I agree with your decision. We have spent considerable effort making
it possible for all developers to build and test the entire Derby product, and I'd like to
stay with that model. I understand that the Lucene jar files are big. They are an order of
magnitude larger than any other jar files checked into Derby. But I don't think that including
them in our source distributions is burdensome.

I'm not clear on Lucene's compatibility guarantees:

B1) Is there some reason that you chose jar files from Lucene 4.5.0 rather than the latest
4.5.1 release?

B2) Our documentation should state which versions of Lucene we support. We should make sure
that we verify those claims during release testing.



-------------- LOCALIZATION ----------------

L1) LuceneListIndexesVTI and LuceneQueryVTI raise SQLExceptions with hard-coded error strings.
These should probably be replaced with localizable error strings. To see how to do this with
optional tools, you can look for the error handle OT_BadLoadUnloadArgs in ForeignDBViews.

L2) In addition, when LuceneQueryVTI.getRawColumn() catches an IOException, it should use
that IOException as the cause of the SQLException that's thrown afterward.

-------------- SECURITY ----------------

I think that this tool should run comfortably when a Java security manager is installed. I
recommend the following changes:

S1) LuceneSupport.dropIndex() should do its File operations inside privilege blocks.

S2)  LuceneSupportTest should not disable the security manager.


We need to give some thought to the SQL permissions needed to use this Lucene support. I have
observed the following:

S3) Only the DBO can load/unload this tool. That seems fine to me.

S4) Ordinary users don't have permission to create/drop indexes. Is that too restrictive?
It would be friendlier if anyone could create/drop indexes on any column they own. That is,
on any column in a table in a schema that they own.

S5) Ordinary users don't have permission to use the luceneQuery() table function.

S6) ... but if EXECUTE privilege on luceneQuery() is granted to PUBLIC, then people can view
any indexed data. This seems overbroad. If Fred doesn't have SELECT privilege on lucenetest.titles.title,
then Fred shouldn't be able to see the contents of that column by running luceneQuery(). A
poor man's solution to this would involve making luceneQuery() issue a dummy select against
the indexed column first, just to make sure that the user has permission to see that data.

S7) I have some misgivings about how this Lucene support will play with encrypted databases.
The conglomerates will be encrypted but a great deal of information will leak out if the Lucene
indexes are not encrypted. At a minimum, we should file a follow-on JIRA to collect thoughts/solutions
related to this vulnerability. If we don't address this vulnerability, then we should at least
describe it in our user documentation.

-------------- MISC ----------------

M1) LuceneListIndexesVTI and  LuceneSupportTest need Apache license headers.

M2) LuceneQueryVTI imports RestrictedVTI but doesn't implement it. Are you planning to make
LuceneQueryVTI a RestrictedVTI in a future rev?

M3) I like the isolation of the Lucene indexes in a separate "lucene" subdirectory of the
database. I see that createIndex() creates a subdirectory with a name made out of the schema,
table, and column arguments. There are probably some tricky edge cases involving delimited
identifiers and maybe some security vulnerabilities involved in exposing these names. The
edge cases and hypothetical security issues might be eliminated by giving these subdirectories
less friendly names like $tableID_$columnNumber.


-------------- TEST SCRIPT ----------------

connect 'jdbc:derby:db1;create=true;user=dbo';

call syscs_util.syscs_create_user( 'DBO', 'dbo_password' );
call syscs_util.syscs_create_user( 'LUCENETEST', 'lucenetest_password' );
call syscs_util.syscs_create_user( 'FRED', 'fred_password' );

-- shutdown in order to enable NATIVE authentication
connect 'jdbc:derby:db1;shutdown=true';

connect 'jdbc:derby:db1;user=lucenetest;password=lucenetest_password' as lucenetest;

create table titles (ID int generated always as identity, ISBN varchar(16), PRINTISBN varchar(16),
title varchar(1024), subtitle varchar(1024), author varchar(1024), series varchar(1024), publisher
varchar(1024), collections varchar(128), collections2 varchar(128));

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, PUBLISHER, COLLECTIONS,
COLLECTIONS2) values ('9765087650324','9765087650324','The Grapes Of Wrath','The Great Depression
in Oklahoma','John Steinbeck','Noble Winners','The Viking Press','National Book Award','Pulitzer
Prize');

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, PUBLISHER, COLLECTIONS,
COLLECTIONS2) values ('6754278542987','6754278542987','Identical: Portraits of Twins','Best
Photo Book 2012 by American Photo Magazine','Martin Schoeller','Portraits','teNeues','Photography','');

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, PUBLISHER, COLLECTIONS,
COLLECTIONS2) values ('2747583475882','2747583475882','Vines, Grapes, and Wines','The wine
drinker''s guide to grape varieties','Jancis Robinson','Reference','Alfred A. Knopf','Wine','');

insert into titles (ISBN, PRINTISBN, TITLE, SUBTITLE, AUTHOR, SERIES, PUBLISHER, COLLECTIONS,
COLLECTIONS2) values ('4356123483483','4356123483483','A Tale of Two Cities','A fictional
account of events leading up to the French revolution','Charles Dickens','Classics','Chapman
& Hall','Fiction','Social Criticism');

-- as expected, this user doesn't have permission to load the tool
call syscs_util.syscs_register_tool('luceneSupport',true);

connect 'jdbc:derby:db1;user=dbo;password=dbo_password' as dbo;

-- succeeds because dbo has permission to load optional tools
call syscs_util.syscs_register_tool('luceneSupport',true);

set connection lucenetest;

-- fails due to lack of permission. probably shouldn't fail.
call LuceneSupport.createIndex('lucenetest','titles','title','id');

set connection dbo;

-- succeeds
call LuceneSupport.createIndex('lucenetest','titles','title','id');

set connection lucenetest;

-- fails due to lack of permission. probably shouldn't fail.
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 'title', 0)
) luceneResults;


set connection dbo;

-- succeeds
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 'title', 0)
) luceneResults;

grant execute on function LuceneSupport.luceneQuery to public;

set connection lucenetest;

-- works now, after dbo lets everyone run the luceneQuery table function
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 'title', 0)
) luceneResults;

connect 'jdbc:derby:db1;user=fred;password=fred_password' as dbo;

-- it's unfortunate that fred can view data in a table he doesn't have SELECT access to
select * from table ( LuceneSupport.luceneQuery('grapes', 'lucenetest.titles', 'title', 0)
) luceneResults;

-- as expected, the following query fails due to lack of permission
select title from lucenetest.titles where 1=2;

set connection dbo;

select id, schemaname, tablename, columnname from table ( LuceneSupport.listIndexes('', '')
) listindexes;


> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: lucene_demo.diff, lucene_demo_2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message