incubator-blur-dev mailing list archives

From Garrett Barton <garrett.bar...@gmail.com>
Subject Re: Future of Blur Query Language
Date Sat, 25 Aug 2012 23:04:25 GMT
Can we get this test case working to show the problem?

	private static void testJoin(Iface client, String table)
			throws BlurException, TException {
		RowMutation mutation = new RowMutation();
		mutation.table = table;
		mutation.waitToBeVisible = true;
		mutation.rowId = "row1";
		mutation.addToRecordMutations(newRecordMutation("cf1",
				"recordid1", newColumn("col1","value1")));
		mutation.addToRecordMutations(newRecordMutation("cf1",
				"recordid2", newColumn("col2","value2")));
		mutation.rowMutationType = RowMutationType.REPLACE_ROW;
		client.mutate(mutation);
		
		List<String> joinTest = new ArrayList<String>();
		joinTest.add("+cf1.col1:value1");
		joinTest.add("+cf1.col2:value2");
		joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
		joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue) "
				+ "+(+cf1.col2.value2 nocf.nofield:somevalue)");
		joinTest.add("+(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2)");
		
		for(String q : joinTest)
			System.out.println(q + " hits: " + hits(client,table, q, true));
	}
	
	private static long hits(Iface client, String table, String queryStr,
			boolean superQuery) throws BlurException, TException {
		BlurQuery bq = new BlurQuery();
		SimpleQuery sq = new SimpleQuery();
		sq.queryStr = queryStr;
		sq.superQueryOn = superQuery;
		bq.simpleQuery = sq;
		BlurResults query = client.query(table, bq);
		return query.totalResults;
	}


Running I get:
+cf1.col1:value1 hits: 1
+cf1.col2:value2 hits: 1
+cf1.col1:value1 +cf1.col2:value2 hits: 0
+(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue) hits: 0
+(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
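
The third result is consistent with the same-Record interpretation described in the quoted message below: col1 and col2 live in different Records of row1, so requiring both terms in one Record finds nothing. Here is a minimal plain-Java sketch of the two interpretations, with no Blur dependency. The class, the method names, and the map-based Row/Record model are hypothetical and only illustrate the semantics, not Blur's implementation:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a Record is a map of column name -> value,
// and a Row is a list of Records. This is not Blur's data model.
class JoinSemanticsSketch {

    /** No join: true only if one single Record satisfies every term. */
    static boolean matchesInSameRecord(List<Map<String, String>> row,
                                       Map<String, String> terms) {
        for (Map<String, String> record : row) {
            boolean all = true;
            for (Map.Entry<String, String> term : terms.entrySet()) {
                if (!term.getValue().equals(record.get(term.getKey()))) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    /** Join: true if each term is satisfied by some Record in the Row. */
    static boolean matchesAcrossRecords(List<Map<String, String>> row,
                                        Map<String, String> terms) {
        for (Map.Entry<String, String> term : terms.entrySet()) {
            boolean found = false;
            for (Map<String, String> record : row) {
                if (term.getValue().equals(record.get(term.getKey()))) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** Mirrors row1 from the test case: two Records, one column each. */
    static List<Map<String, String>> sampleRow() {
        return Arrays.asList(
                Collections.singletonMap("cf1.col1", "value1"),
                Collections.singletonMap("cf1.col2", "value2"));
    }

    /** Mirrors the query +cf1.col1:value1 +cf1.col2:value2. */
    static Map<String, String> sampleTerms() {
        Map<String, String> terms = new HashMap<String, String>();
        terms.put("cf1.col1", "value1");
        terms.put("cf1.col2", "value2");
        return terms;
    }

    public static void main(String[] args) {
        System.out.println("same record:    "
                + matchesInSameRecord(sampleRow(), sampleTerms()));
        System.out.println("across records: "
                + matchesAcrossRecords(sampleRow(), sampleTerms()));
    }
}
```

With the sample data, the same-Record check is false and the across-Records check is true, which is the behavior the join is supposed to provide.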

What's the trick to get the join to work?

Honestly my first instinct is to turn the record joins into a list
passed in to the simple query if one wants to move into record joining
vs default inter record joining of the same cf.  Will ponder the other
options some more. :)

~Garrett

On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <tim.tutt@gmail.com> wrote:
> Aaron,
>
> Just for a little clarification on your example, when you say JOIN, are you
> actually just talking about a union of two sets or are you actually
> referring to the relational type of join where the intent is to merge them
> into a single record? If it's the former, wouldn't a simple OR suffice?
>
> Provided that I am in fact missing something, here are my thoughts on the
> query language:
>
> A common theme that I have seen across the board with commercial
> search/discovery products is the creation of a query language modeled after
> SQL with varying limitations. This tends to be fairly effective as the
> learning curve is not too steep for users who have experience writing SQL
> queries and dealing with relational databases. Additionally, these users
> normally find a way to live with the limitations of the language and find
> ways around the problems they are trying to solve as the language is
> typically advanced enough to be creative.
>
> Such a language, however, does not lend itself well to the less advanced
> end users of your product. Perhaps in certain cases this is acceptable as
> you will always have some advanced user available, but in the cases where
> these advanced users are in limited supply the learning curve becomes
> steeper as the technical ability and know-how decreases.
>
> In taking a brief look at the spec for CQL, I tend to agree with your
> assessment that it is the best option as it looks like it has the ability
> to be flexible enough to fit both cases. It is possible that you will run
> into limitations with the queries that your more advanced users are
> interested in, but perhaps those are the cases where Blur is not a fit.
>
>
> Tim
>
> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
>> I would like to start a thread on the topic of the future of Blur's query
>> language.  Currently the "simpleQuery" is just a normal Lucene based
>> syntax with a little magic to figure out the joins (via the
>> SuperQuery) that the user probably intended.  Of course this guess
>> work gets it wrong sometimes.  Let me explain with an example:
>>
>> Given the query with superOn:
>>
>> +cf1.field1:value1 +cf1.field2:value2
>>
>> The current implementation will ASSUME that you want to find where
>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>> "value2" in the same Record because the column family is the same.
>> i.e. NO JOIN across records
>>
>> But perhaps the user really does want a join, meaning that the user
>> wants to find any Row that contains one or more Records that have a
>> field "cf1.field1" that contains "value1" and one or more Records in
>> the same Row (but not necessarily in the same Record) that contains a
>> field "cf1.field2" that contains "value2".  i.e. JOIN
>>
>> Given the current implementation, the only way to force the JOIN is
>> to do something like:
>>
>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2 nocf.nofield:somevalue)
>>
>> This will trick the parser into creating 2 separate join query
>> (SuperQuery) objects and performing the JOIN.
>>
>>
>> THIS IS UGLY.
>>
>> Here are the current criteria for a query language:
>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>> Span, etc.)
>> - User-defined query types should be supported (extensible)
>> - The query language should be compatible with any programming
>> language so that the current thrift RPC can continue to be utilized
>>
>> Here are options that I have been thinking about:
>>
>> Option 1:
>> Somehow extend the current Lucene Query syntax to support these "new"
>> features.  The biggest issue I have with this is that we would be
>> creating yet another query language that users would have to learn.
>> Also I think that allowing users to extend the query language by
>> adding their own types would require a rewrite of the Lucene
>> implemented query parser.  So even starting with the Lucene query
>> language would be a lot of work.
>>
>> Option 2:
>> Some limited version of SQL or SQL like syntax, basically supporting
>> normal SQL with limited join support (probably only natural joins).
>> This would be nice, because most users understand SQL.  But because
>> Blur cannot support all the various operations that SQL can provide
>> this will probably be frustrating to users.  And they will need to
>> learn what Blur SQL will provide and any special Blur only syntax.  So
>> this would again be like inventing another query language.
>>
>> Option 3:
>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>> confused with Cassandra Query Language.  Currently I like this option
>> the best, because it has built-in extensibility as well as the normal
>> options needed for a search engine.  Boolean, fuzzy, wildcard, etc.
>>
>> I really would like to get others' opinions here and any other options.
>>  Thanks!
>>
>> Aaron
>>
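
The workaround quoted above (padding each required clause with a dummy optional clause so the parser builds separate groups) can be generated mechanically instead of typed by hand. A minimal sketch in plain Java follows. The class and method names are hypothetical, the clauses use the field:value form, and it only builds the query string; whether the parser then produces separate SuperQuery objects is as described in the message, not verified here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Blur: builds the "forced join" query string.
class ForcedJoinQuerySketch {

    // Dummy optional clause taken from the workaround in the quoted message.
    static final String DUMMY_CLAUSE = "nocf.nofield:somevalue";

    /** Wraps each required clause in its own group padded with the dummy clause. */
    static String forceJoin(List<String> clauses) {
        List<String> groups = new ArrayList<String>();
        for (String clause : clauses) {
            groups.add("+(+" + clause + " " + DUMMY_CLAUSE + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        // Prints the same shape of query as the workaround in the message.
        System.out.println(forceJoin(
                Arrays.asList("cf1.field1:value1", "cf1.field2:value2")));
    }
}
```

The resulting string could then be assigned to sq.queryStr in the hits() helper from the test case.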
