incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Future of Blur Query Language
Date Sun, 26 Aug 2012 14:28:56 GMT
Look at IndexManagerTest.testQueryWithJoin

I have attached an easier to read version of the data.

And here is the query to "trick" it into doing the join:

+(+test-family.testcol1:value1 nojoin) +(+test-family.testcol3:value234123)
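
For context, the join in question is a match across Records of the same Row rather than within a single Record. A minimal sketch of that distinction, assuming a Row can be modelled as a list of field-to-value maps and that exact equality stands in for Lucene term matching. `RowJoinSketch` and its methods are hypothetical and are not Blur code. The data mirrors the two-record row from the test case quoted further down:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; none of these names come from Blur.
class RowJoinSketch {

    // Builds a record from alternating "family.column", value pairs.
    static Map<String, String> record(String... keyValues) {
        Map<String, String> rec = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            rec.put(keyValues[i], keyValues[i + 1]);
        }
        return rec;
    }

    // Assumption: exact equality stands in for Lucene term matching.
    static boolean has(Map<String, String> rec, String field, String value) {
        return value.equals(rec.get(field));
    }

    // No join: a single record must satisfy both clauses.
    static boolean matchesSameRecord(List<Map<String, String>> row,
            String f1, String v1, String f2, String v2) {
        for (Map<String, String> rec : row) {
            if (has(rec, f1, v1) && has(rec, f2, v2)) {
                return true;
            }
        }
        return false;
    }

    // Join: each clause may be satisfied by a different record of the row.
    static boolean matchesAcrossRecords(List<Map<String, String>> row,
            String f1, String v1, String f2, String v2) {
        boolean first = false;
        boolean second = false;
        for (Map<String, String> rec : row) {
            first |= has(rec, f1, v1);
            second |= has(rec, f2, v2);
        }
        return first && second;
    }

    public static void main(String[] args) {
        // One row holding two records, each with a single column.
        List<Map<String, String>> row = Arrays.asList(
                record("cf1.col1", "value1"),
                record("cf1.col2", "value2"));
        System.out.println(matchesSameRecord(row,
                "cf1.col1", "value1", "cf1.col2", "value2"));    // false
        System.out.println(matchesAcrossRecords(row,
                "cf1.col1", "value1", "cf1.col2", "value2"));    // true
    }
}
```

The same-record check fails because no single record carries both columns, while the row-level check succeeds; the latter is the behaviour the grouped query is meant to force.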

On Sun, Aug 26, 2012 at 9:40 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> I'm trying to create a real working example of the problem, and I have
> found a bug.  Basically the query never finishes, very strange.  Not
> sure if it's Lucene or my query at this point.  Once I have an example
> I will post it and add it as a unit test in the IndexManagerTest.
>
> Aaron
>
> On Sat, Aug 25, 2012 at 7:04 PM, Garrett Barton
> <garrett.barton@gmail.com> wrote:
>> Can we get this test case working to show the problem?
>>
>>         private static void testJoin(Iface client, String table)
>>                         throws BlurException, TException {
>>                 RowMutation mutation = new RowMutation();
>>                 mutation.table = table;
>>                 mutation.waitToBeVisible = true;
>>                 mutation.rowId = "row1";
>>                 mutation.addToRecordMutations(newRecordMutation("cf1",
>>                                 "recordid1", newColumn("col1","value1")));
>>                 mutation.addToRecordMutations(newRecordMutation("cf1",
>>                                 "recordid2", newColumn("col2","value2")));
>>                 mutation.rowMutationType = RowMutationType.REPLACE_ROW;
>>                 client.mutate(mutation);
>>
>>                 List<String> joinTest = new ArrayList<String>();
>>                 joinTest.add("+cf1.col1:value1");
>>                 joinTest.add("+cf1.col2:value2");
>>                 joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
>>                 joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue) "
>>                                 + "+(+cf1.col2.value2 nocf.nofield:somevalue)");
>>                 joinTest.add("+(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2)");
>>
>>                 for(String q : joinTest)
>>                         System.out.println(q + " hits: " + hits(client, table, q, true));
>>         }
>>
>>         private static long hits(Iface client, String table, String queryStr,
>>                         boolean superQuery) throws BlurException, TException {
>>                 BlurQuery bq = new BlurQuery();
>>                 SimpleQuery sq = new SimpleQuery();
>>                 sq.queryStr = queryStr;
>>                 sq.superQueryOn = superQuery;
>>                 bq.simpleQuery = sq;
>>                 BlurResults query = client.query(table, bq);
>>                 return query.totalResults;
>>         }
>>
>>
>> Running I get:
>> +cf1.col1:value1 hits: 1
>> +cf1.col2:value2 hits: 1
>> +cf1.col1:value1 +cf1.col2:value2 hits: 0
>> +(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue) hits: 0
>> +(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
>>
>> What's the trick to get the join to work?
>>
>> Honestly my first instinct is to turn the record joins into a list
>> passed in to the simple query if one wants to move into record joining
>> vs default inter record joining of the same cf.  Will ponder the other
>> options some more. :)
>>
>> ~Garrett
>>
>> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <tim.tutt@gmail.com> wrote:
>>> Aaron,
>>>
>>> Just for a little clarification on your example, when you say JOIN, are you
>>> actually just talking about a union of two sets or are you actually
>>> referring to the relational type of join where the intent is to merge them
>>> into a single record? If it's the former, wouldn't a simple OR suffice?
>>>
>>> Provided that I am in fact missing something, here are my thoughts on the
>>> query language:
>>>
>>> A common theme that I have seen across the board with commercial
>>> search/discovery products is the creation of a query language modeled after
>>> SQL with varying limitations. This tends to be fairly effective as the
>>> learning curve is not too steep for users who have experience writing SQL
>>> queries and dealing with relational databases. Additionally, these users
>>> normally find a way to live with the limitations of the language and find
>>> ways around the problems they are trying to solve as the language is
>>> typically advanced enough to be creative.
>>>
>>> Such a language, however, does not lend itself well to the less advanced
>>> end users of your product. Perhaps in certain cases this is acceptable as
>>> you will always have some advanced user available, but in the cases where
>>> these advanced users are in limited supply the learning curve becomes
>>> steeper as the technical ability and know-how decreases.
>>>
>>> In taking a brief look at the spec for CQL, I tend to agree with your
>>> assessment that it is the best option as it looks like it has the ability
>>> to be flexible enough to fit both cases. It is possible that you will run
>>> into limitations with the queries that your more advanced users are
>>> interested in, but perhaps those are the cases where Blur is not a fit.
>>>
>>>
>>> Tim
>>>
>>> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>>>
>>>> I would like to start a thread on the topic of the future of Blur's query
>>>> language.  Currently the "simpleQuery" is just a normal Lucene based
>>>> syntax with a little magic to figure out the joins (via the
>>>> SuperQuery) that the user probably intended.  Of course this guess
>>>> work gets it wrong sometimes.  Let me explain with an example:
>>>>
>>>> Given the query with superOn:
>>>>
>>>> +cf1.field1:value1 +cf1.field2:value2
>>>>
>>>> The current implementation will ASSUME that you want to find where
>>>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>>>> "value2" in the same Record because the column family is the same.
>>>> i.e. NO JOIN across records
>>>>
>>>> But perhaps the user really does want a join, meaning that the user
>>>> wants to find any Row that contains one or more Records that have a
>>>> field "cf1.field1" that contains "value1" and one or more Records in
>>>> the same Row (but not necessarily in the same Record) that contains a
>>>> field "cf1.field2" that contains "value2".  i.e. JOIN
>>>>
>>>> Given the current implementation, the only way to force the JOIN is
>>>> to do something like:
>>>>
>>>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2
>>>> nocf.nofield:somevalue)
>>>>
>>>> This will trick the parser into creating two separate join query
>>>> (SuperQuery) objects, which together perform the JOIN.
>>>>
>>>>
>>>> THIS IS UGLY.
>>>>
>>>> Here are the current criteria for a query language:
>>>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>>>> Span, etc.)
>>>> - User defined query type should be supported, extensible
>>>> - The query language should be compatible with any programming
>>>> language so that the current thrift RPC can continue to be utilized
>>>>
>>>> Here are options that I have been thinking about:
>>>>
>>>> Option 1:
>>>> Somehow extend the current Lucene Query syntax to support these "new"
>>>> features.  The biggest issue I have with this is that we would be
>>>> creating yet another query language that users would have to learn.
>>>> Also I think that allowing users to extend the query language by
>>>> adding their own types would require a rewrite of the Lucene
>>>> implemented query parser.  So even starting with the Lucene query
>>>> language would be a lot of work.
>>>>
>>>> Option 2:
>>>> Some limited version of SQL or SQL like syntax, basically supporting
>>>> normal SQL with limited join support (probably only natural joins).
>>>> This would be nice, because most users understand SQL.  But because
>>>> Blur can not support all the various operations that SQL can provide
>>>> this will probably be frustrating to users.  And they will need to
>>>> learn what Blur SQL will provide and any special Blur only syntax.  So
>>>> this would again be like inventing another query language.
>>>>
>>>> Option 3:
>>>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>>>> confused with Cassandra Query Language.  Currently I like this option
>>>> the best, because it has built-in extensibility as well as the normal
>>>> options needed for a search engine.  Boolean, fuzzy, wildcard, etc.
>>>>
>>>> I really would like to get others' opinions here and any other options.
>>>>  Thanks!
>>>>
>>>> Aaron
>>>>
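
The workaround described in the original message can be produced mechanically by giving each required clause its own group. A minimal sketch: `ForceJoinQuery.forceJoin` is a hypothetical helper, not part of Blur, and the filler clause `nocf.nofield:somevalue` is simply the one used in the message. It only builds the query string; whether the parser then performs the join depends on the Blur version, as the replies in this thread show.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the Blur API.
class ForceJoinQuery {

    // Wraps every clause as "+(+clause filler)" so that each one ends up
    // in a separate group of the simple query.
    static String forceJoin(List<String> clauses, String filler) {
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+(+").append(clause);
            if (filler != null && !filler.isEmpty()) {
                sb.append(' ').append(filler);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = forceJoin(
                Arrays.asList("cf1.field1:value1", "cf1.field2:value2"),
                "nocf.nofield:somevalue");
        // Prints the two-group form of the workaround query.
        System.out.println(query);
    }
}
```

Keeping the filler as a parameter makes it easy to try variants, such as the bare `nojoin` term used in the query at the top of this message.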
