incubator-blur-dev mailing list archives

From Garrett Barton <garrett.bar...@gmail.com>
Subject Re: Future of Blur Query Language
Date Sun, 26 Aug 2012 18:28:54 GMT
Ahh, the Lucene search-time joins from contrib
(http://lucene.apache.org/core/3_6_1/api/contrib-join/org/apache/lucene/search/join/package-summary.html)
are what I was thinking of. I like that style as a good interim upgrade
to support these record joins. How hard would it be to do something
like:

BlurQuery blurQuery = new BlurQuery();
blurQuery.simpleQuery = new SimpleQuery();
//blank since there's nothing other than the joins
blurQuery.simpleQuery.queryStr = "";
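// proposed, not an existing method: each addJoin clause would be matched
// independently against the Records of a Row (a Row-level join)
blurQuery.addJoin("+test-family.testcol1:value1");
blurQuery.addJoin("+test-family.testcol3:value234123");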
blurQuery.simpleQuery.superQueryOn = true;
blurQuery.simpleQuery.type = ScoreType.SUPER;
blurQuery.fetch = 10;
blurQuery.minimumNumberOfResults = Long.MAX_VALUE;
blurQuery.maxQueryTime = Long.MAX_VALUE;
blurQuery.uuid = 1;

This way one would not have to guess at the user's intent.  I think this is
a cleaner workaround than the ugly query until something nicer comes
along.
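As an interim step, a list of join clauses like the one `addJoin` would collect could simply be expanded into the workaround query Aaron describes further down the thread. The sketch below assumes a hypothetical helper class `JoinWorkaround` with a `toQueryString` method; it only builds the query string and does not call Blur, so whether the parser actually turns the result into a join is still the open question in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinWorkaround {

    // Hypothetical helper: wraps each join clause in its own required group,
    // padded with an optional clause on a column family that does not exist
    // ("nocf.nofield:somevalue", taken from the workaround in this thread).
    // Per that workaround, this is meant to make the parser build one
    // SuperQuery per clause instead of matching all clauses in one Record.
    public static String toQueryString(List<String> joinClauses) {
        List<String> groups = new ArrayList<>();
        for (String clause : joinClauses) {
            groups.add("+(" + clause + " nocf.nofield:somevalue)");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        List<String> joins = List.of("+cf1.field1:value1", "+cf1.field2:value2");
        // Prints: +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2 nocf.nofield:somevalue)
        System.out.println(toQueryString(joins));
    }
}
```

Keeping the expansion behind a list-based call means callers would not need to change if the ugly string is later replaced by real join support.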

If we were to go down either of the ?QL approaches, I think the
default BlurResult should be the one from blur-jdbc and actually be
what one would expect from a database returning the same query.  I'm not
sold on it, though; I always find SQL very limiting.

On Sun, Aug 26, 2012 at 10:28 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> Look at IndexManagerTest.testQueryWithJoin
>
> I have attached an easier to read version of the data.
>
> And here is the query to "trick" it into doing the join:
>
> +(+test-family.testcol1:value1 nojoin) +(+test-family.testcol3:value234123)
>
> On Sun, Aug 26, 2012 at 9:40 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>> I'm trying to create a real working example of the problem, and I have
>> found a bug.  Basically the query never finishes, very strange.  Not
>> sure if it's Lucene or my query at this point.  Once I have an example
>> I will post it and add it as a unit test in the IndexManagerTest.
>>
>> Aaron
>>
>> On Sat, Aug 25, 2012 at 7:04 PM, Garrett Barton <garrett.barton@gmail.com> wrote:
>>> Can we get this test case working to show the problem?
>>>
>>>         // Writes one Row holding two Records in family cf1, then prints the hit count for each query.
>>>         private static void testJoin(Iface client, String table) throws BlurException, TException {
>>>                 RowMutation mutation = new RowMutation();
>>>                 mutation.table = table;
>>>                 mutation.waitToBeVisible = true;
>>>                 mutation.rowId = "row1";
>>>                 mutation.addToRecordMutations(newRecordMutation("cf1",
>>>                                 "recordid1", newColumn("col1","value1")));
>>>                 mutation.addToRecordMutations(newRecordMutation("cf1",
>>>                                 "recordid2", newColumn("col2","value2")));
>>>                 mutation.rowMutationType = RowMutationType.REPLACE_ROW;
>>>                 client.mutate(mutation);
>>>
>>>                 List<String> joinTest = new ArrayList<String>();
>>>                 joinTest.add("+cf1.col1:value1");
>>>                 joinTest.add("+cf1.col2:value2");
>>>                 joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
>>>                 joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue)");
>>>                 joinTest.add("+(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2)");
>>>
>>>                 for(String q : joinTest)
>>>                         System.out.println(q + " hits: " + hits(client, table, q, true));
>>>         }
>>>
>>>         private static long hits(Iface client, String table, String queryStr, boolean superQuery) throws BlurException, TException {
>>>                 BlurQuery bq = new BlurQuery();
>>>                 SimpleQuery sq = new SimpleQuery();
>>>                 sq.queryStr = queryStr;
>>>                 sq.superQueryOn = superQuery;
>>>                 bq.simpleQuery = sq;
>>>                 BlurResults query = client.query(table, bq);
>>>                 return query.totalResults;
>>>         }
>>>
>>>
>>> Running I get:
>>> +cf1.col1:value1 hits: 1
>>> +cf1.col2:value2 hits: 1
>>> +cf1.col1:value1 +cf1.col2:value2 hits: 0
>>> +(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue) hits: 0
>>> +(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
>>>
>>> What's the trick to get the join to work?
>>>
>>> Honestly, my first instinct is to turn the record joins into a list
>>> passed in to the simple query, for when one wants record joining
>>> instead of the default matching within a single record of the same cf.
>>> Will ponder the other options some more. :)
>>>
>>> ~Garrett
>>>
>>> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <tim.tutt@gmail.com> wrote:
>>>> Aaron,
>>>>
>>>> Just for a little clarification on your example, when you say JOIN, are you
>>>> actually just talking about a union of two sets or are you actually
>>>> referring to the relational type of join where the intent is to merge them
>>>> into a single record? If it's the former, wouldn't a simple OR suffice?
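To make the distinction concrete, here is a toy in-memory model, not Blur code: a Row is assumed to be a list of Records, and each Record a map from `family.column` to a single value (the class `UnionVsJoin` and its methods are hypothetical). A union (OR) matches a Row if any Record satisfies either clause; the JOIN Aaron describes requires every clause to be satisfied by some Record in the Row.

```java
import java.util.List;
import java.util.Map;

public class UnionVsJoin {

    // Toy model, not Blur's API: a clause requires one "family.column" to equal a value.
    record Clause(String field, String value) {
        boolean matches(Map<String, String> rec) {
            return value.equals(rec.get(field));
        }
    }

    // Union (OR): the Row matches if any Record satisfies any clause.
    static boolean union(List<Map<String, String>> row, List<Clause> clauses) {
        return clauses.stream().anyMatch(c -> row.stream().anyMatch(c::matches));
    }

    // Join: every clause must be satisfied by at least one Record in the Row,
    // but different clauses may be satisfied by different Records.
    static boolean join(List<Map<String, String>> row, List<Clause> clauses) {
        return clauses.stream().allMatch(c -> row.stream().anyMatch(c::matches));
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
                new Clause("cf1.col1", "value1"),
                new Clause("cf1.col2", "value2"));

        // A Row in which only the first clause is satisfied.
        List<Map<String, String>> partialRow = List.of(Map.of("cf1.col1", "value1"));

        System.out.println("union: " + union(partialRow, clauses)); // true
        System.out.println("join:  " + join(partialRow, clauses));  // false
    }
}
```

With a Row that satisfies only the first clause, the union matches and the join does not, so a simple OR would return more Rows than the JOIN.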
>>>>
>>>> Assuming I am in fact missing something, here are my thoughts on the
>>>> query language:
>>>>
>>>> A common theme that I have seen across the board with commercial
>>>> search/discovery products is the creation of a query language modeled after
>>>> SQL with varying limitations. This tends to be fairly effective as the
>>>> learning curve is not too steep for users who have experience writing SQL
>>>> queries and dealing with relational databases. Additionally, these users
>>>> normally find a way to live with the limitations of the language and find
>>>> ways around the problems they are trying to solve, as the language is
>>>> typically expressive enough to allow some creativity.
>>>>
>>>> Such a language, however, does not lend itself well to the less advanced
>>>> end users of your product. Perhaps in certain cases this is acceptable as
>>>> you will always have some advanced user available, but in the cases where
>>>> these advanced users are in limited supply the learning curve becomes
>>>> steeper as the technical ability and know-how decreases.
>>>>
>>>> In taking a brief look at the spec for CQL, I tend to agree with your
>>>> assessment that it is the best option as it looks like it has the ability
>>>> to be flexible enough to fit both cases. It is possible that you will run
>>>> into limitations with the queries that your more advanced users are
>>>> interested in, but perhaps those are the cases where Blur is not a fit.
>>>>
>>>>
>>>> Tim
>>>>
>>>> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>>>>
>>>>> I would like to start a thread on the topic of the future of Blur's query
>>>>> language.  Currently the "simpleQuery" is just normal Lucene
>>>>> syntax with a little magic to figure out the joins (via the
>>>>> SuperQuery) that the user probably intended.  Of course this
>>>>> guesswork gets it wrong sometimes.  Let me explain with an example:
>>>>>
>>>>> Given the query with superOn:
>>>>>
>>>>> +cf1.field1:value1 +cf1.field2:value2
>>>>>
>>>>> The current implementation will ASSUME that you want to find where
>>>>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>>>>> "value2" in the same Record because the column family is the same.
>>>>> i.e. NO JOIN across records
>>>>>
>>>>> But perhaps the user really does want a join, meaning that the user
>>>>> wants to find any Row that contains one or more Records that have a
>>>>> field "cf1.field1" that contains "value1" and one or more Records in
>>>>> the same Row (but not necessarily in the same Record) that contains a
>>>>> field "cf1.field2" that contains "value2".  i.e. JOIN
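The two readings can be contrasted with a toy in-memory model, not Blur's implementation: a Row is assumed to be a list of Records, each Record a map from `family.column` to a single value, and the class `RecordVsRowMatch` is hypothetical. The data mirrors Garrett's test Row (one Record with `cf1.col1=value1`, another with `cf1.col2=value2`).

```java
import java.util.List;
import java.util.Map;

public class RecordVsRowMatch {

    // Toy model: each Record maps "family.column" to one value; a Row is the
    // list of its Records. Clauses map a field to its required value.

    // NO JOIN: a single Record must satisfy every clause.
    static boolean sameRecord(List<Map<String, String>> row, Map<String, String> clauses) {
        return row.stream().anyMatch(rec -> satisfiesAll(rec, clauses));
    }

    // JOIN: each clause only needs some Record in the Row, not necessarily the same one.
    static boolean acrossRecords(List<Map<String, String>> row, Map<String, String> clauses) {
        return clauses.entrySet().stream().allMatch(c ->
                row.stream().anyMatch(rec -> c.getValue().equals(rec.get(c.getKey()))));
    }

    private static boolean satisfiesAll(Map<String, String> rec, Map<String, String> clauses) {
        return clauses.entrySet().stream()
                .allMatch(c -> c.getValue().equals(rec.get(c.getKey())));
    }

    public static void main(String[] args) {
        List<Map<String, String>> row = List.of(
                Map.of("cf1.col1", "value1"),   // recordid1
                Map.of("cf1.col2", "value2"));  // recordid2
        Map<String, String> clauses = Map.of("cf1.col1", "value1", "cf1.col2", "value2");

        System.out.println("same record:    " + sameRecord(row, clauses));    // false
        System.out.println("across records: " + acrossRecords(row, clauses)); // true
    }
}
```

The NO JOIN reading finds no single Record holding both values, while the JOIN reading matches the Row, which is consistent with the 0 hits Garrett saw for `+cf1.col1:value1 +cf1.col2:value2`.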
>>>>>
>>>>> Given the current implementation, the only way to force the JOIN is
>>>>> to do something like:
>>>>>
>>>>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2 nocf.nofield:somevalue)
>>>>>
>>>>> This will trick the parser into creating two separate join query
>>>>> (SuperQuery) objects and performing the JOIN.
>>>>>
>>>>>
>>>>> THIS IS UGLY.
>>>>>
>>>>> Here are the current criteria for a query language:
>>>>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>>>>> Span, etc.)
>>>>> - User-defined query types should be supported (extensible)
>>>>> - The query language should be compatible with any programming
>>>>> language so that the current Thrift RPC can continue to be used
>>>>>
>>>>> Here are options that I have been thinking about:
>>>>>
>>>>> Option 1:
>>>>> Somehow extend the current Lucene Query syntax to support these "new"
>>>>> features.  The biggest issue I have with this is that we would be
>>>>> creating yet another query language that users would have to learn.
>>>>> Also I think that allowing users to extend the query language by
>>>>> adding their own types would require a rewrite of Lucene's
>>>>> query parser.  So even starting with the Lucene query
>>>>> language would be a lot of work.
>>>>>
>>>>> Option 2:
>>>>> Some limited version of SQL or SQL like syntax, basically supporting
>>>>> normal SQL with limited join support (probably only natural joins).
>>>>> This would be nice, because most users understand SQL.  But because
>>>>> Blur cannot support all the various operations that SQL can provide
>>>>> this will probably be frustrating to users.  And they will need to
>>>>> learn what Blur SQL will provide and any special Blur only syntax.  So
>>>>> this would again be like inventing another query language.
>>>>>
>>>>> Option 3:
>>>>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>>>>> confused with Cassandra Query Language.  Currently I like this option
>>>>> the best, because it has built-in extensibility as well as the normal
>>>>> options needed for a search engine: Boolean, fuzzy, wildcard, etc.
>>>>>
>>>>> I really would like to get others' opinions here and any other options.
>>>>>  Thanks!
>>>>>
>>>>> Aaron
>>>>>
