cassandra-client-dev mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Calling all library maintainers
Date Fri, 05 Nov 2010 15:05:46 GMT
On Fri, Nov 5, 2010 at 9:44 AM, Eric Evans <eevans@rackspace.com> wrote:
> On Fri, 2010-11-05 at 02:43 -0500, Stu Hood wrote:
>> > Java you serialize a type to a byte[] whereas with the query
>> > language you'd serialize to a string term
>> >
>> The "serializing to a byte[]" part is what the RPC libraries exist for.
>> With a string serialization format, you are setting all of your clients
>> up to become string concatenation engines with an ad-hoc format defined
>> by your spec: essentially, duplicating Avro and Thrift.
>
> I was referring to keys and column names and values which are typed as
> binary in both Avro and Thrift.
>
>> > TIMEUUID(<timestamp>)
>> Note that this same approach is possible in Avro by adding a union type:
>> it is not dependent on String serialization.
>
> How can a TimeUUIDType be expressed in Avro using a union?
>
>> > to serialize that to a string like
>> > 100000L, than it would be to pack a binary string in network-order
>> I don't think you are giving client library devs enough credit: this only needs
>> to be implemented once, and I'm sure they're capable.
>
> I was speaking to the relative difficulty in serializing a type using
> one method or another.  In other words, in Python it becomes:
>
> import struct; struct.pack('>d', val)
>
> versus
>
> str(val)
>
> Both of which only need to be implemented once, of course.
>
>> -----Original Message-----
>> From: "Eric Evans" <eevans@rackspace.com>
>> Sent: Thursday, November 4, 2010 2:59pm
>> To: client-dev@cassandra.apache.org
>> Subject: Re: Calling all library maintainers
>>
>> On Thu, 2010-11-04 at 21:28 +0200, Ran Tavory wrote:
>> > A QL can shield clients from a class of changes, but OTOH will make
>> > clients have to compose the query strings, where with type safe
>> > libraries this job is somewhat easier. IMO in the near term
>> > introducing a query language will make client dev somewhat harder b/c
>> > of the (somewhat negligible) work of composing query strings and
>> > mostly b/c I don't expect the QL to be stable at v1 so still a moving
>> > target, but easier in the long term mainly due to the hope that
>> > the QL will stabilize.
>>
>> I think you could argue that it makes all of this easier.  Right now
>> from Java you serialize a type to a byte[] whereas with the query
>> language you'd serialize to a string term.  That's a bit more effort out
>> of the gate for primitives like long for example, but consider the
>> venerable TimeUUID that causes so much frustration.  I think it would be
>> much easier to take a timestamp and construct a term like
>> TIMEUUID(<timestamp>) (or whatever), especially since that would work
>> identically across all clients.
>>
>> And it's also worth pointing out that not all languages in use are
>> statically typed, so even in the case of an int, or a long, it'd be
>> easier (or as easy at least), to serialize that to a string like
>> 100000L, than it would be to pack a binary string in network-order.
>>
>> As for not being stable, well, yeah it's going to need to bake a bit
>> before being suitable for widespread use, but I raise it here not to
>> encourage everyone to transition now, but so that you can help shape the
>> outcome (if you're interested, of course).
>>
>> > One other benefit of query languages is that they make tooling a
>> > little easier, one does not have to come up with a specific CLI
>> > interpreter or a web interface with a set of input fields, you just
>> > have to type your QL into a text box or a terminal like you do with
>> > sql.
>> > Long term I think I'm in for a QL (although I have to think about the
>> > syntax you suggested) but I don't expect it to benefit client devs in
>> > the near term even if it was ready today as an alternative to thrift.
>> >
>> > One small question, does this language tunnel through avro or thrift
>> > calls? (Is >>> conn.execute() an avro or thrift call)
>>
>> It's avro for the simple reason that that's still sort of an
>> experimental code path and seemed a less controversial sandbox.  When the
>> spec and implementation are complete, and if it gains suitable traction,
>> I'd actually like to explore a customized transport and serialization.
>>
>
>
> --
> Eric Evans
> eevans@rackspace.com
>
>

I still think the query language is a good idea, but I have a couple
of concerns about it.

One of the selling points of a simple data model and access language
was that there were never issues where a query planner "refused" to run
the query the "optimal way" the user wanted. For example, a query
using order and limit would first order the dataset and then limit it,
when the user wanted to limit first and then order.
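
To make the order/limit point concrete, here is a minimal Python sketch.
The rows and the two function names are made up for illustration; nothing
here is Cassandra or QL API. It only shows that the two evaluation orders
can return different results from the same data.

```python
# Hypothetical sample data: (key, value) pairs in storage order.
rows = [("a", 5), ("b", 1), ("c", 9), ("d", 3)]

def order_then_limit(rows, n):
    # Sort the whole dataset by value, then keep the first n rows.
    return sorted(rows, key=lambda r: r[1])[:n]

def limit_then_order(rows, n):
    # Keep the first n rows in storage order, then sort only those.
    return sorted(rows[:n], key=lambda r: r[1])

print(order_then_limit(rows, 2))  # [('b', 1), ('d', 3)]
print(limit_then_order(rows, 2))  # [('b', 1), ('a', 5)]
```

The first form has to touch every row before it can return anything; the
second only touches n rows. That is the cost difference a planner can
hide from the user.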

Also, without sounding cynical, I see SQL-ifying as catering to the lower
half. Take projecting columns from a row, for example. A SQL-ish interface
is going to encourage people NOT to learn about SlicePredicate and to try
to get by with the SQL interface instead. They will not understand how to
take advantage of the data model and what it provides. With 0.7, where
schema changes can happen on the fly, users are going to have more
freedom to create ColumnFamilies. Aided by their QL interface and
their predisposition to think in SQL, they are going to structure column
families like SQL tables. They could end up with unoptimized tables
and a planner making non-optimal queries.

I somewhat feel a QL would be like Cassandra training wheels.
