incubator-cassandra-user mailing list archives

From "Dotan N." <dip...@gmail.com>
Subject Re: data agility
Date Sun, 20 Nov 2011 21:24:02 GMT
Jahangir, thanks! However, as I've noted, we may very well need to scale to
200M users or "entities" within a short amount of time, say a year or two,
and to 10M within a few months.


--
Dotan, @jondot <http://twitter.com/jondot>



On Sun, Nov 20, 2011 at 11:14 PM, Jahangir Mohammed <md.jahangir27@gmail.com> wrote:

> IMHO, you should start with something very simple, such as an RDBMS, while
> getting a handle on Cassandra or another NoSQL technology. Start out
> simple, but always be aware of the next thing you will have
> in your stack. Working with a new technology is time-consuming when you are
> prototyping something fast for a VC demo. In most
> cases, you won't need NoSQL for a while unless there is a very
> strong case for it.
>
> Thanks,
> Jahangir
> On Nov 20, 2011 4:04 PM, "Dotan N." <dipidi@gmail.com> wrote:
>
>> Thanks David.
>> Stephen: thanks for the tip. We can run a recommended configuration, so
>> that wouldn't be an issue. I guess I can narrow my questions down to the
>> complexity of development.
>>
>> After digesting David's answer, I guess my follow up questions would be
>> - how would you typically process data in a Cassandra cluster? via
>> one-off coded offline jobs?
>> - how easy is map/reduce on existing data? (I just looked at Brisk, but it
>> may be unrelated; in any case, not much is written about it)
>> - how would you do analytics over a Cassandra cluster?
>> - given the common examples of time series, how would you recommend
>> aggregating (sum, avg, facet) and providing statistics over the collected data?
>> for example, if it were some kind of logs and you'd like to group by certain
>> fields in them, or provide a histogram over them.
>>
>> Thanks!
>>
>>
>> --
>> Dotan, @jondot <http://twitter.com/jondot>
>>
>>
>>
>> On Sun, Nov 20, 2011 at 10:32 PM, Stephen Connolly <
>> stephen.alan.connolly@gmail.com> wrote:
>>
>>> if your startup is bootstrapping then cassandra is sometimes too heavy to
>>> start with.
>>>
>>> i.e. it needs to be fed ram... you're not going to seriously run it in
>>> less than 1gb per node... that level of ram commitment can be too much
>>> while bootstrapping.
>>>
>>> if your startup has enough cash to pay for 3-5 recommended spec (see
>>> wiki) nodes to be up 24/7 then cassandra is a good fit...
>>>
>>> a friend of mine is bootstrapping a startup and had to drop back to
>>> mysql while he finds his pain points and customers... he knows he will end
>>> up jumping back to cassandra when he gets enough customers (or a VC) but
>>> for now the running costs are too much to pay from his own pocket... note
>>> that the jdbc driver and cql will make jumping back easy for him (as he
>>> still tests with c*... just runs at present against mysql.... nuts eh!)
>>>
>>> - Stephen
>>>
>>> ---
>>> Sent from my Android phone, so random spelling mistakes, random nonsense
>>> words and other nonsense are a direct result of using swype to type on the
>>> screen
>>> On 20 Nov 2011 19:07, "Dotan N." <dipidi@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> my question may be more philosophical than technically related
>>>> to Cassandra, but please bear with me.
>>>>
>>>> Given that a young startup may not fully know its product at the early
>>>> stages, but that it is definitely aiming for ~200M users,
>>>> would Cassandra be the right way to go?
>>>>
>>>> That is, the requirement is for a large data store, that can move with
>>>> product changes and requirements swiftly.
>>>>
>>>> Given that in Cassandra one thinks hard about the queries, and then
>>>> builds a model to suit them best, I see
>>>> this situation as problematic.
>>>>
>>>> So here are some questions:
>>>>
>>>> - would it be wiser to start with a more agile data store (such as
>>>> mongodb) and then progress onto Cassandra, when the product itself
>>>> solidifies?
>>>> - given that we start with Cassandra from the get go, what is a common
>>>> (and quick in terms of development) way or practice to change data, change
>>>> schemas, as the product evolves?
>>>> - is it even smart to start with Cassandra? would only startups whose
>>>> core business is big data start with it from the get go?
>>>> - how would you do map/reduce with Cassandra? how agile is that? (for
>>>> example, can you run map/reduce _very_ frequently?)
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> Dotan, @jondot <http://twitter.com/jondot>
>>>>
>>>>
>>
