cassandra-user mailing list archives

From "Dotan N." <dip...@gmail.com>
Subject Re: data agility
Date Sun, 20 Nov 2011 22:19:06 GMT
Thanks Aaron, I kept this free of a specific use case so as to focus on the
higher-level description; it might not have been a good idea.
But generally I think I got a better intuition from the various answers,
thanks!


--
Dotan, @jondot <http://twitter.com/jondot>



On Sun, Nov 20, 2011 at 11:52 PM, Aaron Turner <synfinatic@gmail.com> wrote:

> Sounds like you need to figure out what your product is going to do
> and what technology will best fit those requirements.  I know you're
> worried about being agile and all that, but scaling requires you to
> use the right tool for the job. Worry about new requirements when they
> rear their ugly head rather than a dozen "what if" scenarios.
>
> You can scale MySQL/etc and Cassandra, MongoDB, etc to 10-200M "users"
> depending on what you're asking your datastore to do.  You haven't
> really defined that at all, other than some comments about wanting to
> do some map/reduce jobs.
>
> Really what you should be doing is figuring out what kind of data you
> need to store and your needs like access patterns, availability, ACID
> compliance, etc and then figure out what technology is the best fit.
> There are tons of "Cassandra vs X" comparisons for every NoSQL DB in
> existence.
>
> Other than that, map/reduce on Cassandra is more job-based than
> useful for interactive queries, so if that is important then
> Cassandra prolly isn't a good fit.  You did mention time series data
> too, and that's a sweet spot for Cassandra and not something I
> personally would put in a document-based datastore like MongoDB.
>
> Good luck.
> -Aaron
>
> On Sun, Nov 20, 2011 at 1:24 PM, Dotan N. <dipidi@gmail.com> wrote:
> > Jahangir, thanks! However, I've noted that we may very well need to
> > scale to 200M users or "entities" within a short amount of time - say a
> > year or two, 10M within a few months.
> >
> > --
> > Dotan, @jondot
> >
> >
> > On Sun, Nov 20, 2011 at 11:14 PM, Jahangir Mohammed
> > <md.jahangir27@gmail.com> wrote:
> >>
> >> IMHO, you should start with something very simple, an RDBMS, and
> >> meanwhile get a handle on Cassandra or another NoSQL technology. Start
> >> out simple, but always be aware and conscious of the next thing you will
> >> have in your stack. It's time-consuming to work with new technology if
> >> you are in the phase of prototyping something fast, geared towards a VC
> >> demo. In most cases, you won't need NoSQL for a while unless there is a
> >> very strong case.
> >>
> >> Thanks,
> >> Jahangir
> >>
> >> On Nov 20, 2011 4:04 PM, "Dotan N." <dipidi@gmail.com> wrote:
> >>>
> >>> Thanks David.
> >>> Stephen: thanks for the tip, we can run a recommended configuration, so
> >>> that wouldn't be an issue. I guess I can narrow my questions down to
> >>> complexity of development.
> >>> After digesting David's answer, I guess my follow up questions would be
> >>> - how would you process data in a cassandra cluster, typically? via
> >>> one-off coded offline jobs?
> >>> - how easy is map/reduce on existing data? (just looked at Brisk but it
> >>> may be unrelated; in any case, not too much is written about it)
> >>> - how would you do analytics over a cassandra cluster?
> >>> - given the common examples of time-series, how would you recommend to
> >>> aggregate (sum, avg, facet) and provide statistics over the collected
> >>> data? for example if it were kinds of logs and you'd like to group all
> >>> of certain fields in it, or provide a histogram over it.
> >>> Thanks!
> >>>
> >>> --
> >>> Dotan, @jondot
> >>>
> >>>
> >>> On Sun, Nov 20, 2011 at 10:32 PM, Stephen Connolly
> >>> <stephen.alan.connolly@gmail.com> wrote:
> >>>>
> >>>> if your startup is bootstrapping then cassandra is sometimes too heavy
> >>>> to start with.
> >>>>
> >>>> i.e. it needs to be fed ram... you're not going to seriously run it in
> >>>> less than 1gb per node... that level of ram commitment can be too much
> >>>> while bootstrapping.
> >>>>
> >>>> if your startup has enough cash to pay for 3-5 recommended spec (see
> >>>> wiki) nodes to be up 24/7 then cassandra is a good fit...
> >>>>
> >>>> a friend of mine is bootstrapping a startup and had to drop back to
> >>>> mysql while he finds his pain points and customers... he knows he will
> >>>> end up jumping back to cassandra when he gets enough customers (or a VC)
> >>>> but for now the running costs are too much to pay from his own pocket...
> >>>> note that the jdbc driver and cql will make jumping back easy for him
> >>>> (as he still tests with c*... just runs at present against mysql....
> >>>> nuts eh!)
> >>>>
> >>>> - Stephen
> >>>>
> >>>> ---
> >>>> Sent from my Android phone, so random spelling mistakes, random
> >>>> nonsense words and other nonsense are a direct result of using swype to
> >>>> type on the screen
> >>>>
> >>>> On 20 Nov 2011 19:07, "Dotan N." <dipidi@gmail.com> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>> my question may be more philosophical than related technically
> >>>>> to Cassandra, but please bear with me.
> >>>>> Given that a young startup may not fully know its product at the early
> >>>>> stages, but that it definitely points to ~200M users,
> >>>>> would Cassandra be the right way to go?
> >>>>> That is, the requirement is for a large data store that can move with
> >>>>> product changes and requirements swiftly.
> >>>>> Given that in Cassandra one thinks hard about the queries, and then
> >>>>> builds a model to suit them best, I was thinking of
> >>>>> this situation as problematic.
> >>>>> So here are some questions:
> >>>>> - would it be wiser to start with a more agile data store (such as
> >>>>> mongodb) and then progress onto Cassandra, when the product itself
> >>>>> solidifies?
> >>>>> - given that we start with Cassandra from the get go, what is a common
> >>>>> (and quick in terms of development) way or practice to change data,
> >>>>> change schemas, as the product evolves?
> >>>>> - is it even smart to start with Cassandra? would only startups whose
> >>>>> core business is big data start with it from the get go?
> >>>>> - how would you do map/reduce with Cassandra? how agile is that? (for
> >>>>> example, can you run map/reduce _very_ frequently?)
> >>>>> Thanks!
> >>>>> --
> >>>>> Dotan, @jondot
> >>>
> >
> >
>
>
>
> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>
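
On the time-series aggregation question raised earlier in the thread (sum,
avg, facet, or a histogram over log-like records), one possible shape for a
one-off offline job is an hourly rollup. The sketch below is plain Python and
makes no Cassandra calls; the event layout `(timestamp, field, value)` and the
names `hour_bucket` and `rollup` are assumptions made up for illustration, not
anything proposed by the posters.

```python
from collections import Counter, defaultdict


def hour_bucket(ts):
    """Truncate an epoch timestamp (in seconds) to the start of its hour."""
    return ts - (ts % 3600)


def rollup(events):
    """Aggregate (timestamp, field, value) events into per-hour statistics.

    The event tuple layout is a hypothetical one chosen for this sketch.
    Returns {hour_start: {"count", "sum", "avg", "facets"}} where "facets"
    counts how often each field name occurred in that hour.
    """
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0, "facets": Counter()})
    for ts, field, value in events:
        b = buckets[hour_bucket(ts)]
        b["count"] += 1
        b["sum"] += value
        b["facets"][field] += 1
    return {
        hour: {
            "count": b["count"],
            "sum": b["sum"],
            "avg": b["sum"] / b["count"],
            "facets": dict(b["facets"]),
        }
        for hour, b in buckets.items()
    }


if __name__ == "__main__":
    sample = [(3600, "error", 2.0), (3700, "info", 4.0), (7200, "error", 10.0)]
    for hour, stats in sorted(rollup(sample).items()):
        print(hour, stats)
```

The per-hour results could then be written back to the datastore keyed by the
bucket start, so that interactive reads hit precomputed rows instead of
scanning raw events; whether that fits depends on the access patterns Aaron
asks about above.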
