cassandra-user mailing list archives

From Jack Krupansky <>
Subject Re: any code to load large data from web into Cassandra
Date Sat, 27 Dec 2014 13:09:04 GMT
Sorry, but you are still not being clear. In particular, "website data" has
no common, defined meaning. You'll need to use some standard, defined
terminology or specific examples so that we can have some idea what you are
referring to.

The blog post you cited is referring to the Twitter API, presumably to read
tweets. Okay, fine, but you'll have to be more specific about what you want
to do with them. Yes, Cassandra is primarily focused on structured data, but
you can of course store unstructured and semi-structured data as blobs,
JSON strings, map columns, etc.
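
As a rough illustration of the JSON-string and map-column options, here is a minimal Python sketch. The table `web_docs`, its columns, and the helpers `to_row` and `store` are hypothetical names chosen for the example. The commented-out connection lines assume the DataStax Python driver (cassandra-driver), a local node, and an existing keyspace named "demo". None of this is taken from the blog post.

```python
import json
import uuid

# Hypothetical schema (illustration only):
#   CREATE TABLE web_docs (
#       id uuid PRIMARY KEY,
#       source text,
#       body text,              -- the whole document as a JSON string
#       attrs map<text, text>   -- top-level scalar fields, for convenience
#   );
INSERT_DOC_CQL = (
    "INSERT INTO web_docs (id, source, body, attrs) VALUES (%s, %s, %s, %s)"
)


def to_row(source, doc):
    """Turn one parsed JSON object into parameters for INSERT_DOC_CQL."""
    # Only top-level scalar values go into the map; nested data stays in body.
    attrs = {
        key: str(value)
        for key, value in doc.items()
        if isinstance(value, (str, int, float, bool))
    }
    return (uuid.uuid4(), source, json.dumps(doc, sort_keys=True), attrs)


def store(session, source, doc):
    """Insert one document; session is any object with execute(query, params)."""
    session.execute(INSERT_DOC_CQL, to_row(source, doc))


# Possible usage with the DataStax Python driver (assumed setup):
#   from cassandra.cluster import Cluster
#   session = Cluster(["127.0.0.1"]).connect("demo")
#   store(session, "twitter", {"user": "alice", "text": "hi", "entities": {}})
```

Keeping the full document in `body` means nothing is lost, while the `attrs` map is only a convenience copy of the simple fields. Whether that split is useful depends entirely on how the data will be queried.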

Please describe in a little more detail what problem you are trying to
solve.

I mean, "website data" might mean any data (in any format) stored at a web
URL, which might be a web "page", a data file linked by a web page, or...
it could be a REST API (like Twitter). Or it could be... whatever. Cassandra
is basically a storage engine - it can store anything. There are a wide
variety of tools that can be used to "ingest" data from the infinite
variety of "sources" for data. But you'll need to state more specifically
what you are actually trying to accomplish.
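
To make the "store anything" point concrete, here is one possible sketch of the most generic case: fetch whatever sits at a URL and keep the raw bytes as a blob. The table `raw_pages` and the functions `fetch`, `page_row`, and `ingest` are hypothetical, and hashing the URL to form the key is just one arbitrary choice. A real ingest job would also need error handling, retries, and rate limiting.

```python
import hashlib
import urllib.request

# Hypothetical schema (illustration only):
#   CREATE TABLE raw_pages (
#       url_hash text PRIMARY KEY,
#       url text,
#       content_type text,
#       body blob
#   );
INSERT_PAGE_CQL = (
    "INSERT INTO raw_pages (url_hash, url, content_type, body) "
    "VALUES (%s, %s, %s, %s)"
)


def fetch(url, timeout=10):
    """Download a URL and return (content_type, raw_bytes)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.headers.get("Content-Type", ""), resp.read()


def page_row(url, content_type, body):
    """Build the parameters for INSERT_PAGE_CQL, keyed by a hash of the URL."""
    url_hash = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return (url_hash, url, content_type, body)


def ingest(session, url, fetcher=fetch):
    """Fetch one URL and insert it; session needs execute(query, params)."""
    content_type, body = fetcher(url)
    session.execute(INSERT_PAGE_CQL, page_row(url, content_type, body))
```

This says nothing about what to do with the bytes afterwards, which is exactly why the requirements need to be stated first.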

Also, "large data" could be... anything, like "Big Data". So more
specificity is needed.

Alternatively, you could hire a consultant to help guide you through the
"application analysis" process to determine your "application
requirements", and then you could simply post your application
requirements, or at least a concise summary or relevant excerpt.

-- Jack Krupansky

On Sat, Dec 27, 2014 at 1:48 AM, Joanne Contact <> wrote:

> Thank you. I did not express my question clearly.
> I wonder if there is sample code to load any website data to Cassandra?
> Say, this webpage seems to use Python, tweepy,
> to use twitter API to get data in json format and then load data into
> Cassandra.
> So it seems tweepy is specific to the Twitter API. Is there code for any
> website?
> Btw I am not familiar with Python yet. So the answer may not be limited to
> Python.
> Thanks!
> On Fri, Dec 26, 2014 at 12:46 PM, Keith Sterling <> wrote:
>> Take a look at sstableloader. We use it to load 30+m rows into Cassandra.
>> The Datastax documentation is a good start.
>> --
>> *Keith Sterling*
>> *Head of Software*
>>  *E:* <>
>>  *P:* +44 7771 597 630
>>  *W:* <>
>>  *A:* Opus 40 Business Park,
>> Haywood Road, Warwick CV34 5AH
>> On Fri, Dec 26, 2014 at 7:59 PM, Joanne Contact <>
>> wrote:
>>>  Hello I am new. Did not seem to find the answer after a brief
>>> research. Please help.
>>> Thanks!
>>> J
