cassandra-user mailing list archives

From Bryan Whitehead
Subject Re: Cassandra to store 1 billion small 64KB Blobs
Date Wed, 28 Jul 2010 06:15:41 GMT
Just a warning about ZFS. If the plan is to use JBOD w/RAID-Z, don't.
3, 4, 5, ... or N disks in a RAID-Z array (using ZFS) will result in
read performance equivalent to only 1 disk.

Check out this blog entry:

The second chart and the section "The Parity Performance Rathole" are
both must-reads.
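
To make the warning concrete, here is a rough back-of-the-envelope model
in Python. It assumes small random reads, takes the claim above at face
value (one RAID-Z group delivers about the random-read rate of a single
member disk), and compares that with the same disks used independently.
The function name and the 100 IOPS per-disk figure are illustrative
assumptions, not measurements.

```python
def random_read_iops(disks, per_disk_iops=100.0, layout="raidz"):
    """Very rough estimate of small random-read IOPS for a set of disks.

    layout="raidz":       one RAID-Z group; every block read touches all
                          data disks, so the group behaves like one disk.
    layout="independent": each disk serves its own reads in parallel.
    per_disk_iops is a hypothetical figure for a single spinning disk.
    """
    if disks < 1:
        raise ValueError("need at least one disk")
    if layout == "raidz":
        return per_disk_iops
    if layout == "independent":
        return disks * per_disk_iops
    raise ValueError(f"unknown layout: {layout}")


for n in (3, 4, 5, 12):
    print(n, random_read_iops(n), random_read_iops(n, layout="independent"))
```

This ignores caching and sequential throughput entirely; it only
illustrates why adding disks to a single RAID-Z group does not add
random-read capacity.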

On Fri, Jul 23, 2010 at 11:51 PM, Michael Widmann wrote:
> Hi Jonathan
> Thanks for your very valuable input on this.
> I may not have explained enough, so I'll try to clarify.
> Here are some thoughts:
> Binary data will not be indexed, only stored.
> The file name of the binary data (a hash) should be indexed for search.
> We could group the hashes into 62 "entry" points for search and retrieval -> I
> think supercolumns (if I have the term right) (a-z, A-Z, 0-9).
> The metadata for the 64k blobs (which one belongs to which file) should be stored
> separately in Cassandra.
> For hardware we rely on Solaris / OpenSolaris with ZFS in the backend.
> Write operations occur much more often than reads.
> Memory should mainly hold the hash values for fast search (not the binary
> data).
> Read operations (restore from Cassandra) may be async (get about 1000
> blobs, group them, restore).
> So my questions are also:
> 2 or 3 big boxes, or 10 to 20 small boxes for storage?
> Could we separate "caching": hash-value CFs cached and indexed, binary
> data CFs not?
> Writes happen around the clock, not at tremendous speed but constantly.
> Would compaction of the database need a lot of disk space?
> Is it reliable at this size? (That is my main fear.)
> Thanks for thinking and answers...
> greetings
> Mike
> 2010/7/23 Jonathan Shook
>> There are two scaling factors to consider here. In general the worst
>> case growth of operations in Cassandra is kept near to O(log2(N)). Any
>> worse growth would be considered a design problem, or at least a high
>> priority target for improvement.  This is important for considering
>> the load generated by very large column families, as binary search is
>> used when the bloom filter doesn't exclude rows from a query.
>> O(log2(N)) is basically the best achievable growth for this type of
>> data, but the bloom filter improves on it in some cases by paying a
>> lower cost every time.
>> The other factor to be aware of is the reduction of binary search
>> performance for datasets which can put disk seek times into high
>> ranges. This is mostly a direct consideration for those installations
>> which will be doing lots of cold reads (not cached data) against large
>> sets. Disk seek times are much more limited (low) for adjacent or near
>> tracks, and generally much higher when tracks are sufficiently far
>> apart (as in a very large data set). This can compound with other
>> factors when session times are longer, but that is to be expected with
>> any system. Your storage system may have completely different
>> characteristics depending on caching, etc.
>> The read performance is still quite high relative to other systems for
>> a similar data set size, but the drop-off in performance may be much
>> worse than expected if you are wanting it to be linear. Again, this is
>> not unique to Cassandra. It's just an important consideration when
>> dealing with extremely large sets of data, when memory is not likely
>> to be able to hold enough hot data for the specific application.
>> As always, the real questions have lots more to do with your specific
>> access patterns, storage system, etc. I would look at the benchmarking
>> info available on the lists as a good starting point.
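
As a rough illustration of the O(log2(N)) point quoted above, the
following Python sketch counts the worst-case number of comparisons a
plain binary search needs over N sorted keys. It is only a model of the
growth rate: it does not reproduce Cassandra's actual read path, and the
function name is made up for this example.

```python
def worst_case_probes(n):
    """Worst-case comparisons for binary search over n sorted keys.

    Equals floor(log2(n)) + 1, which for a positive integer is exactly
    its bit length.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n.bit_length()


for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, worst_case_probes(n))
```

Going from a million keys to a billion adds only about ten comparisons,
but if each comparison is a cold disk seek rather than a memory access,
the seek cost dominates. That is the second factor described above.
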
>> On Fri, Jul 23, 2010 at 11:51 AM, Michael Widmann wrote:
>> > Hi
>> >
>> > We plan to use Cassandra as data storage on at least 2 nodes with RF=2
>> > for about 1 billion small files.
>> > We have about 48TB of disk space behind each node.
>> >
>> > Now my question is: is this possible with Cassandra, and reliable, meaning
>> > every blob is stored on 2 JBODs?
>> >
>> > We may grow up to nearly 40TB or more of Cassandra "storage" data...
>> >
>> > Has anyone out there done something similar?
>> >
>> > For retrieval of the blobs we are going to index them with a hash value
>> > (meaning hashes are used to store the blob),
>> > so we can search fast for the entry in the database and combine the
>> > blobs into
>> > a normal file again.
>> >
>> > thanks for answer
>> >
>> > michael
>> >
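
A quick sizing check for the numbers in the original question, as a
Python sketch. It assumes every blob is a full 64 KiB, which is an upper
bound (the subject line says "small 64KB blobs"), and it counts payload
bytes only: no per-column overhead, indexes, or compaction headroom. The
function names are illustrative.

```python
BLOB_BYTES = 64 * 1024   # assumed upper bound per blob (64 KiB)
TB = 10**12              # decimal terabyte


def raw_bytes(blobs, blob_bytes=BLOB_BYTES):
    """Total payload bytes for one copy of the data."""
    return blobs * blob_bytes


def per_node_bytes(blobs, replication_factor, nodes, blob_bytes=BLOB_BYTES):
    """Payload bytes per node, assuming replicas spread evenly."""
    return raw_bytes(blobs, blob_bytes) * replication_factor // nodes


one_copy = raw_bytes(10**9)
print(f"one copy: {one_copy / TB:.1f} TB")
print(f"per node (RF=2, 2 nodes): {per_node_bytes(10**9, 2, 2) / TB:.1f} TB")
```

With RF=2 on exactly 2 nodes, each node holds a full copy, so at the
64 KiB upper bound the payload alone (about 65.5 TB) would exceed the
48TB per node mentioned above. The "nearly 40TB" estimate implies an
average blob size closer to 40 KB.
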
> --
> - Professional Online Backup Solutions for Small and Medium Sized
> Companies
