cassandra-user mailing list archives

From Lucas Nodine <lucasnod...@gmail.com>
Subject Re: Large File Storage
Date Wed, 15 Sep 2010 16:26:24 GMT
Jonathan,

So it is "safe" to use a column to hold the entire data file, assuming there
is enough heap space?  Or are there other considerations I should be
concerned about?

As Thrift is used and all data must be loaded into memory (see Wiki), I
should still expect to benefit from breaking the data file into separate
packages and storing those packages in different columns.  Is this what you
(or anyone else reading, for that matter) would suggest as a "best practice"?
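
To make the idea concrete, here is a minimal sketch of the split-and-reassemble
step, assuming fixed-size chunks and zero-padded column names so that the
columns sort in file order.  The 1 MiB chunk size, the "chunk-NNNNNNNN" naming
and both function names are hypothetical, and a plain dict stands in for a
row; no Cassandra client calls are shown.

```python
CHUNK_SIZE = 1024 * 1024  # assumed 1 MiB per column; not a recommended value


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Map zero-padded column names to consecutive slices of the file."""
    columns = {}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        # Zero padding keeps lexical order equal to chunk order.
        columns[f"chunk-{index:08d}"] = data[offset:offset + chunk_size]
    return columns


def reassemble(columns: dict) -> bytes:
    """Join the chunks back together in column-name order."""
    return b"".join(columns[name] for name in sorted(columns))
```

Each entry of the returned dict would be written as one column, so a single
read or write never has to carry the whole file at once.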

- Lucas

On Wed, Sep 15, 2010 at 10:54 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> the row-in-memory-during-compaction was fixed some time ago for 0.7
> (CASSANDRA-16).
>
> On Wed, Sep 15, 2010 at 10:03 AM, Lucas Nodine <lucasnodine@gmail.com>
> wrote:
> > Hello Users,
> >
> > I am planning a system where both metadata and data will be stored.
> > Usually these will be small files, such as Word documents, along with
> > some specific data about each file.  Sometimes there will be a large
> > file, possibly a few hundred MB to a GB, such as video.  I have read a
> > lot about suggested methods for large file storage within Cassandra, but
> > I want to verify my thoughts on the method of implementation before I
> > start working on it.
> >
> > On June 29, 2009 Jonathan listed the task on JIRA
> > (https://issues.apache.org/jira/browse/CASSANDRA-265) but closed it,
> > stating that it was not on anyone's roadmap.
> >
> > On April 26, 2010 there was a posting to this group stating "During
> > compaction, as is well noted, Cassandra needs the entire row in memory,
> > which will cause a FAIL once you have files more than a few gigs." Shuge
> > Lee.
> >
> > Currently, the Wiki has an entry explaining the handling, or more
> > appropriately, workaround to handle Large BLOBs
> > (http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage).
> >
> > Seeing as native support for large files is not expected, and the Wiki
> > states that files <= 64MB can easily be stored within the database, and
> > knowing that during compaction the entire row will be loaded into
> > memory...
> >
> > 1) Is the appropriate way to handle files that greatly vary in size (1KB
> > to a few GB) to break the data into smaller "chunks" and then store each
> > chunk in a separate row?
> >     A) If so, how should it be done to accomplish the best read/write
> > results?
> >     B) Is there a row size that should be considered a "sweet spot", or
> > should it be configurable on a per-cluster basis?
> > 2) Does anyone foresee large blob support in the near future?
> >
> > Thanks,
> >
> > - Lucas
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
