incubator-cassandra-user mailing list archives

From Tatu Saloranta <tsalora...@gmail.com>
Subject Re: Heap sudden jump during import
Date Thu, 08 Apr 2010 16:09:01 GMT
On Wed, Apr 7, 2010 at 1:51 PM, Eric Evans <eevans@rackspace.com> wrote:
> On Tue, 2010-04-06 at 10:55 -0700, Tatu Saloranta wrote:
>> On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight <beuknight@gmail.com> wrote:
>> > When import, all data in json file will load in memory. So that, you can not
>> > import large data.
>> > You need to export large sstable file to many small json files, and run
>> > import.
>>
>> Why would you ever read the whole file in memory? JSON is very easily
>> streamable. Or does the whole data set need to be validated or
>> something (I assume not, if file splitting could be used). Perhaps it
>> is just an implementation flaw in importer tool.
>
> It's been awhile, but if I'm not mistaken, this is because we're writing
> SSTables and the records must be written in decorated-key sorted order.

Ok. It might make sense to solve this then, for example by using
external sorting?

(This reminds me that I must clean up and release some basic on-disk merge
sort code. Oddly enough, nothing like it seems to be included in the existing
commons libs. We used it for exactly this purpose: pre-sorting data for
systems that required sorted input, or benefited heavily from it.)
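
To make the external sorting idea concrete, here is a minimal sketch in Python.
It is not the importer's code and not the merge sort code mentioned above; the
names external_sort, _write_run and chunk_size are made up for illustration.
It assumes one record per newline-terminated line, and that the key function
stands in for whatever ordering is required (for SSTables, the decorated key).
Sorted chunks are spilled to temporary files, then merged in a streaming
k-way merge, so memory use is bounded by the chunk size rather than the input.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_run(sorted_lines):
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines, key=None, chunk_size=100_000):
    """Yield lines in sorted order without holding the whole input in memory.

    Assumes every line is newline-terminated. `key` is a hypothetical
    stand-in for the ordering the consumer requires.
    """
    lines = iter(lines)
    run_paths = []
    try:
        # Phase 1: read fixed-size chunks, sort each in memory, spill to disk.
        while True:
            chunk = list(islice(lines, chunk_size))
            if not chunk:
                break
            chunk.sort(key=key)
            run_paths.append(_write_run(chunk))

        # Phase 2: stream a k-way merge over the sorted runs.
        files = [open(p) for p in run_paths]
        try:
            yield from heapq.merge(*files, key=key)
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary run files once merging is done or abandoned.
        for p in run_paths:
            os.remove(p)
```

A caller would pass an open file (or any iterator of lines) and consume the
result one record at a time, for example
`for line in external_sort(open("export.txt"), key=my_key): ...`, where
export.txt and my_key are placeholders. With very many runs, a real
implementation would likely merge in several passes to limit open files.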

-+ Tatu +-
