From user-return-30498-apmail-cassandra-user-archive=cassandra.apache.org@cassandra.apache.org Fri Dec 7 03:37:16 2012 Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 977FAD3D0 for ; Fri, 7 Dec 2012 03:37:16 +0000 (UTC) Received: (qmail 37671 invoked by uid 500); 7 Dec 2012 03:37:14 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 37631 invoked by uid 500); 7 Dec 2012 03:37:13 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 37613 invoked by uid 99); 7 Dec 2012 03:37:13 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 07 Dec 2012 03:37:13 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [208.113.200.5] (HELO homiemail-a50.g.dreamhost.com) (208.113.200.5) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 07 Dec 2012 03:37:07 +0000 Received: from homiemail-a50.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a50.g.dreamhost.com (Postfix) with ESMTP id 506C06F8078 for ; Thu, 6 Dec 2012 19:36:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=thelastpickle.com; h=from :content-type:message-id:mime-version:subject:date:references:to :in-reply-to; s=thelastpickle.com; bh=svVlraH+tBxdx6wxh3hs5BEZVq k=; b=OqH4e9SsSAiO7qLP9OzEfLcDTHp2QiofoG9fvhwgUmoPEhGyQ9QZZ3plKs BvJYLD2gUiwM6YSKVNaWrJeqJcq7F23Ll6cwF5xibHyYp5TKVaAZJcai+bqPqCMk yTjDB+ZxBA2UzQCcQucHHaz7KT7FlK1CSIqTt1b/nIXP0IHCU= Received: from [172.20.10.2] (unknown 
[118.148.182.132]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: aaron@thelastpickle.com) by homiemail-a50.g.dreamhost.com (Postfix) with ESMTPSA id 8A5CE6F8074 for ; Thu, 6 Dec 2012 19:36:45 -0800 (PST)
From: aaron morton
Content-Type: multipart/alternative; boundary="Apple-Mail=_A6D6F7F0-86CD-40BE-9F33-C62859D83910"
Message-Id:
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: Help on MMap of SSTables
Date: Fri, 7 Dec 2012 16:36:43 +1300
References: <81D05238-BD4C-46C4-AAAA-CD6A68CF7567@thelastpickle.com>
To: user@cassandra.apache.org
In-Reply-To:
X-Mailer: Apple Mail (2.1499)
X-Virus-Checked: Checked by ClamAV on apache.org

--Apple-Mail=_A6D6F7F0-86CD-40BE-9F33-C62859D83910
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=iso-8859-1

> So for memory mapped files, compaction can do a madvise SEQUENTIAL instead of current DONTNEED flag after detecting appropriate OS versions. Will this help?

AFAIK Compaction does use memory mapped file access.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 7:48 PM, Ravikumar Govindarajan wrote:

> Thanks Aaron,
>
> I found the implementation in CLibrary.trySkipCache() method which uses fadvise DONTNEED flag after going through https://issues.apache.org/jira/browse/CASSANDRA-1470
>
> I also came across the link mentioned in JIRA http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html?showComment=1303235497682#c2572106601600642254
>
> which says 2.6.29 version above has implemented madvise SEQUENTIAL in a better manner.
>
> So for memory mapped files, compaction can do a madvise SEQUENTIAL instead of current DONTNEED flag after detecting appropriate OS versions. Will this help?
>
> --
> Ravi
>
> On Thu, Dec 6, 2012 at 8:19 AM, aaron morton wrote:
> Background http://en.wikipedia.org/wiki/Memory-mapped_file
>
>> Is it going to load only relevant pages per SSTable on read or is it going to load an entire SSTable on first access?
>
> It will load what is requested, and maybe some additional data taking into account the amount of memory available for caches.
>
>> Say suppose compaction kicks in. Will it then evict hot MMapped pages for read and substitute it with a lot of pages involving full SSTables?
>
> Some file access in cassandra, such as compaction, hints to the OS that the reads should not be cached. Technically it uses posix_fadvise if you want to look it up.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/12/2012, at 11:04 PM, Ravikumar Govindarajan wrote:
>
>> Thanks Aaron,
>>
>> I am not quite clear on how MMap loads SSTables other than the fact that it kicks in only during a first-time access
>>
>> Is it going to load only relevant pages per SSTable on read or is it going to load an entire SSTable on first access?
>>
>> Say suppose compaction kicks in. Will it then evict hot MMapped pages for read and substitute it with a lot of pages involving full SSTables?
>>
>> --
>> Ravi
>>
>> On Wed, Dec 5, 2012 at 1:22 AM, aaron morton wrote:
>>> Will MMapping data files be detrimental for reads, in this case?
>> No.
>>
>>> In general, when should we opt for MMap data files and what are the factors that need special attention when enabling the same?
>> mmapping is the default, so I would say use it until you have a reason not to.
>>
>> mmapping will map the entire file, but pages of data are read into memory on demand and purged when space is needed.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 4/12/2012, at 11:59 PM, Ravikumar Govindarajan wrote:
>>
>>> Our current SSTable sizes are far greater than RAM. {150 Gigs of data, 32GB RAM}. Currently we run with mlockall and mmap_index_only options and don't experience swapping at all.
>>>
>>> We use wide rows and size-tiered-compaction, so a given key will definitely be spread across multiple sstables. Will MMapping data files be detrimental for reads, in this case?
>>>
>>> In general, when should we opt for MMap data files and what are the factors that need special attention when enabling the same?
>>>
>>> --
>>> Ravi
>>
>>
>
>

--Apple-Mail=_A6D6F7F0-86CD-40BE-9F33-C62859D83910
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=iso-8859-1

> So for memory mapped files, compaction can do a madvise SEQUENTIAL instead of current DONTNEED flag after detecting appropriate OS versions. Will this help?

AFAIK Compaction does use memory mapped file access.

Cheers

http://www.thelastpickle.com

On 6/12/2012, at 7:48 PM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

Thanks Aaron,

I found the implementation in CLibrary.trySkipCache() method which uses fadvise DONTNEED flag after going through https://issues.apache.org/jira/browse/CASSANDRA-1470

I also came across the link mentioned in JIRA http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html?showComment=1303235497682#c2572106601600642254

which says 2.6.29 version above has implemented madvise SEQUENTIAL in a better manner.

So for memory mapped files, compaction can do a madvise SEQUENTIAL instead of current DONTNEED flag after detecting appropriate OS versions. Will this help?
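
For illustration only, the proposal above (a SEQUENTIAL hint on a memory-mapped file) might look like the following Python sketch. This is not Cassandra code; the function name `scan_with_sequential_hint` is hypothetical, and `mmap.madvise` is assumed to be available (Python 3.8+ on a platform that defines `MADV_SEQUENTIAL`). The hint is advisory, and whether it would actually help compaction is not established anywhere in this thread.

```python
import mmap


def scan_with_sequential_hint(path):
    """Scan a memory-mapped file front to back after giving a SEQUENTIAL hint.

    Returns a simple 32-bit byte-sum so the scan has an observable result.
    """
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # Advisory only: tells the kernel we expect to read in order.
        # Skipped silently where the call or the constant is unavailable.
        if hasattr(mapped, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
            mapped.madvise(mmap.MADV_SEQUENTIAL)

        checksum = 0
        page = mmap.PAGESIZE
        # Walk the mapping one page-sized slice at a time.
        for start in range(0, len(mapped), page):
            checksum = (checksum + sum(mapped[start:start + page])) & 0xFFFFFFFF
        return checksum
```

The hint changes only the kernel's read-ahead and reclaim behaviour for the mapping; the bytes returned are the same with or without it.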

--
Ravi

On Thu, Dec 6, 2012 at 8:19 AM, aaron morton <aaron@thelastpickle.com> wrote:
Background http://en.wikipedia.org/wiki/Memory-mapped_file

> Is it going to load only relevant pages per SSTable on read or is it going to load an entire SSTable on first access?
It will load what is requested, and maybe some additional data taking into account the amount of memory available for caches.

> Say suppose compaction kicks in. Will it then evict hot MMapped pages for read and substitute it with a lot of pages involving full SSTables?
Some file access in cassandra, such as compaction, hints to the OS that the reads should not be cached. Technically it uses posix_fadvise if you want to look it up.
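
As a rough sketch of the kind of hint described above, the Python below reads a file sequentially and then calls `posix_fadvise` with the DONTNEED advice. It is not how Cassandra does it (Cassandra is Java, and the thread names `CLibrary.trySkipCache()`); the function name `read_without_caching` and the chunk size are assumptions made for the example. `os.posix_fadvise` exists only on some Unix platforms, so the call is guarded.

```python
import os


def read_without_caching(path, chunk_size=1 << 20):
    """Read a file start to finish, then hint that its pages need not stay cached.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        # offset=0 and length=0 cover the whole file. The call is advisory:
        # the kernel may drop the cached pages, but is not required to.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

The point of the hint is that a large one-off scan should not push frequently read pages out of the page cache.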

Cheers


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton

On 5/12/2012, at 11:04 PM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

Thanks Aaron,

I am not quite clear on how MMap loads SSTables other than the fact that it kicks in only during a first-time access

Is it going to load only relevant pages per SSTable on read or is it going to load an entire SSTable on first access?

Say suppose compaction kicks in. Will it then evict hot MMapped pages for read and substitute it with a lot of pages involving full SSTables?

--
Ravi

On Wed, Dec 5, 2012 at 1:22 AM, aaron morton <aaron@thelastpickle.com> wrote:
> Will MMapping data files be detrimental for reads, in this case?
No.

> In general, when should we opt for MMap data files and what are the factors that need special attention when enabling the same?
mmapping is the default, so I would say use it until you have a reason not to.

mmapping will map the entire file, but pages of data are read into memory on demand and purged when space is needed.
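
A minimal Python sketch of that behaviour, assuming a typical Linux-style virtual memory system: the whole file is mapped up front, but only the pages backing the requested slice need to be faulted in. The function name `read_slice_via_mmap` is hypothetical and this is not Cassandra's implementation.

```python
import mmap


def read_slice_via_mmap(path, offset, length):
    """Map an entire file, then read only one slice of it."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file. Mapping reserves address space;
        # it typically does not read the file's contents into memory yet.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches only the pages that back this byte range;
            # the OS loads them on demand and may evict them later.
            return mapped[offset:offset + length]
```

This is why mapping a data file far larger than RAM is workable: resident memory follows the pages actually touched, not the size of the mapping.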

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton

On 4/12/2012, at 11:59 PM, Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com> wrote:

Our current SSTable sizes are far greater than RAM. {150 Gigs of data, 32GB RAM}. Currently we run with mlockall and mmap_index_only options and don't experience swapping at all.

We use wide rows and size-tiered-compaction, so a given key will definitely be spread across multiple sstables. Will MMapping data files be detrimental for reads, in this case?

In general, when should we opt for MMap data files and what are the factors that need special attention when enabling the same?

--
Ravi





--Apple-Mail=_A6D6F7F0-86CD-40BE-9F33-C62859D83910--