couchdb-dev mailing list archives

From Damien Katz <dam...@apache.org>
Subject Re: [jira] Commented: (COUCHDB-61) Separate storage of attachments from the main database file
Date Sun, 28 Dec 2008 11:24:53 GMT
What you are proposing is grouping the document data together for
better OS caching performance, but CouchDB already does that.
Document bodies are written to contiguous regions, one after another,
and the attachments are stored in separate locations in the file.

-Damien


On Dec 28, 2008, at 5:21 AM, Maximillian Dornseif (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/COUCHDB-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659425#action_12659425 ]
>
> Maximillian Dornseif commented on COUCHDB-61:
> ---------------------------------------------
>
> As I understand OS access patterns, attachments actually do interfere
> with document access and view building.
>
> Let's say you have 100 MB of documents and 100 GB of attachments. To
> my (very limited) understanding, in the current database file
> "normal" B+-tree pages and attachments would be interleaved. So on
> disk (assuming best-case contiguous allocation) it would look
> something like
>
> DAAADADDAAAAAAAADADDAAAAAAADAAAADAAAAADAAAAADAAAAADADAAAAAADA
>
> (D=doc, A=attachment)
> So if I want to access the whole B+-tree for an operation I have to
> skip around in the file by using seeks or some other technique.
> Seeks harm the caching ability of the OS and are generally slow. And
> the OS obviously is not able to read the whole 100.1 GB file in a
> single chunk into its cache.
>
> Compare that to having two files:
>
> DDDDDDDDDDDDDD = document file, 100 MB
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA = attachment file,  
> 100 GB
>
> Now the OS caching subsystem can happily read the whole document
> file into RAM. Even if CouchDB uses seeks they can be served from
> the cache. The OS doesn't have to work around the huge binary
> chunks of seldom-accessed attachment data.
>
> Generally, to get the most out of the I/O optimizations of an
> operating system, don't save data with different access patterns in
> the same file. The exact impact differs very much from OS to OS, but
> this is one of the main reasons why databases (even MySQL) use
> different files for different parts of the database, unless they
> manage disk space independently of the OS with raw partitions.
>
> All this assumes that you access and change attachments less often
> than your documents and that documents are considerably smaller than
> attachments. I would call this a safe bet for most scenarios.
>
>
>
>> Separate storage of attachments from the main database file
>> -----------------------------------------------------------
>>
>>                Key: COUCHDB-61
>>                URL: https://issues.apache.org/jira/browse/COUCHDB-61
>>            Project: CouchDB
>>         Issue Type: New Feature
>>         Components: Database Core
>>        Environment: All
>>           Reporter: Jan Lehnardt
>>           Priority: Minor
>>
>> At the moment all document and attachment data goes into the same
>> database file. It would be nice if the attachments could be saved
>> in a different file. This would enable the use of slower and
>> cheaper hardware for attachment storage and faster hardware for the
>> document and index data storage.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
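
For illustration only, the two layouts in the quoted comment can be compared with a small sketch. It treats each character of the D/A diagram as one equally sized block, which is an assumption of this example and not how CouchDB actually lays out its file. The helper names doc_runs and doc_span are hypothetical. It counts how many separate runs of document blocks there are (a rough stand-in for seeks) and how wide a region of the file must be covered to reach all of them.

```python
# Illustrative model only: one character = one equally sized block.
# This is not CouchDB's real on-disk format.
from itertools import groupby

# Layout string copied from the quoted comment (D=doc, A=attachment).
INTERLEAVED = "DAAADADDAAAAAAAADADDAAAAAAADAAAADAAAAADAAAAADAAAAADADAAAAAADA"


def doc_runs(layout):
    """Number of contiguous runs of document blocks ('D') in the layout."""
    return sum(1 for kind, _ in groupby(layout) if kind == "D")


def doc_span(layout):
    """Blocks between the first and last document block, inclusive."""
    return layout.rindex("D") - layout.index("D") + 1


doc_blocks = INTERLEAVED.count("D")
# Hypothetical separate document file: the same doc blocks, back to back.
SEPARATE = "D" * doc_blocks

for name, layout in (("interleaved", INTERLEAVED), ("separate", SEPARATE)):
    print(name, "runs:", doc_runs(layout), "span:", doc_span(layout))
```

In this toy model the separate file always has a single run whose span equals the number of document blocks, while the interleaved string has several runs spread over a wider span. It says nothing about whether CouchDB's actual file looks like either layout, which is the point under dispute in this thread.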

