couchdb-user mailing list archives

From Karel Minařík <karel.mina...@gmail.com>
Subject Re: How to import data quickly
Date Mon, 01 Feb 2010 14:24:42 GMT
Hi,

just for info: on a current project I needed to import 6 million+ docs,
and the sweet spot was 10K docs per batch upload. Higher values gave
worse results. I don't have the numbers handy, but it took a couple of
hours to convert the docs from CSV and bulk upload them into Couch, I
guess like 8hrs (on a rather old IBM Blade machine)... (And the real
pain was handling malformed CSV parts, patching FasterCSV to not choke
on it, etc.)
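
For the curious, the batching itself is simple. Here's a minimal sketch
(not my actual script — database URL and doc shape are made up, and it
assumes a stdlib-only Python client) of posting docs in fixed-size
chunks to the _bulk_docs endpoint:

```python
import json
from urllib import request

def batches(docs, size):
    """Yield successive slices of `docs` with at most `size` items each."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def bulk_upload(docs, db_url, batch_size=10_000):
    """POST docs to CouchDB's _bulk_docs API in fixed-size batches."""
    for chunk in batches(docs, batch_size):
        payload = json.dumps({"docs": chunk}).encode("utf-8")
        req = request.Request(
            db_url.rstrip("/") + "/_bulk_docs",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            resp.read()  # response is a JSON array of per-doc results

# e.g. bulk_upload(my_docs, "http://localhost:5984/mydb")
```

The sweet spot for batch_size will depend on doc size and hardware, so
it's worth timing a few values before committing to one.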

Karel

On 28.Jan, 2010, at 15:02 , Troy Kruthoff wrote:

> Just curious, what batch size did you use? I was just getting ready  
> to run some test data to see where the sweet spot is for our  
> hardware; I remember reading somewhere that someone thought it was  
> around 3K docs.
>
> Troy
>
>
> On Jan 28, 2010, at 4:21 AM, Sean Clark Hess wrote:
>
>> Sweet... down to 28 minutes with bulk. Thanks
>>
>> On Thu, Jan 28, 2010 at 4:25 AM, Sean Clark Hess  
>> <seanhess@gmail.com> wrote:
>>
>>> Ah, I forgot about bulk! Thanks!
>>>
>>>
>>> On Thu, Jan 28, 2010 at 4:24 AM, Alex Koshelev  
>>> <daevaorn@gmail.com> wrote:
>>>
>>>> How do you import data to CouchDB? Do you use _bulk API?
>>>> ---
>>>> Alex Koshelev
>>>>
>>>>
>>>> On Thu, Jan 28, 2010 at 1:51 PM, Sean Clark Hess  
>>>> <seanhess@gmail.com> wrote:
>>>>
>>>>> I'm trying to import 7 million rows into couch from an xml  
>>>>> document. If I use a database with a "normal" interface  
>>>>> (comparing with Mongo here), the process completes in 37  
>>>>> minutes. If I use couch, it takes 10 hours. I think it might be  
>>>>> due to the overhead of the http interface, but I'm not sure.
>>>>>
>>>>> Is there any way to get data in there faster?
>>>>>
>>>>> ~sean
>>>>>
>>>>
>>>
>>>
>

