couchdb-user mailing list archives

From Troy Kruthoff <tkruth...@gmail.com>
Subject Re: How to import data quickly
Date Thu, 28 Jan 2010 14:25:50 GMT
I was prototyping an app with Mongo and Couch to see which one I liked
best (I've been flirting with couch for a long time, so I am somewhat
biased -- but I am a flirt, and mongo looked attractive). Jan has a good
article somewhere about why benchmarks are bad, so I have nothing but the
wall clock to back me up, but for our app couch was eating mongo on
writes, mongo ate couch on updates (upserts are nice), and after finally
understanding a little map/reduce we got couch to do stuff that was very
hard in mongo (we tried to use its oplog to feed amqp and do map/reduce,
but never got it working).  YMMV

Troy


On Jan 28, 2010, at 6:06 AM, Sean Clark Hess wrote:

> Oh, and my math was totally off before. The full 7 million rows takes 1.7h
> with MongoDB and 3h with Couch. (This is extrapolated from how long it took
> to insert the first 100k documents)
>
> On Thu, Jan 28, 2010 at 7:02 AM, Troy Kruthoff <tkruthoff@gmail.com> wrote:
>
>> Just curious, what batch size did you use...  I was just getting ready to
>> run some test data to see where the sweet spot is for our hardware; I
>> remember reading somewhere that someone thought it was around 3k docs.
>>
>> Troy
>>
>> On Jan 28, 2010, at 4:21 AM, Sean Clark Hess wrote:
>>
>>> Sweet... down to 28 minutes with bulk. Thanks
>>>
>>> On Thu, Jan 28, 2010 at 4:25 AM, Sean Clark Hess <seanhess@gmail.com> wrote:
>>>
>>>> Ah, I forgot about bulk! Thanks!
>>>>
>>>> On Thu, Jan 28, 2010 at 4:24 AM, Alex Koshelev <daevaorn@gmail.com> wrote:
>>>>
>>>>> How do you import data to CouchDB? Do you use the _bulk API?
>>>>> ---
>>>>> Alex Koshelev
>>>>>
>>>>> On Thu, Jan 28, 2010 at 1:51 PM, Sean Clark Hess <seanhess@gmail.com> wrote:
>>>>>
>>>>>> I'm trying to import 7 million rows into couch from an xml document. If I
>>>>>> use a database with a "normal" interface (comparing with Mongo here), the
>>>>>> process completes in 37 minutes. If I use couch, it takes 10 hours. I think
>>>>>> it might be due to the overhead of the http interface, but I'm not sure.
>>>>>>
>>>>>> Is there any way to get data in there faster?
>>>>>>
>>>>>> ~sean
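[Editor's note: for readers landing on this thread, the fix discussed above is CouchDB's `_bulk_docs` endpoint -- POST a JSON body of the form `{"docs": [...]}` instead of one HTTP request per document. A minimal sketch in Python (stdlib only); the database URL and the 1000-doc batch size are placeholders to tune for your hardware, per the "sweet spot around 3k docs" discussion above:]

```python
import json
import urllib.request

def batches(docs, size):
    """Yield successive lists of at most `size` docs."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def bulk_payload(batch):
    """Build the JSON body expected by CouchDB's _bulk_docs endpoint."""
    return json.dumps({"docs": batch}).encode("utf-8")

def bulk_insert(db_url, docs, size=1000):
    """POST docs to <db_url>/_bulk_docs in batches of `size` (placeholder size)."""
    for batch in batches(docs, size):
        req = urllib.request.Request(
            db_url.rstrip("/") + "/_bulk_docs",
            data=bulk_payload(batch),
            headers={"Content-Type": "application/json"},
        )
        # CouchDB returns one status entry per doc; a real loader
        # should inspect the response for per-document errors.
        with urllib.request.urlopen(req) as resp:
            resp.read()

if __name__ == "__main__":
    docs = [{"_id": str(n), "value": n} for n in range(2500)]
    print(sum(1 for _ in batches(docs, 1000)))  # 3 batches
```

[The win is purely amortizing HTTP round trips; the thread above saw the 7M-row load drop from ~10 hours to 28 minutes once batching was used.]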
