incubator-couchdb-user mailing list archives

From: Manjunath Somashekhar <manjunath_somashek...@yahoo.com>
Subject: Question on performance of getting document by id
Date: Tue, 30 Dec 2008 06:07:11 GMT
Hi all,

I have been evaluating CouchDB for a project and was trying some performance tests; one of them is to test the performance of getting a document by id.

I have written a small Python script that loads about a million documents (very simple - {_id, value}); for the test I assigned the ids myself instead of using the UUIDs assigned by CouchDB. The ids start from 0 and run up to a million.

After the loading was done, I ran another Python script that tries to get each of the million documents - it ran for a few hours and then I killed it.
I then tried running the same script simultaneously over different key ranges (4 instances, to be precise) - across multiple runs it completed in about 3 hours at a minimum.

That works out to 1000000/(3*60*60) ~ 93 gets per second. Is this the current performance benchmark, or is there something stupid that I am doing? BTW, this is way too slow for the application I was exploring CouchDB for.

Please let me know if there are any suggestions.

Environment:
python-couchdb lib - latest version, 0.5.x
python - 2.5.3
ubuntu - 8.04
laptop with 4G of RAM, dual core, and about 80G of HDD.
python-couchdb - bulk docs used for insertion.
python-couchdb - get by id - multiple options tried, like db[doc_id] and db.get(doc_id).
Tried creating a view on id (was just experimenting - AFAIU an index should already exist on _id), roughly as sketched below - it ran for hours and hours and I killed it.
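
The view I tried was along these lines (a minimal sketch; the design doc name "test" and view name "by_id" are just placeholders):

### view on _id ###
# save a design doc with a map function that emits each doc's _id
db['_design/test'] = {
    'views': {
        'by_id': {
            'map': 'function(doc) { emit(doc._id, null); }'
        }
    }
}
# the first query triggers the index build, which is the slow part
for row in db.view('test/by_id', key='42'):
    print row.key
### view on _id ###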

Sample code:
### insertion ###
import sys

# db, sizesLength and absIndex are set up earlier in the script:
# db is the python-couchdb Database, and absIndex holds the
# fixed-width column offsets used to parse each input line
count = 0
id = 0
lineCount = 0
batch = []
# input comes from STDIN (standard input)
for line in sys.stdin:
    lineCount += 1
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the fixed-width columns of the input line
    values = []
    for size in range(sizesLength):
        value = line[absIndex[size]:absIndex[size + 1]].strip()
        values.append(value)
        if size == sizesLength - 2:
            break

    # assign the id myself instead of letting CouchDB pick a uuid
    idS = '%s' % (id)
    batch.append({"partnerCode": values[1],
                  "_id": idS})
    count += 1
    id += 1

    # bulk-insert every 10000 documents and start a fresh batch
    if count % 10000 == 0:
        db.update(batch)
        batch = []

# flush the final partial batch
if batch:
    db.update(batch)
### insertion ###

### fetch ###
# db is the same python-couchdb Database used above
for i in range(1000000):
    idS = '%s' % (i)
    tx = db[idS]
### fetch ###
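
To put a number on each run, every instance can time its own loop with just the stdlib (a sketch; N is a placeholder for the key range that instance covers):

### timing sketch ###
import time

N = 1000000  # placeholder: size of the key range for this instance
start = time.time()
for i in range(N):
    tx = db['%s' % i]
elapsed = time.time() - start
print 'fetched %d docs in %.1f s (%.1f gets/sec)' % (N, elapsed, N / elapsed)
### timing sketch ###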

Thanks
Manju
