directory-dev mailing list archives

From Quanah Gibson-Mount <qua...@stanford.edu>
Subject Re: Various questions
Date Tue, 06 Jun 2006 19:51:05 GMT


--On Tuesday, June 06, 2006 9:37 AM +0200 Emmanuel Lecharny 
<elecharny@gmail.com> wrote:

> Hi guys !
>
> Quanah Gibson-Mount wrote:
>
>> I think it is important to allow specification of what indices to use
>> for a given attribute for a few reasons.  One, that you can use it to
>> actually make some searches slow enough to hinder efforts (like we
>> have a spam troller routinely trying to get data from our sources that
>> is fairly obnoxious),
>
> In my mind, it's pretty much a security issue. You can add
> authentication to avoid such behavior, or, if your data is public, then
> you have no reason to slow down the searches. Limiting the number of
> results may be more efficient. Btw, this is a real problem for a server,
> and something we should consider: how to avoid DoS on an LDAP server
> (either by flooding, or with malformed requests, or with huge data). We
> still have to address those attacks. At this point, I have a question:
> is it common for an LDAP server to be exposed outside a company?
> Generally speaking, I have never seen that. User data is really supposed
> to be private and not accessible to unidentified users. I may be totally
> wrong, but if I saw an LDAP server exposed to the world (I haven't seen
> that for years), the first thing I would ask the admins is to close the
> door of their system. Just my opinion.


Well, I see LDAP servers expose data to the world all the time.  Pretty 
much any university I send random queries to does so.  At Stanford, we allow 
users to affect the "visibility" of their data, with 3 settings:

"world" -- Available to anyone, including anonymous
"stanford" -- Available only to those people who have authenticated as 
being from Stanford
"private" -- Not visible to anyone by normal means (specific applications 
get by this)
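
A minimal sketch of how those three settings could map to a visibility 
check. The names (Visibility, can_see, authenticated_as_stanford) are 
hypothetical and only illustrate the policy; this is not how our servers 
actually implement it.

```python
from enum import Enum


class Visibility(Enum):
    WORLD = "world"        # anyone, including anonymous
    STANFORD = "stanford"  # authenticated Stanford users only
    PRIVATE = "private"    # hidden from normal queries


def can_see(visibility, authenticated_as_stanford):
    """Return True if a normal (non-privileged) client may see the data."""
    if visibility is Visibility.WORLD:
        return True
    if visibility is Visibility.STANFORD:
        return authenticated_as_stanford
    # PRIVATE: only specific applications get past this, not normal clients.
    return False
```

An anonymous client would get can_see(Visibility.WORLD, False) as True and 
the other two settings as False.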


Since policy makes a fair amount of data available to anyone who wants to 
run a query, I do try my best to do due diligence and cut down on spam 
harvesting runs.  We do have a result limit on the server, but the people 
I've run across are savvy enough to use batched queries over different 
ranges to get around that, at least in part.
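
To show why a result limit alone does not stop this, here is a toy 
simulation. Everything in it is made up for illustration (SIZE_LIMIT, the 
DIRECTORY list, the search function); it runs against an in-memory list, 
not a real LDAP server.

```python
SIZE_LIMIT = 3  # hypothetical server-side result limit

# Hypothetical in-memory "directory"; no real LDAP server is involved.
DIRECTORY = ["adams", "allen", "baker", "brown", "clark", "davis", "evans"]


def search(prefix):
    """Mimic a prefix search whose results are truncated at SIZE_LIMIT."""
    matches = [name for name in DIRECTORY if name.startswith(prefix)]
    return matches[:SIZE_LIMIT]


# One broad query is capped at SIZE_LIMIT entries.
single_query = search("")

# Several narrow queries each stay under the limit, yet together they
# return every entry in the toy directory.
batched = [name for letter in "abcde" for name in search(letter)]
```

Each narrow query stays under the limit, so the limit never triggers.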

People also like to be able to use their email clients to get information 
from the directory servers, and very few of them (only one that I've found) 
support SASL/GSSAPI binds, which is the only authentication method we allow 
(no username/password).


>> another is that the more indices you have on an attribute, the larger
>> the total database is, and the longer it takes to load.  This of
>> course depends in part on the OS/CPU used as well.  For example, I
>> currently index 90 attributes in my database to varying degrees (most
>> are eq, which is a fairly minimal index).  On my Solaris SPARC
>> systems, it takes 2.5ish hours to load the database.  On my new AMD
>> systems that'll be replacing the Sun SPARC boxes, it takes all of 14.5
>> minutes.  However, if all 90 of those attributes were getting indexed
>> pres,eq,sub, the amount of time to load would increase significantly.
>
> Well, in production, loading a server is not something you do very
> often. You may need to restore a crashed database, or reload a database
> whose structure has changed, but this is definitely not a real concern.
> Load once, use many.


I think that's a good thought in theory, and is what I thought too. 
However, I run 4 environments (dev, test, uat, and production).  We have a 
custom schema that we modify a few times a year, and those modifications 
are usually large enough to warrant a complete reload of the data that is 
generated from our RDBMS for the LDAP servers.  As a part of that process, 
dev may be reloaded several times as bugs are fixed, etc, and the same goes 
for test.  So I actually reload my servers a bit. ;)


>> Currently, my indices take up 1.1GB of disk space in OpenLDAP (I'm not
>> sure exactly how that maps out in Apache DS).  My database entry file
>> takes 2.7GB.  So my indices are approximately 1/3 of my database size.
>
> 3GB is really nothing. A 15K RPM SCSI disk is now 36GB minimum and costs
> around $200. Not a big deal. Better to spend money on memory sticks rather
> than on high performance disks :)
>
> I don't want to say that making it possible to select indices is *bad*,
> but, IMHO, this may be a cool feature that is a bit of overkill
> when you weigh it against real usage. For a real RDBMS, indices taking
> twice the size of the data on disk is considered perfectly normal. I don't
> think we should go that far, but when you choose to set indices on an
> attribute, it may not be very important to offer a choice of which kinds
> of indices you want.

Yeah, my concerns here may be more specific to OpenLDAP and its use of BDB. 
When bulk loading, it is quickest to have a BDB cache as large as the 
entire database (3.8GB in the case above).  On Solaris SPARC, I found 
that the only good way to get performance was to use a shared memory region 
(Linux doesn't require that), which means that I have to have as much 
memory available on the system as the BDB cache needs, and memory is sadly 
not as cheap as disk.
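
For reference, the arithmetic behind the figures quoted in this message can 
be worked through as below. The variable names are mine; the numbers are 
the ones given above, with "2.5ish hours" taken as exactly 2.5.

```python
entries_gb = 2.7  # database entry file, from the figures above
indices_gb = 1.1  # index files, from the figures above

# Cache needed to hold the entries and the indices together during a load.
total_gb = entries_gb + indices_gb

# Fraction of that total taken up by the indices.
index_share = indices_gb / total_gb

sparc_minutes = 2.5 * 60  # "2.5ish hours" on Solaris SPARC
amd_minutes = 14.5        # load time on the new AMD systems
speedup = sparc_minutes / amd_minutes

print(f"cache target: {total_gb:.1f} GB")
print(f"index share:  {index_share:.0%}")
print(f"load speedup: {speedup:.1f}x")
```

That gives a 3.8 GB cache target, with indices at about 29% of the total, 
and a load on the AMD systems that is roughly ten times faster.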


--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
