Posts Tagged ‘Active Directory’

Identity Management Systems

Friday, June 27th, 2008

I’ve spent the past couple of days looking at IBM Tivoli Identity Manager (ITIM). One of our customers uses this product and wants us to be able to work with it. It’s pretty cool, but somewhat painful to get running. It’s a Web-based application so, naturally, it’s built on top of IBM Websphere. It needs a database where it can store authoritative identity information so, naturally, it needs IBM DB2. There’s the actual code itself, of course. Then there’s the “Directory Integrator” which can interface with other directory systems. Then there are “adapters” - I was using the Active Directory Adapter. It runs as a service, communicating with ITIM over http, ideally, over https (SSL). If you want to do the latter, you’ll need to install a certificate authority so that you can generate certificates for ITIM and the adapter. I used the “Rapid Install” option and it was pretty good, but only after I gave up trying to install on anything other than drive C (in Windows).

WIth all these components, I was pleasantly surprised that everything pretty much worked as expected. I’ve become accustomed to large systems being inherently flaky. ITIM was solid.

It was also pretty easy to modify ITIM. I took the AD adapter and was, relatively quickly, able to extend it to support the additional attributes that we use in AD. I also modified some forms to support input/modification of these attributes. It only took me a couple of iterations to get right (mostly, due to my own bad typing but also due to some unexpected changes in letter case). We can now use ITIM to provision and maintain accounts in AD that are usable by UNIX, Linux and Mac OS X machines outfitted with our Likewise Enterprise agent.

ITIM and other Identity Management Systems (IdMS) are a good idea for any company with a large number of employees that need computer accounts on many different systems. Although our software allows non-Windows systems to directly authenticate against Active Directory and, thus, eliminates the need to use an IdMS to provision UNIX, Linux and Mac OS X machines, an IdMS can still provide value to organizations that use Likewise software. First, an IdMS provides an established workflow for provisioning new user accounts. This workflow can include approval processes for any granting of extended privileges. Second, an IdMS typically can synchronize accounts on a wide variety of systems. A user might have an AD account, for example, but also an Oracle database account or an SAP account. Although Likewise facilitates consolidation on a single, AD-based, identity many applications still require that users be provisioned in their own user stores. With Oracle, for example, you can tell it that a user will be authenticated, externally, with Kerberos, but you still have to provision the user in Oracle in order to identify the user as such. Finally, IdMS systems can integrate with other HR systems, for example PeopleSoft or similar systems. These features allow an IdMS to be used as the “authoritative” source of account information. When an employee joins or leaves a company, the IdMS can help provision or deprovision the user’s accounts, as necessary.

Although there is some overlap between commercial IdMS systems and our Likewise software (both can be used to accomplish “single username/password” for Windows and non-Windows computers), I think that the combination of products is a powerful combination. By allowing all non-Windows systems to authenticate directly against AD, we eliminate the need to use the IdMS to update large numbers of individual UNIX/Linux/Mac OS X machines. Likewise also adds group policy and single sign-on features that an IdMS does not provide. By using Likewise coupled with an IdMS (instead of manually provisioning users in AD), a company can enforce proper account management processes in AD and can also provision non-AD systems and applications.

“How I Lost 1180 Seconds” or “Putting S.DS Code on a Diet” - Conclusion.

Wednesday, June 18th, 2008

(Note: this is the conclusion of a 4-part post).

The last bit of optimization in my code, again, involved an understanding of data layering. The S.DS DirectoryEntry class is itself a layer on top of a layer on top of another layer. DirectoryEntry is a wrapper around Microsoft ADSI (Active Directory Services Interface). ADSI, in turn, is a wrapper around basic LDAP protocols. Each of these layers is providing services to another while trying to hide the details of its operation. In order to optimize code that relies on S.DS and, hence, on ADSI and LDAP, it is important, however, to understand what these layers do and to have some sense of how they do it.

ADSI is a Microsoft COM interface (available from C++ or script) that presents an object oriented interface to LDAP. Rather than having programmers worry about LDAP binding and searches, ADSI dispenses directory objects when provided with an LDAP path for the object. The objects, in turn, make available LDAP attributes via COM property accessors. The S.DS DirectoryEntry object is a .NET wrapper for an ADSI COM object.

What’s most important to know about DirectoryEntry and ADSI is that they try to delay binding to LDAP objects for as long as possible. When they do, they also provide mechanisms for you to specify the minimum set of attributes that you want to view. Let’s consider how this comes into play in the example that we’ve been working with.

When the code uses DirectorySearcher to search AD, S.DS and ADSI don’t actually bind to found objects. All they really do is to retrieve the LDAP distinguishedNames (paths) of the results. When we get the DirectoryEntry associated with a SearchResult, we have an “unbound” DirectoryEntry (and ADSI) object. Only when we actually retrieve a Properties value from the DirectoryEntry does S.DS actually bind to the necessary LDAP path.

How can we take advantage of this operational knowledge?

Let’s go back to the UI and consider what we actually need to show. I’ve been glossing over the details here - my list view example has only been showing the name of each retrieved object. In the real code, I show the name of the Group and its UNIX gidNumber. Since this is all I need, why not take advantage of this?

Here’s what we do: instead of having the data layer return a full blown GroupInfo object, let’s define a new class:

class GroupInfo0 { public string Name; public int gidNumber; private GUID guidObject; public GroupInfo0(string Name, int gidNumber, GUID guidObject) { this.Name = Name; this.gidNumber = gidNumber; this.guidObject = guidObject; } public GroupInfo GetGroupInfo() { // S.DS code to retrieve an object // via its GUID (details omitted) DirectoryEntry de = new DirectoryEntry(GetLDAPPath(uidObject)); return new GroupInfo(de); } }

What we have is a class that provides only the minimal data needed for our list view and then provides a function to retrieve the full GroupInfo class when we actually need it (for example, when the user chooses the group in the list view and decides to perform an operation on it). How can we make use of this class? Let’s return these objects from our directory search instead of full GroupInfo objects. The advantage of doing this is that we can restrict the attributes that S.DS (and ADSI) need to retrieve from LDAP:

DirectorySearch ds = new DirectorySearcher(…); … set up ds … ds.PropertiesToLoad.Add(”gidNumber”): ds.PropertiesToLoad.Add(”objectGuid”): … SearchResultCollection src = ds.FindAll(); return new SearchResults(src);

By specifically setting the PropertiesToLoad property we inform S.DS that it doesn’t need to “load” all attributes when performing the search. We also need to change our SearchResults class, in order that it returns the simpler GroupInfo0 objects:

class SearchResults { private object[] ao; public SearchResults(SearchResultCollection src) { ao = new object[src.Count]; foreach(SearchResult sr in src) ao[i] = sr.GetDirectoryEntry(); } public GroupInfo0 this[int i] { get { if( ao[i] is DirectoryEntry) { DirectoryEntry de = ao[i] as DirectoryEntry; string sName = de.Name; int gidNumber = (int)de.Properties["gidNumber"].Value; Guid guidObject = de.Guid; GroupInfo0 gi0 = new GroupInfo0(sName, gidNumber, guidObject); ao[i] = gi0; return gi0; } else return ao[i] as GroupInfo0; } } }

My use of the “0″ notation is not arbitrary. There are several examples of API in Windows that return different levels of data. The 0 level is minimal - 1, 2 and other levels provide increasing amounts of information.

Our UI code will now call the data layer to get a SearchResults object. When the virtual mode list view needs to show an item, the UI code will index into SearchResults and will get a GroupInfo0 object that contains enough information to populate the list view. When the user actually selects an item in the list view and needs to operate on, the UI code will call GroupInfo0.GetGroupInfo() to get a “real” GroupInfo object which it can manipulate.

This last set of changes, reducing the amount of work S.DS and LDAP had to perform, had a huge impact on performance. Collectively, all the techniques that I employed (plus another that I haven’t covered) reduced my original task (populating a list view) from 1200 seconds to 20 seconds. Pretty good, I think!

I could have gotten the same performance numbers by throwing away my careful layer separation. I could have had my data layer return a SearchResultCollection with the reduced attribute set and I could have had my UI code pluck the necessary attributes directly from the underlying DirectoryEntry objects. This would have defeated the main objective of n-tier design: isolating layers from changes to other layers. If I later needed an additional attribute, for example, I might change the UI code to retrieve the needed data from DirectoryEntry without realizing that the data layer had only provided a reduced set of attributes for performance reasons. By adhering to a strong discipline of layer separation, it is clear that, if the UI needs another attribute, it needs to be added to GroupInfo0. Since this class is defined in the data layer, it is much clearer where the changes would be necessary in order to provide the requested information.

I don’t guarantee that a similar set of analyses will speed up your own S.DS code. Sometimes, operations are fundamentally slow. It is always best to look for algorithmic improvements instead of clever coding techniques. Nevertheless, my advice is to use S.DS with care and to understand the ADSI and LDAP layers as well as you can.

“How I Lost 1180 Seconds” or “Putting S.DS Code on a Diet” - Part 3

Tuesday, June 17th, 2008

(Note: this is the 3rd of a 4 part post).

To understand where the rest of the 1150 seconds of performance improvement came from, let’s first go back to the concept of n-tier design and understand further layering in the data tier.

Remember that our high level algorithm includes a line that looks like this:

GroupInfo[] agi = DataLayer.GetCellGroupMembers( cell );

What is a GroupInfo? This is a class that I define in my data layer to isolate other tiers from the specifics of how data is stored in AD. The definition of GroupInfo contains an S.DS DirectoryEntry object:

class GroupInfo {     private DirectoryEntry de; … public GroupInfo(DirectoryEntry de) {     this.de = de; } public string DisplayName {     get     {         return de.Properties["displayName"].Value as string;     } } … }

DirectoryEntry is an S.DS object that represents an object (in our case ) in Active Directory. GroupInfo does not expose this object. It contains public properties (e.g. DisplayName) that return desired data without exposing how the data are stored.

This layer of abstraction incurs performance overhead. When we’re retrieving data to display, we create 20,000 objects instead of just 10,000. The search code looks somewhat like this:

DirectorySearch ds = new DirectorySearcher(…); … set up ds … SearchResultCollection src = ds.FindAll(); GroupInfo[] agi = new GroupInfo[ds.Count]; int ig = 0; foreach(SearchResult sr in src) {     DirectoryEntry de = sr.GetDirectoryEntry();     agi[ig++] = new GroupInfo(de); }

First, ds.FindAll() creates a collection with 10,000 search results. We then create a new array with 10,000 elements and iterate through the search results constructing 10,000 GroupInfo objects. Ouch.

How can we avoid all this object creation? Is there anything we can do to exploit the ListView’s support of virtual mode without polluting the 2-tier design? How about if we delay the creation of GroupInfo objects? Consider a class like this:

class SearchResults {     private object[] ao;     public SearchResults(SearchResultCollection src)     {         ao = new object[src.Count];         foreach(SearchResult sr in src)             ao[i] = sr.GetDirectoryEntry();     }     public GroupInfo this[int i]     {         get         {             if( ao[i] is DirectoryEntry)             {                 GroupInfo gi = new GroupInfo(ao[i] as DirectoryEntry);                 ao[i] = gi;                 return gi;             }             else                 return ao[i] as GroupInfo;         }     } }

The SearchResults class is a wrapper for the data returned from DirectorySearcher. When we create a SearchResults object, we pass it the SearchResultsCollection we got back from DirectorySearcher. It creates an array of generic objects and stores the incoming DirectoryEntry objects returned from the search. The SearchResults class can be used like an array because it defines an indexer property. When you request element i from the class, it will look at the object array. If the array contains a DirectoryEntry object (as it will, initially) it will construct a GroupInfo object for it and replace the DirectoryEntry with the new GroupInfo object. If the array already a GroupInfo object, it will return that instead.

SearchResults, in essence, delays the construction of GroupInfo objects until they are needed. Once constructed, they are then cached for later use. These details are hidden from the user-interface code. The UI code uses SearchResults exactly as it previously used its GroupInfo array. We are providing performance improvement without violating layer separation. True, we would not necessarily have implemented SearchResults if virtual mode list views were not available but such impedance matching is being performed with proper layering discipline. By matching the UI’s use of virtual mode list views with class that also delays object construction, we are providing better performance but still isolating layers.

I don’t recall how much speed up this code provided, but it was significant. I am not providing full details on the GroupInfo object. The real version is slow to construct for reasons that are beyond the scope of these posts.

In my last post on this series, I’ll talk about the last category of changes that I made to speed up my S.DS code.

“How I Lost 1180 Seconds” or “Putting S.DS Code on a Diet” - Part 2

Monday, June 16th, 2008

(Note: this is the 2nd of a 4 part post).

In order to explain why I did to speed up my code, let me first describe the general algorithm that I was working with. At the top level, the code looked (vaguely) like this:

GroupInfo[] agi = DataLayer.GetCellGroupMembers( cell ); listView.Items.Clear(); foreach( GroupInfo gi in agi)      listView.Items.Add(gi);

(Note: Consider this pseudo code.)

The general idea is that the UI code calls the data layer code to get an array of items to display and then adds them to the list view so that the user can see them (a list view is the .NET name for a WIndows list box). With >10,000 items, this code would take 20 minutes to execute (on a VM, with a small memory allocation). Why was this operation so slow? There were performance bottlenecks throughout.

At the UI level, adding 10,000 items to a list view is a bad idea. This part of the algorithm alone was taking about 1 minute. Instead of adding all the items to the list box, it’s better to add items only when they’re needed. The .NET Winforms ListView supports this feature — they call it virtual mode. Instead of adding the items to the list view, you put the list view in virtual mode, tell it how many items there are in total and then let the list view ask for items when it needs them. ListView will generate .NET events when it needs items. The code now looks something like this:

// put list view into virtual mode, set up callback listView.VirtualMode = true; listView.RetrieveVirtualItem +=      new RetrieveVirtualItemEventHandler(RetrieveItem); … // agi is an object member variable of type GroupInfo[] agi = DataLayer.GetCellGroupMembers( cell ); listView.Items.Clear(); listView.VirtualListSize = agi.Count; … } private void RetrieveItem( object sender,      RetrieveVirtualItemEventArgs e) {      e.Item = agi[e.ItemIndex]; }

The time it takes to fill the list view is now, essentially, zero. Typically, only a few items are shown on the list box at any time so “filling it” only consists of providing those few items. When the user scrolls or pages, events are fired and you retrieve the needed items.

This simple change poses a very interesting question: Have we really sped anything up? Offhand, you’d say “yes!” — the list view comes up 30 seconds faster. Someone else might argue that, if the user scrolls from start to finish, that it would still take you 30 seconds. Who would be right?

You’d be! First of all, users will rarely need to see all the data. If, on average, they only need to scroll halfway through the list, the average saving will be at least 50%. Second, ListView is actually pretty smart. If you hold down the “Page Down” button, it won’t actually ask you for every single item. If your list is sorted, the user will look at what’s showing up on the list and “converge” on the desired item once close. The net result might be that the list view only ends up asking for 25% of the data. Third, adding 10,000 items to a list view incurs additional overhead costs that are not required in virtual mode.

Regardless of which argument you find most convincing, realize what the change has accomplished: 1) it has amortized a slow operation over a series of user operations (scrolling) and 2) it has changed the perception of performance to be in line with user time. The user perceives scrolling to be as fast as the keyboard repeat rate. It doesn’t matter if the data access is any faster once you match the keyboard repeat rate. Once you’ve matched the user’s maximum perceived UI responsiveness, you don’t need to do any better.

Switching to a virtual mode list box greatly sped up part of the overall operation, but it only shaved 30 seconds out of the 1200. My next two posts will explain where I saved another 1150 seconds.

“How I Lost 1180 Seconds” or “Putting S.DS Code on a Diet” - Part 1

Sunday, June 15th, 2008

(Note: this is the first of a 4 part post).

In previous posts, I raved about .NET’s System.DirectoryServices (S.DS) — its library for accessing LDAP directories. In this post, I’ll go over how I sped up a particular bit of S.DS code in dramatic fashion.

S.DS, like any code, needs to be used and studied for a while before you really become adept at understanding how best to use it. If used in poor fashion, it can result in very slow code.

In my case, the slow code arose from my desire to properly layer various components of our software. Although n-tier design (where, most often, n=3) is associated with database processing the fundamental concept can be used on just about any software. The general idea behind tiered design is to isolate components from the details of other components “above” or “below” them in hierarchy. In database software, n-tier design suggests that all database access (connecting to the database, executing SQL statements) be performed in the data layer and that no other layer be aware of how the data is stored in the database. This allows you to change a database schema, for example, and to only have to change the one layer.

Atop the database layer, a business rules layer is often present. This layer contains the main program logic. In a banking application for example, it’s the layer that knows that a transfer between accounts involves a transacted withdrawal from one account and deposit to another.

Finally, the user interface layer displays information to the user and allows the user to initiate actions.

In my S.DS application, I use a similar layering. Only the data layer knows how Likewise stores information in AD. The data layer exposes higher-level UserInfo, GroupInfo and CellInfo objects that are manipulated by the business-rules layer. Finally, the user-interface layer is the code that interacts with our MMC snapins or with our own Likewise Console.

This layering initially resulted in some very slow code during a very common use case. When the user brings up the Likewise Settings property page for an AD organizational unit (OU) — in Microsoft’s Active Directory Users and Computers (ADUC) snapin — we display a list of all of the users and groups that are “enabled” for Unix access by the computers joined to that OU. At the lowest level, this operation starts with an AD search performed by the S.DS DirectorySearcher class. At the UI level, the data is displayed in a ListView.

When we encountered an OU with a large number of users and groups (around 10,000), my original implementation of this code took 20 minutes (1200 seconds) to display the information. By the time I finished redesigning and optimizing, I had this time down to less than 20 seconds.

I’ll explain how I accomplished this in my next post.