Archive for the ‘Interoperability’ Category

Identity Management Systems

Friday, June 27th, 2008

I’ve spent the past couple of days looking at IBM Tivoli Identity Manager (ITIM). One of our customers uses this product and wants us to be able to work with it. It’s pretty cool, but somewhat painful to get running. It’s a Web-based application so, naturally, it’s built on top of IBM WebSphere. It needs a database where it can store authoritative identity information so, naturally, it needs IBM DB2. There’s the actual code itself, of course. Then there’s the “Directory Integrator”, which can interface with other directory systems. Then there are “adapters” – I was using the Active Directory Adapter. It runs as a service, communicating with ITIM over HTTP or, ideally, over HTTPS (SSL). If you want to do the latter, you’ll need to install a certificate authority so that you can generate certificates for ITIM and the adapter. I used the “Rapid Install” option and it was pretty good, but only after I gave up trying to install on anything other than drive C (in Windows).

With all these components, I was pleasantly surprised that everything pretty much worked as expected. I’ve become accustomed to large systems being inherently flaky. ITIM was solid.

It was also pretty easy to modify ITIM. I took the AD adapter and was, relatively quickly, able to extend it to support the additional attributes that we use in AD. I also modified some forms to support input/modification of these attributes. It only took me a couple of iterations to get right (mostly due to my own bad typing, but also due to some unexpected changes in letter case). We can now use ITIM to provision and maintain accounts in AD that are usable by UNIX, Linux and Mac OS X machines outfitted with our Likewise Enterprise agent.

ITIM and other Identity Management Systems (IdMS) are a good idea for any company with a large number of employees that need computer accounts on many different systems. Although our software allows non-Windows systems to directly authenticate against Active Directory and, thus, eliminates the need to use an IdMS to provision UNIX, Linux and Mac OS X machines, an IdMS can still provide value to organizations that use Likewise software. First, an IdMS provides an established workflow for provisioning new user accounts. This workflow can include approval processes for any granting of extended privileges. Second, an IdMS typically can synchronize accounts on a wide variety of systems. A user might have an AD account, for example, but also an Oracle database account or an SAP account. Although Likewise facilitates consolidation on a single, AD-based identity, many applications still require that users be provisioned in their own user stores. With Oracle, for example, you can tell it that a user will be authenticated externally with Kerberos, but you still have to provision the user in Oracle in order to identify the user as such. Finally, an IdMS can integrate with HR systems such as PeopleSoft. These features allow an IdMS to be used as the “authoritative” source of account information. When an employee joins or leaves a company, the IdMS can help provision or deprovision the user’s accounts, as necessary.

Although there is some overlap between commercial IdMS systems and our Likewise software (both can be used to accomplish “single username/password” for Windows and non-Windows computers), I think the two make a powerful combination. By allowing all non-Windows systems to authenticate directly against AD, we eliminate the need to use the IdMS to update large numbers of individual UNIX/Linux/Mac OS X machines. Likewise also adds group policy and single sign-on features that an IdMS does not provide. By using Likewise coupled with an IdMS (instead of manually provisioning users in AD), a company can enforce proper account management processes in AD and can also provision non-AD systems and applications.

Monitoring: What You Don't Know Can Hurt You

Monday, June 23rd, 2008

In my last post, I mentioned network and application monitoring as one of those best practices that’s unfortunately not practiced as often as it should be. The importance of monitoring systems cannot be overstated. You want to know that your computers are functioning as you expect them to and that the applications running on them are also functional. Note that these two are only loosely correlated. True, if a computer has crashed, the applications running on it have also crashed. On the other hand, just because your hardware and operating system are running doesn’t mean that your applications are. This is the essential difference between network and application monitoring. I’ll come back to this point later.

If monitoring is so important, why doesn’t everybody do it? Well, in a sense they do, but the poorest practice is to rely on human monitoring (i.e. waiting for your customers to tell you your computers are down). Why doesn’t everyone implement automated monitoring systems? To consider the answer to this question, let’s review how these systems work.

There are various ways of classifying monitoring systems. One way to classify them that’s relevant to this discussion is based on whether the system is agent-based or agent-less.

In an agent-based system, special monitoring software is present on every computer and network device that is to be monitored. This monitoring agent evaluates the health of the computer/device and signals to the central monitoring software when something is out of kilter. Monitoring agents can sometimes also be queried by the central monitoring console in order to provide operating metrics, for example, performance data or resource availability data. Because it’s the agent that detects anomalies and informs the monitoring console, these systems can also be considered push type systems; the agent pushes the data to the console.
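
The push model can be sketched in a few lines of Python. Everything here is made up for illustration – the agent class, the disk-space threshold and the event format – real products use their own agents and protocols:

```python
import queue

# Stand-in for the central monitoring console: agents push events into it.
console = queue.Queue()

class MonitoringAgent:
    """Toy push-style agent: evaluates local health, pushes anomalies."""
    def __init__(self, host, disk_free_threshold=0.10):
        self.host = host
        self.disk_free_threshold = disk_free_threshold

    def evaluate(self, metrics):
        # metrics is a dict of locally gathered data, e.g. {"disk_free": 0.05}.
        # Only anomalies are pushed; healthy readings generate no traffic.
        if metrics["disk_free"] < self.disk_free_threshold:
            console.put((self.host, "LOW_DISK", metrics["disk_free"]))

agent = MonitoringAgent("web01")
agent.evaluate({"disk_free": 0.42})   # healthy: nothing pushed
agent.evaluate({"disk_free": 0.05})   # anomaly: event pushed to the console
event = console.get_nowait()
```

Note that the console does no polling at all; it simply waits for agents to report, which is what makes this a push design.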

Agent-less systems do not require any special monitoring software on the computers and devices that are being monitored. Instead, the monitoring software uses pull mechanisms to evaluate the health of a monitored entity. These mechanisms might consist of low-level network probes (for example, pinging a device) or higher-level probes such as a specific HTTP request or an RPC call.
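
A minimal agent-less “pull” probe might look like the following sketch: the monitoring software simply checks whether a TCP port accepts connections, roughly what a low-level Nagios-style probe does. The throwaway local listener exists only to make the example self-contained:

```python
import socket

def check_tcp(host, port, timeout=2.0):
    """Agent-less pull probe: healthy if the port accepts a connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener (no agent on the "monitored" side).
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
result_up = check_tcp("127.0.0.1", port)    # listener present: probe succeeds
srv.close()
result_down = check_tcp("127.0.0.1", port)  # listener gone: probe fails
```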

Agent-less systems are easier to implement, but agent-based systems are inherently more capable of evaluating system health as they have all operating system services at their disposal rather than just the ones accessible through external network means.

As a personal opinion, I also posit that agent-based systems are superior at hardware and OS monitoring whereas agent-less systems are ideal for application-level monitoring. The former is typically more concerned with hardware and system services whereas the latter is concerned solely with whether applications are functional or not. How best to evaluate applications? Simulate their use and evaluate the quality of their responses. Say you are monitoring a banking application. What better way to determine whether the application is running properly than by simulating a user, bringing up the bank web site, performing a transaction and checking your balances? Remember to use dummy accounts set up for this purpose.
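
A synthetic-transaction check might be sketched like this. The “banking application” here is a stand-in function rather than a real HTTP request, and the account name is an invented dummy account, but the shape is the same: drive a transaction, then judge the quality of the response rather than just its presence:

```python
# Stand-in for the bank's web application; a real check would issue an
# HTTP request and parse the response page instead.
def fake_bank_app(account, action):
    balances = {"dummy-001": 125.50}   # dummy account set up for monitoring
    if action == "balance" and account in balances:
        return {"status": "ok", "balance": balances[account]}
    return {"status": "error"}

def synthetic_check(app, account):
    """Simulate a user: perform a transaction, evaluate the response quality."""
    response = app(account, "balance")
    # Healthy only if the app answered AND the answer is well-formed.
    return response.get("status") == "ok" and \
        isinstance(response.get("balance"), float)

result = synthetic_check(fake_bank_app, "dummy-001")
```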

There are some decent agent-less monitoring systems. Nagios, for example, supports numerous network probes that can be used in clever ways. Writing new probes is relatively easy, too. Nagios, by the way, can support both agent-based and agent-less monitoring. SiteScope, formerly from Mercury, now from HP, is also pretty cool.

As to agent-based monitoring, the pickings are much slimmer. The simplest agent-based systems are, naturally, based on the Simple Network Management Protocol (SNMP). SNMP allows devices to “publish” a set of data that can be queried and displayed by monitoring consoles. Device manufacturers (SNMP is most heavily used by routers and other network gizmos) design a tree-like structure of data called a management information base (a MIB). At each node in the tree is some datum that describes the operational health of the device. The manufacturer gets a magic number assigned to the company, and each node in the MIB is identified by the company OID and a dotted sequence that describes the node’s position in the tree. SNMP-aware monitoring software, once informed of the device’s MIB, can query the device (using the SNMP protocol) to retrieve values for the various data nodes. SNMP also allows management software to write to SNMP addresses in order to configure devices. Finally, devices, having detected anomalies, can raise SNMP traps that can be “caught” by monitoring software.
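
The tree/dotted-number addressing idea can be illustrated with a toy MIB. Real MIBs are defined in ASN.1 and walked over the SNMP protocol; the vendor number 9999 and the leaf values below are invented, but the lookup mechanics are the point:

```python
# Toy model of a MIB: a tree of nodes, each leaf addressed by a dotted OID.
mib = {
    "1": {                                  # iso
        "3": {                              # org
            "6": {                          # dod
                "1": {                      # internet
                    "4": {                  # private
                        "1": {              # enterprises
                            "9999": {       # hypothetical vendor magic number
                                "1": {"1": "fanSpeed=5200rpm",
                                      "2": "temperature=41C"},
                            }
                        }
                    }
                }
            }
        }
    }
}

def resolve(oid):
    """Walk the tree using a dotted OID such as '1.3.6.1.4.1.9999.1.2'."""
    node = mib
    for part in oid.split("."):
        node = node[part]
    return node

temp = resolve("1.3.6.1.4.1.9999.1.2")
```

A monitoring console does essentially this walk, except each step is a GetRequest over UDP rather than a dictionary lookup.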

The main drawback with SNMP is that it has a very poor security model. SNMPv3 (the latest incarnation) tries to address the security issue, but few devices support the new version. Without good security, SNMPv2 allows non-authorized users to view the operational status of a monitored device and to, perhaps, gain information that can be used to compromise it. Note too that devices that support configuration via SNMPv2 are vulnerable to being maliciously configured by non-authorized users.

While SNMP is frequently implemented in network hardware, it is also occasionally implemented in UNIX and UNIX-like computers and very occasionally on Windows machines.

Naturally, Windows computers are typically monitored using a different technique. Three of them, in fact. Sigh.

First, Windows computers support RPC. An administrator can tell if a Windows computer is healthy by connecting to it with a remote management console and looking at various data. The perfmon program, for example, can display graphs of Windows performance counters that measure available disk space, RAM and hundreds of other data.

Second, Windows computers support the Windows Management Instrumentation (WMI) protocol. WMI is a crude object-oriented mechanism that allows Windows monitoring and management software to query system metrics, set system parameters and invoke management functions. WMI, by the way, is based on a DMTF standard known to the rest of the world as CIM or WBEM. Forget about the “standards” part – Microsoft WMI is not interoperable with anyone else’s implementation. The Microsoft System Center Operations Manager (formerly MOM) folk had to implement their own WBEM code for Linux/UNIX in order to monitor these systems. The mechanism they implemented is actually the third monitoring technique that’s available on Windows: WS-Management, or WS-Man as it’s frequently referred to.

WS-Man, like all of the WS-* protocols, is based on SOAP. A WS-Man aware monitoring program can read performance metrics and write configuration values by performing XML-based SOAP calls to a monitored device.
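
The general shape of such a request can be sketched with Python’s standard XML tools. This is a simplified illustration, not the exact WS-Man schema: the ResourceURI placement and the resource URI itself are invented for the example, and only the SOAP 1.2 envelope namespace is the real one:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope

def build_get_request(resource_uri):
    """Build a minimal SOAP envelope resembling a WS-Man style 'get'."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    res = ET.SubElement(header, "ResourceURI")  # simplified, unqualified
    res.text = resource_uri
    ET.SubElement(env, f"{{{SOAP_NS}}}Body")    # empty body for a simple get
    return ET.tostring(env, encoding="unicode")

# Hypothetical resource URI for a CPU-metrics resource.
xml_text = build_get_request("http://example.com/wsman/cpu")
```

The monitored device parses the envelope, finds the resource being addressed, and returns the metric wrapped in a SOAP response.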

Although WS-Man seems like A Good Thing, especially since Microsoft is providing it on non-Windows platforms, I think it has several key flaws. First, WS-Man is based on both SOAP and WMI/CIM/WBEM. SOAP requires a considerable bit of glue to implement. In Windows, C# and .NET make it pretty easy. On UNIX, you can do it in C++ using Axis, for example, you can do it in Java using Sun JWSDP, or you can do it in Perl/Python or another SOAP-aware scripting language. Each of these has its flaws. The C++ approach is error prone. The .NET and Java approaches require a huge runtime memory footprint. The Perl/Python approach is typeless, requiring manual development of SOAP WSDL files instead of reflection-based synthesis. Beyond the SOAP issues, WMI/CIM/WBEM is simply butt-ugly (maybe even fugly). The technology had the misfortune of being designed before Java and C# came to fruition. As a result, its extension mechanism is just clunky.

Beyond SNMP, RPC, WMI and WS-Man, there are yet other solutions. Companies that make monitoring software (for example, Microsoft, IBM, HP, BMC, Computer Associates, and others) frequently have their own proprietary monitoring agents that use yet other protocols.

Given all of these unattractive alternatives, it is not surprising that companies don’t diligently monitor all of their systems. The ones who do this best usually end up using a mashup of various mechanisms: SNMP for network hardware, System Center/MOM for their Windows systems, some Nagios for agent-less monitoring, toss in some HP OpenView in one or two divisions and some home-grown stuff elsewhere.

What would Alan Turing do? Ack. I suppose WS-Man is better than the alternatives but I just can’t imagine Cisco adding all the necessary software to implement it.

Why (OS) Architecture Matters

Friday, June 20th, 2008

In previous posts (for example, What Linux Needs to Learn From Microsoft) I’ve complained about the lack of management APIs in Linux and UNIX. Microsoft Windows has a comprehensive set of management APIs for a variety of OS features:

  • Networking
  • File/printer sharing
  • Event logs
  • Registry I/O
  • Service control
  • etc.

Moreover, each of these APIs typically works locally (on the machine where the call is made) or remotely (on another machine). How is it that such a comprehensive set of APIs came to be, and that these APIs should so consistently support remote management? The answer to both questions is: by design.

Let’s understand what this means by going back in time to the late 80’s and early 90’s. I had just arrived at Microsoft back then (from HP). I started working on the Windows 1.04 SDK and then later moved to the OS/2 project.

In the MS DOS (pre-Windows) days, the concept of an API, let alone a network-aware one, was very crude. MS DOS programs accessed the operating system by performing “software interrupts”. An application would set up the x86 registers and would then perform an “Int 21”. The OS would field the interrupt and interpret the registers to determine what to do.

Programming language libraries, for example, the “C run-time library”, added the first primitive API to MS DOS. Instead of performing an Int 21, you could call “open()” or “fopen()” to open a file.

With Windows 1.0, the number of APIs greatly increased. If I recall, Windows added about 300 APIs, including its “kernel”, windowing and graphics features. This number would eventually grow to 3000 in the Windows NT days.

When OS/2 arrived, something else had arrived with it: the local-area network. Networks had been available for some time (Novell Netware, Windows-for-Workgroups, MS DOS LANMAN). OS/2, however, introduced the concept of an operating system that was fundamentally network aware and could provide network services. Unlike Netware, OS/2 was a general-purpose operating system that could behave as a centralized server, too. OS/2 enabled the development of client-server applications.

The presence of networks and of network services necessitated the development of remote management APIs. For example, when a user typed:

net view \\fileserve1

The net utility had to be able to “talk” to fileserve1 and “ask it” what resources it was sharing. At the API level, the operating system needed to provide a NetShareEnum function that included a servername parameter indicating what server was being queried.
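
The local-or-remote pattern that NetShareEnum exemplifies can be sketched like this. The function name echoes the Windows API, but the share data is made up and no actual RPC is performed – the point is simply that one API serves both cases, selected by the servername argument:

```python
# Made-up share tables standing in for what an RPC to each machine
# would actually return.
_shares = {
    None: ["C$", "ADMIN$"],              # the local machine
    r"\\fileserve1": ["public", "eng"],  # a remote server
}

def net_share_enum(servername=None):
    """Sketch of the NetShareEnum pattern: servername=None queries the
    local machine; otherwise the named server. A real implementation
    would issue an RPC to that server instead of a table lookup."""
    return _shares.get(servername, [])

remote = net_share_enum(r"\\fileserve1")  # what 'net view \\fileserve1' needs
local = net_share_enum()                  # same API, local machine
```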

How did OS/2 “talk” to servers? Did it have some special protocol talking on some dedicated port just for this purpose? No. OS/2 was built on top of a basic RPC (remote procedure call) protocol that worked “on top” of named-pipes.

Now, I can’t provide all the details and history about how RPC progressed from OS/2 to today. My colleague at Likewise Software, Krishna Ganugapati, however, can. I have a link to his blog on my blogroll.

In Windows NT (later, Windows 2000, Windows XP and now Windows Server and Vista), the RPC mechanism became very formalized. Just about every OS API was written atop RPC. You would define your API using IDL (the interface definition language) and compile it using the MIDL compiler. The compiler would generate a series of client- and server-side stubs that would completely hide the details of RPC. You didn’t have to worry about marshaling arguments or about handling the transport layer. The RPC run-time would take care of communications over TCP/IP or over named pipes (maybe even NetBIOS, too).
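
What those generated stubs buy you can be shown with a toy example: the client-side stub marshals arguments into a wire format, the server-side stub unmarshals and dispatches, and the caller just sees an ordinary function call. JSON stands in for real RPC marshaling here, and the “transport” is a direct function call rather than TCP/IP or a named pipe:

```python
import json

def server_add(a, b):          # the actual implementation on the "server"
    return a + b

_dispatch = {"add": server_add}

def server_stub(wire_bytes):
    """Server-side stub: unmarshal the request, dispatch, marshal the reply."""
    request = json.loads(wire_bytes)
    result = _dispatch[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

def client_add(a, b):
    """Client-side stub: hides marshaling and transport from the caller."""
    wire = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = server_stub(wire)   # stand-in for sending over the network
    return json.loads(reply)["result"]

answer = client_add(2, 3)      # caller sees a plain function call
```

The MIDL compiler generates both stubs from the IDL description, so neither side ever writes marshaling code by hand.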

Local calls would skip the networking layers, but the RPC mechanism would further serve to marshal between user- and kernel-level code.

The key lesson here is that Windows NT was built from the ground up with an RPC-based, distributed system architecture. UNIX was not (although Sun was an early pioneer in RPC technology and is still a key contributor). Linux is not. Neither is Mac OS X/FreeBSD.

Admittedly, the RPC libraries might be available for these platforms (albeit in a neglected state; we’ve made several fixes/improvements to them). The operating systems themselves, however, are not designed from a distributed perspective.

Thus, it’s not surprising that management APIs, especially remote management APIs, do not exist for these platforms. The respective vendors are all working on WS-Management or WBEM or some other protocols to help with the situation, but they are still, fundamentally, in a weaker position than the Windows NT progeny.

At Likewise, we develop interoperability software that allows UNIX, Linux and Mac OS X computers to work well on Windows networks. In order to provide this software, we have to work with services on Windows that expect to be invoked over RPC. Although, at first, we “hand-rolled” these RPCs, our most recent software is based on a full DCE RPC implementation. This has greatly facilitated our work and is allowing us to provide further interoperability features between Windows and non-Windows systems. Sometime soon, we will be releasing this software with both our open source and proprietary software products.