Archive for June, 2008

Identity Management Systems

Friday, June 27th, 2008

I’ve spent the past couple of days looking at IBM Tivoli Identity Manager (ITIM). One of our customers uses this product and wants us to be able to work with it. It’s pretty cool, but somewhat painful to get running. It’s a Web-based application so, naturally, it’s built on top of IBM WebSphere. It needs a database where it can store authoritative identity information so, naturally, it needs IBM DB2. There’s the actual ITIM code itself, of course. Then there’s the “Directory Integrator,” which can interface with other directory systems. Then there are “adapters” – I was using the Active Directory Adapter. It runs as a service, communicating with ITIM over HTTP or, ideally, HTTPS (SSL). If you want to do the latter, you’ll need to install a certificate authority so that you can generate certificates for ITIM and the adapter. I used the “Rapid Install” option and it was pretty good, but only after I gave up trying to install on anything other than drive C (in Windows).

With all these components, I was pleasantly surprised that everything pretty much worked as expected. I’ve become accustomed to large systems being inherently flaky. ITIM was solid.

It was also pretty easy to modify ITIM. I took the AD adapter and was, relatively quickly, able to extend it to support the additional attributes that we use in AD. I also modified some forms to support input and modification of these attributes. It only took me a couple of iterations to get right (mostly due to my own bad typing, but also due to some unexpected changes in letter case). We can now use ITIM to provision and maintain accounts in AD that are usable by UNIX, Linux and Mac OS X machines outfitted with our Likewise Enterprise agent.

ITIM and other Identity Management Systems (IdMS) are a good idea for any company with a large number of employees who need computer accounts on many different systems. Although our software allows non-Windows systems to authenticate directly against Active Directory and thus eliminates the need to use an IdMS to provision UNIX, Linux and Mac OS X machines, an IdMS can still provide value to organizations that use Likewise software. First, an IdMS provides an established workflow for provisioning new user accounts. This workflow can include approval processes for any granting of extended privileges. Second, an IdMS typically can synchronize accounts on a wide variety of systems. A user might have an AD account, for example, but also an Oracle database account or an SAP account. Although Likewise facilitates consolidation on a single, AD-based identity, many applications still require that users be provisioned in their own user stores. With Oracle, for example, you can tell it that a user will be authenticated externally with Kerberos, but you still have to provision the user in Oracle in order to identify the user as such. Finally, an IdMS can integrate with HR systems such as PeopleSoft. These features allow an IdMS to be used as the “authoritative” source of account information. When an employee joins or leaves a company, the IdMS can help provision or deprovision the user’s accounts, as necessary.
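To make the Oracle example concrete (this is only an illustrative sketch – the user name is invented and the exact syntax depends on the Oracle release and on how external Kerberos authentication is configured), the externally authenticated user still has to be created inside the database before it can log in:

  -- Hypothetical principal; Kerberos handles authentication,
  -- but the account must still exist in Oracle.
  CREATE USER "ALICE@EXAMPLE.COM" IDENTIFIED EXTERNALLY;
  GRANT CREATE SESSION TO "ALICE@EXAMPLE.COM";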

Although there is some overlap between commercial IdMS products and our Likewise software (both can be used to accomplish “single username/password” for Windows and non-Windows computers), I think the two make a powerful combination. By allowing all non-Windows systems to authenticate directly against AD, we eliminate the need to use the IdMS to update large numbers of individual UNIX/Linux/Mac OS X machines. Likewise also adds group policy and single sign-on features that an IdMS does not provide. By using Likewise coupled with an IdMS (instead of manually provisioning users in AD), a company can enforce proper account management processes in AD and can also provision non-AD systems and applications.

Crusty Programmer Story 1

Thursday, June 26th, 2008

If you read my About page, you’ll know that I’m a crusty old programmer (COP). I may be 48 years young (as of January 2008), but I’m 144 years old in programmer-years. A year of intense programming ages you at 3x normal. (I suppose that, more accurately, I’m 15+33*3=114, since I didn’t start writing software until I was 15 🙂 ).

Needless to say, over the course of this many years I’ve seen a lot of stuff happen.

As do other COPs, I like to amaze people with “back in the old days” stories. When a bunch of us COPs get together, this becomes a competition of sorts, where winning is achieved by describing the most absolutely arcane, unbelievable thing that we had to deal with in our youth. I generally do pretty well in these contests.

No, I did not cut gears for Babbage’s Difference Engine, nor did I swap tubes on the ENIAC. My history only goes back to UNIVAC 1108s and IBM 360s in the mid-70s. I worked with punch card decks and paper tape. I do remember going to libraries and seeing books on both digital and analog computers (whatever happened to these?). I also remember reading about wiring “programming panels” on older computers.

Most of my crusty programmer stories come from the early days of microcomputers (as they used to be called back then). My first micro was an Intel 8008-based system. The machine was called the Scelbi 8H. I would love to buy one of these now, but they all seem to be in museums.

I experienced the arrival of CP/M and the S-100-based Intel 8080 systems. I used the first Altairs and envied the folks with the cool IMSAI machines (much prettier front panels). I programmed these as well as the first Apple II (using UCSD Pascal).

On a scale of 1-10, programming a computer via its front-panel switches is worth about 7 crusty old programmer points. Whenever I meet a whippersnapper who brags about working on a PDP-8, I counter that they were totally spoiled. Let me explain.

On the PDP-8 (and all computers of which I’m aware, save one), the front panel had lots of switches:

PDP-8 front panel (borrowed, without permission, from Wikipedia)

To write data to memory, you could set the main bank of switches to an address (only 12 bits) and press the “ADDR LOAD” button. Then you could set the switches to the data you wanted to write and press the “DEP” button. This would write the data to the previously loaded address and then increment the address by one. You could write successive words of memory by setting the switches to the desired value and pressing “DEP”. The PDP-8 front panel worked by performing a “memory write” bus cycle.

Performing this task was way more complicated on the Scelbi 8H. Consider its front panel:

Scelbi 8H front panel

(also from Wikipedia)

It’s only got eight data switches and three pushbuttons. The three buttons are labeled “Int”, “Run” and “Step.” The most important of these was Step. Run simply started your program and Int stopped (interrupted) it. Step was where all the programming action took place. On the Scelbi 8H, this button “jammed” an instruction (whatever was on the switches) into the CPU. I don’t know whether this was dictated by the CPU design or simply an abomination invented by Nat Wadsworth (the father of the Scelbi). Presumably, implementing the front panel this way was easier than having it perform a full memory write cycle. However, it made panel-based programming torturous. This is what you had to do:

  1. Set the data switches for the binary equivalent of the LHI instruction (octal 056). LHI is the “Load H register immediate” instruction. In the 8008, the H and L registers together hold the “address” to be used for subsequent memory operations. Press Step. Now the CPU is waiting for the second byte of the LHI instruction: the byte to be stored into the H register.
  2. Set the switches to the high byte of the memory address to where you want to store a byte. Press Step. Now you’ve got the H register loaded!
  3. Set the switches for the binary equivalent of the LLI instruction (066). Press Step.
  4. Set the switches to the low byte of the memory address to where you want to store a byte. Press Step. Now you’ve got your HL register pair set!
  5. Set the switches for the binary equivalent of the LMI instruction (076). This is the “Load memory immediate” instruction. Press Step. Now the CPU is waiting for your data byte.
  6. Set the switches to the byte value that you want to store to memory. Press Step.  You’re done!

If you wanted to store a second byte, things were a little easier. You could simply execute the INL (increment L) instruction and then repeat steps 5 and 6. I don’t remember if INL automatically “carried” to the H register. I suspect you had to know you were crossing a “page” boundary and had to manually increment H, too.
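Putting the whole sequence together: to deposit, say, octal 252 at octal address 002 100 (both values made up for illustration), the switch settings would go something like this, with a press of Step after each one:

  056   LHI – load H register immediate
  002   the high part of the address, into H
  066   LLI – load L register immediate
  100   the low part of the address, into L
  076   LMI – load memory immediate
  252   the data byte to store

Six switch settings and six presses of Step to deposit a single byte. On the PDP-8, once the address was loaded, each additional word took one switch setting and one press of DEP.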

Yes, we were real men back in 1975. Programming was painful and we liked it that way. No pansy-ass compilers for us, no sir. Of course, we were lucky if we could write 128 bytes of machine code a day. Today, heck, an empty program generates 1000 times that much code.

PS: A big “shout out” to Ron Hosek and Jeff Augenstein. These two guys hired me back in 1975, when I was just 15 years old. They let me write software for the Scelbi 8H and, later, for the Altair. They were truly among the first of the computer entrepreneurs and I learned a lot working with them. I’ve lost touch with them over the years, but I hope that they are happy, healthy and smug for having so much foresight.

Best Practices: Logging

Wednesday, June 25th, 2008

I have written about system logging once or twice already. Earlier this month, I contrasted Windows and UNIX event logs. A couple of posts ago, I mentioned logging as a “best practice” that’s frequently done poorly. In this post, I describe what I believe are currently accepted best practices for log collection, retention and analysis.

Log collection is not trivial. If done properly, the process should be:

  1. Useful – you should avoid logging unnecessary information
  2. Accurate – you should never lose log data due to network failures or traffic
  3. Secure – logs should be protected from malicious modification; log-related network traffic should be encrypted
  4. Automatic – manual operation will always be a weak link in any system

Log collection systems typically involve two or more levels of log collection and storage. First, each endpoint needs to perform useful, accurate logging. On Windows systems, you should set your audit ACLs carefully to avoid unnecessary log events. You should also ensure that you have sufficient disk space for logging and that your event logs are configured to be large enough and to never overwrite themselves.

On UNIX and its kin, you need to look at your syslog configuration and see what you can do with it. All versions of syslog let you send different log “facilities” and different priority levels to different locations. You should configure syslog to direct the kern and auth facilities to a different location than the noisier, less important facilities (mail, lpr, news, etc.). You’ll have to consider how important user, daemon and the others are to you. For some servers, especially those running mission-critical applications that employ the user facility, you may want to treat that facility as important as well and direct it to the same location as kern and auth. Next, you need to consider the priority levels. Usually, anything with a priority lower than warning does not need to be retained. The objective with syslog is to identify the information that you want to retain and separate it from that which you don’t. I usually end up with three log files: one for important, retained information (kern and auth output); one for less important information from kern and auth; and one for all other information. The latter two categories aren’t retained centrally; they’re logged locally and managed with a log rotation mechanism. On most systems, log rotation can be performed by some versions of syslog or with a separate logrotate package.
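As a rough sketch (syntax and file locations vary from platform to platform, and the paths here are only placeholders), a traditional /etc/syslog.conf implementing that three-file split might look something like this:

  # Selectors match the named priority *and above*, so without
  # platform-specific exclusion syntax there is some overlap
  # between the first two files. Older syslogds insist on tabs
  # before the file names.

  # Important, centrally retained information: kern and auth at warning or above
  kern.warning;auth.warning        /var/log/retained.log

  # The rest of kern and auth, kept locally and rotated
  kern.*;auth.*                    /var/log/kern-auth.log

  # Everything else, kept locally and rotated
  *.info;kern.none;auth.none       /var/log/other.log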

The endpoint is the first level of log collection. The next levels typically take endpoint data and put it in a centralized database. In large installations, this may involve intermediate servers taking data from sets of endpoints and then writing the data to a single centralized database cluster.

There are various products that support centralized log collection. NetPro’s LogAdmin product offers both Windows and UNIX support. You can also use Microsoft’s System Center Operations Manager (formerly MOM). For pure UNIX installations, some companies choose to make use of syslog’s server functionality: in addition to managing local logs, syslog can also act as a server, gathering information from multiple machines. This approach, unfortunately, suffers from several limitations. Standard syslog, for example, only supports unencrypted, UDP-based communication, which is subject both to data loss and to poor security. Other syslog implementations, such as syslog-ng, support TCP which, coupled with stunnel, can provide encrypted, reliable communication.
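As a minimal sketch of the syslog-ng/stunnel approach (the host name, port number and paths are invented placeholders, and a real deployment also needs the matching server-side configuration plus certificates), the endpoint might be set up something like this:

  # syslog-ng.conf on the endpoint: forward local messages to stunnel on localhost
  source s_local { unix-dgram("/dev/log"); internal(); };   # unix-stream on some platforms
  destination d_stunnel { tcp("127.0.0.1" port(5140)); };
  log { source(s_local); destination(d_stunnel); };

  # stunnel.conf on the endpoint: wrap that TCP stream in SSL to the central log host
  client = yes
  [syslog]
  accept  = 127.0.0.1:5140
  connect = loghost.example.com:5140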

Regardless of what collection scheme you use, you should also implement good retention policies. Logging tends to generate very large amounts of data, so understanding what you do and don’t need to have handy is important. Best practices suggest that you should maintain the last 12 months of data in readily accessible form, which usually translates to “fully searchable, uncompressed data”. Information older than 12 months should still be stored, but it can be compressed to minimize disk space. In practice, this means that log retention systems typically store 12 months of data in a relational database and then compress older data and remove it from the database.
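For the flat-file portion of such an archive, the aging step can be as simple as a scheduled job along these lines (the archive path and the 365-day cutoff are placeholders for whatever your retention policy actually dictates):

  # compress archived log files that have not been modified in roughly a year
  find /var/log/archive -type f -name '*.log' -mtime +365 -exec gzip {} \;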

Note that old data still needs to be retained and that it might need to be searched after the 12-month lifetime. Log retention systems should allow old data to be reloaded into the database to facilitate such processing.

The final “best practice” for logging is to implement log file analysis. There are several commercial products that perform sophisticated correlation and root-cause analysis on log files. Again, Microsoft’s Operations Manager can help here. Another choice is Splunk, which is interesting in that it focuses on searching rather than ambitious analysis.

Implementing an effective logging system seems like it should be straightforward, but the reality is that it’s not. The standard tools built into operating systems are not sufficient to implement all stages of the logging process, and third-party tools are frequently needed to implement log retention and analysis.