Archive for the ‘Technology’ Category

The Ideal IT Resume

Thursday, July 3rd, 2008

We’ve been doing a lot of interviewing lately, looking for developers, QA folk and deployment engineers. We’ve looked at hundreds (if not thousands) of resumes, performed numerous phone and live interviews but made only a handful of offers. It’s been difficult to find people with the skills that we’re looking for.

Likewise Software is in the identity management business. We make software that allows non-Windows systems to authenticate against Microsoft Active Directory and to employ AD-based group policy. As such, our needs are probably more sophisticated than those of most companies.

First, we need someone with good Windows networking/AD/DNS skills. Our biggest challenge at customer sites is assuring that their directories are properly configured. Our employees (especially our deployment engineers) need to be familiar with Active Directory and its architecture. They need to be able to run Likewise and Microsoft tools to verify that AD is properly configured and working. They need to be comfortable using tools like ADSIEDIT to look at objects in AD, and they need to know what LDAP is. Experience with DNS and UNIX Bind is also valuable. Customers who choose to use Bind have to configure it to forward to AD/DNS, or they have to manually set up a series of SRV (service) records. Familiarity with NSLOOKUP and other tools is valuable.
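
To make that concrete, here’s a minimal Python sketch (assuming the third-party dnspython package and a made-up domain name) of the kind of SRV-record sanity check a deployment engineer might script instead of running NSLOOKUP over and over:

    # Sanity-check the SRV records AD publishes for its domain controllers and
    # KDCs. Assumes dnspython (pip install dnspython) and a hypothetical domain.
    import dns.resolver

    DOMAIN = "example.com"  # hypothetical AD domain; substitute the real one

    # If Bind isn't forwarding to AD/DNS (or the records weren't created by
    # hand), these lookups fail and our agent can't locate a domain controller.
    for record in (f"_ldap._tcp.dc._msdcs.{DOMAIN}", f"_kerberos._tcp.{DOMAIN}"):
        try:
            for rr in dns.resolver.resolve(record, "SRV"):
                print(f"{record} -> {rr.target}:{rr.port} (priority {rr.priority})")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            print(f"MISSING: {record} - check Bind forwarding or the SRV records")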

Because things can always go wrong on a network, familiarity with network analyzers such as Ethereal is also valuable.

Second, we need someone with good Windows administrative skills. They have to know how users are created in AD, how access controls are applied to resources and how Group Policy is used to help manage systems. They have to have some sense of how organizational units are used in AD and how GP objects can be inherited to accomplish company and departmental security and management goals.

Third, we need experience with UNIX and Linux administration. We support numerous versions of each, so familiarity with different shells and editors is a plus. You can’t rely on bash or vi being available on every system. Different versions of UNIX and Linux also have their own vagaries regarding where they store certain files and how they start/stop daemons. Rudimentary knowledge of where to look and which techniques to use is important. You might be working on HP-UX one minute and on Ubuntu the next. Some knowledge of how local accounts are stored in /etc/passwd and /etc/shadow is a must.
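
As a small illustration of the local account layer in question, here’s a Python sketch (standard library only) that walks the same data /etc/passwd exposes; the UID cutoff below is an assumption that varies by distribution, and /etc/shadow itself is only readable as root:

    # List local accounts the way /etc/passwd stores them. Password hashes
    # live in /etc/shadow and need root to read; this only touches passwd.
    import pwd

    for entry in pwd.getpwall():
        # UID 0 is root; many distros start ordinary users at 500 or 1000.
        kind = "system" if entry.pw_uid < 1000 else "user"
        print(f"{entry.pw_name:<16} uid={entry.pw_uid:<6} shell={entry.pw_shell} ({kind})")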

Fourth, we need someone with rudimentary knowledge of UNIX/Linux architecture. Knowledge of PAM and NSSWITCH is valuable. Understanding how name resolution works and how networks and firewalls are configured is valuable, too.

Fifth, some cursory programming skills are useful. We frequently need to write or modify shell scripts to help with deployments or testing/monitoring tools. Our account migration tools can generate scripts and being able to modify those is also valuable.

Sixth, some Mac knowledge comes in handy, as does some experience with Linux GNOME desktops.

Seventh, general knowledge of Kerberos, Kerberos-based SSO and Kerberized applications is useful.

Finally, some experience with third-party identity management systems is useful since we often need to interface with IBM ITIM, Sun Identity Manager or Microsoft ILM.

If you know anyone who meets all of these qualifications, let me know. I’m pretty sure that I’ve hired all four of them and that my competitors have the other four. 🙂

Of course, we don’t expect candidates to have all of these skills. We’re lucky if they have half of them. My observation, however, is that our needs, if you factor out a couple of domain-specific things (e.g. Kerberos and LDAP), are not far from what any modern IT department needs if they’re running a heterogeneous data center. The amount of information that you need to know to effectively manage both Windows and non-Windows computers is huge. It’s not surprising that many departments choose to segregate these duties and assign them to different teams. As an unfortunate consequence, however, there is often little interaction and, sometimes, open hostility between these teams. Introducing interoperability solutions is complicated by the inherent distrust between the two camps. IT departments would do well to encourage education and personnel movement between the teams as a way to cross-pollinate ideas.

More Thoughts on Cloud Computing

Wednesday, July 2nd, 2008

In my last post I bemoaned that current efforts around cloud computing are pretty primitive. I closed with a comment that I had some thoughts about what a true cloud computing platform should look like. This post goes into that topic, albeit still at a high level.

I think that a cloud operating system (let’s call it a COS) first needs to provide a programming model that encompasses both local and distributed computing. For a long time now we’ve divided software into various categories, each with a fundamentally different architecture:

  • Local, client-based, computing
  • Remote, server-based computing with a thin front-end (namely, Web applications accessed via browsers)
  • Client-server applications that couple thick, client-based, software with remote, server-based computing services

The first category is dominated by Windows-based software, typically written in C++ (perhaps Microsoft MFC based) or in C#/.NET. The second category is populated by a variety of web application platforms – Java J2EE, PHP, ASP.NET and others – on the server, with Internet Explorer, Firefox and Safari on the client. The final category is represented by the fewest applications. The client software is frequently written in C++ or C#/.NET, but the server software can be implemented through SOAP web services, through custom communication protocols or through domain-specific protocols such as those used by Microsoft Exchange (talking to Outlook clients) or by databases (SQL Server, Oracle, others).

It would be valuable for a COS to start by unifying these concepts. If the difference between writing a local application and a client-server application is minimal, more developers will be able to accomplish the latter. The COS might start by suggesting (and providing services to this end) that applications be written by first separating their UI from their computational elements. This concept exists, to some extent, in the MVC (model-view-controller) design paradigm and in n-tier design. It’s applicable even to local applications. It might then also suggest that communication between the UI and the computational elements be achieved through particular means. I would suggest SOAP over HTTP, but that could simply be one of several mechanisms/transport layers provided by the COS. Applications would define the interface to the computational layer, and the local/remote location of the UI and computational layers would dictate the choice of transports. With most SOAP toolkits this should be easy to accomplish given that they already generate stub functions to hide the transport details. I would also suggest that database access be hidden inside the computational layer rather than directly performed by the UI components (this, too, is a tenet of n-tier design).
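
To make the idea concrete, here’s a rough Python sketch of the pattern I have in mind. The names are hypothetical, and plain JSON stands in for whatever transport (SOAP over HTTP or otherwise) the COS would actually provide; the point is that the UI code is written against an interface and never knows whether the computational layer is in-process or remote.

    # The application defines an interface to its computational layer; a stub
    # decides whether a call stays local or gets marshalled onto a transport.
    import json
    from abc import ABC, abstractmethod

    class CartService(ABC):
        """Interface the UI programs against."""
        @abstractmethod
        def add_item(self, cart_id: str, sku: str, quantity: int) -> int: ...

    class LocalCartService(CartService):
        """In-process implementation: the 'local application' configuration."""
        def __init__(self):
            self._carts = {}
        def add_item(self, cart_id, sku, quantity):
            self._carts[cart_id] = self._carts.get(cart_id, 0) + quantity
            return self._carts[cart_id]

    class RemoteCartStub(CartService):
        """Stand-in for a generated stub: marshals the call for a remote server."""
        def __init__(self, transport):
            self._transport = transport  # e.g. something that POSTs to a URI
        def add_item(self, cart_id, sku, quantity):
            request = json.dumps({"op": "add_item", "args": [cart_id, sku, quantity]})
            return json.loads(self._transport(request))["result"]

    # The UI code is identical either way; only the wiring changes.
    def ui_add_to_cart(service: CartService):
        print("items now in cart:", service.add_item("cart-42", "SKU-1", 2))

    ui_add_to_cart(LocalCartService())              # local configuration
    fake_transport = lambda req: json.dumps(        # pretend network hop, for the demo
        {"result": LocalCartService().add_item(*json.loads(req)["args"])})
    ui_add_to_cart(RemoteCartStub(fake_transport))  # same UI, 'remote' configuration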

A COS should also provide a programming model that exploits the benefits of Web-based application delivery and thin clients. Ultimately, an Ajax-based Web application is a fat client-side application that happens to be delivered by an HTTP server and talks to its computational elements over Web services – a model very similar to what I’ve described.

There’s not much that’s magical about HTTP-based application delivery. The HTML files could just as easily already be present on the client computer. URIs can be file:// based instead of http:// based. The only additional piece that a Web-delivered application provides is the URL of the site that provides the back-end computational services. This is inferred from the URL that is used to load the application, but that is purely circumstantial. It would be possible to load an Ajax application locally (from a local HTML file) and provide it with a URI for the server application with which it should communicate. Note, too, that the nature of this URI dictates the security context under which the application is run. Local applications might have access to files and other local resources, whereas applications loaded from untrusted Web sites might run under very stringent conditions.

With these mechanics taken care of, the next thing that a COS needs to consider is the nature of its communication protocols. SOAP over HTTP has significant limitations as a general-purpose remote procedure call (RPC) mechanism.

One of these limitations is its client-initiated, request/response nature. Clients call servers, but servers don’t typically call clients. Server-to-client communication needs to be implemented through some form of subscribe/poll model. The COS should provide services to facilitate this. A second problem with SOAP-based Web services is their reliance on HTTP as a transport protocol. HTTP is stateless.
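
Here’s a toy, in-process Python sketch of that subscribe/poll pattern, purely to show its shape; a real client would issue an HTTP request (or hold a long-poll open) rather than read a local queue:

    # The server can't call the client, so it queues notifications and the
    # client periodically asks "anything for me?". Everything here is local
    # and hypothetical; the queue stands in for per-client state on the server.
    import queue, threading, time

    mailbox = queue.Queue()

    def server_side_event_source():
        for n in range(3):
            time.sleep(0.2)
            mailbox.put(f"notification {n}")    # held until the client polls

    def client_poll_loop():
        received = 0
        while received < 3:
            try:
                msg = mailbox.get(timeout=1.0)  # a real client would GET a URI here
                print("client received:", msg)
                received += 1
            except queue.Empty:
                pass                            # nothing yet; poll again

    threading.Thread(target=server_side_event_source).start()
    client_poll_loop()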

Web applications struggle with statelessness, but this characteristic can also provide a valuable benefit. The drawback of statelessness is that it places extra burdens on the UI and computational components of COS applications. Today, Web-based applications rely on browser cookies or other hacks to provide some handle that the server components can use to establish or reestablish context for a particular UI session. Note that the very term session suggests the nature of the problem. A user has a concept of a session. S/he puts an item into a shopping cart and expects the shopping cart state to persist between page invocations. The server code, however, wants to be stateless. Only by providing a browser cookie or some other session handle can the concept of a session be established. The server code (usually as part of the J2EE middleware or the .NET framework) is provided with session state that’s based on the cookie/handle. If subsequent HTTP requests are handled by different servers (due to load balancing, for example), the server-side application framework has to provide some form of session state that exists across machines. This is usually implemented by persisting session state in a database or by using some type of messaging mechanism across servers.
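
A bare-bones Python sketch of the handle idea, with an ordinary dict standing in for the database or cross-server cache a real framework would use (all names here are made up):

    # The server-side code is stateless; everything it needs to "remember"
    # lives in a shared store keyed by the cookie/handle.
    import uuid

    shared_session_store = {}  # in reality: a database or a cache visible to every server

    def handle_request(session_id, action, item=None):
        """Any server in the farm can run this; nothing survives between calls."""
        if session_id is None or session_id not in shared_session_store:
            session_id = uuid.uuid4().hex             # mint a handle, returned as a cookie
            shared_session_store[session_id] = {"cart": []}
        state = shared_session_store[session_id]      # re-establish context from the handle
        if action == "add" and item is not None:
            state["cart"].append(item)
        return session_id, state

    sid, state = handle_request(None, "add", "book")  # first request: no cookie yet
    _, state = handle_request(sid, "add", "coffee")   # later request, possibly another server
    print(state)                                      # {'cart': ['book', 'coffee']}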

The discussion above also suggests the benefits of statelessness. Because HTTP requests (and, thus, SOAP-based Web services) are stateless, server code can be more easily extended to run on multiple machines in the computing cloud. A COS’s computational elements could run on multiple servers and UI requests could be directed to any one of them. Like current Web servers, the COS would have to provide a formal mechanism for sharing session state across computers.

If the COS makes no distinction between local and distributed applications, local applications will also need to adhere to the restrictions imposed by the nature of communication between the UI and computational elements. In general, communication between the two will have to use interfaces amenable to message-based communication. Function arguments would use pass-by-value semantics and there would be restrictions on the types of objects that could be passed between elements. Developers who use Web services today will already be familiar with these restrictions.
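
A quick Python illustration of the pass-by-value restriction; the types are invented, but the failure mode is the general one:

    # Anything crossing the UI/computational boundary has to survive
    # serialization: plain data structures do, live local resources don't.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class OrderRequest:                # fine: plain values, copied into the message
        customer_id: str
        skus: list

    print(json.dumps(asdict(OrderRequest("cust-7", ["SKU-1", "SKU-2"]))))

    open_file = open(__file__)         # not fine: a handle to a local resource
    try:
        json.dumps(open_file)          # there is no meaningful by-value copy of this
    except TypeError as err:
        print("cannot pass by value:", err)
    finally:
        open_file.close()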

A COS should also support applications that, like SETI@home and Folding@home, consist mostly of computational elements. Today’s Web servers, for the most part, are active only when awakened by requests from browsers. A COS should support the concept of a headless application that consists of only computational elements.
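
Something like this trivial Python loop is all a headless COS application would be; the work source and the “computation” here are both stand-ins:

    # A computational element with no UI at all: wake up, fetch a work unit,
    # process it, report the result, ask for more. SETI@home in miniature.
    import random, time

    def fetch_work_unit():
        return [random.random() for _ in range(1000)]  # stand-in for the cloud handing out work

    def process(unit):
        return sum(x * x for x in unit)                # stand-in for the real computation

    for _ in range(3):                                 # a real worker loops until told to stop
        print("reporting result:", round(process(fetch_work_unit()), 2))
        time.sleep(0.1)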

In subsequent posts, I’ll write more about how to accomplish this, as well as my thoughts on storage and on how to achieve a self-organizing COS.

Is Cloud Computing Vaporous?

Tuesday, July 1st, 2008

I’ve been reading a lot of references to the coming of age of cloud computing. The more I read, the more I am disappointed. In many cases, for example Amazon’s EC2, cloud computing seems like marketing-speak for “an easy way to rent virtual machines.” Amazon gives you some Web services that let you allocate virtual machines on the fly. Cool, but not exactly rocket science. Hadoop, or more specifically Google’s MapReduce programming model, is more along the lines of what I’d call cloud computing, but only for a narrow class of programming problems.
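
The model itself is simple enough to sketch in a few lines of plain Python (this is the shape of MapReduce, not Hadoop’s API), which also hints at why only problems with this shape fit it:

    # Word count in the map/shuffle/reduce style: map steps are independent
    # per document and reduce steps are independent per key, so both fan out.
    from collections import defaultdict

    documents = ["the cloud is vapor", "the cloud is storage and compute"]

    mapped = [(word, 1) for doc in documents for word in doc.split()]     # map

    groups = defaultdict(list)                                            # shuffle: group by key
    for word, count in mapped:
        groups[word].append(count)

    word_counts = {word: sum(counts) for word, counts in groups.items()}  # reduce
    print(word_counts)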

The general idea with cloud computing is to be able to use a large network of computers to implement programs with large computing, storage or other resource requirements. As the needs of these programs change, the cloud should be able to adapt easily by adding additional resources on the fly.

The objectives of cloud computing are not that much different from the 1980s objectives of parallel or array processing computers or from the 1990s objectives of load-balancing web applications. The restrictions that we see on would-be cloud computing “solutions” often just repeat the restrictions of earlier technologies. Parallel computers were great at problems that could be easily parallelized. Unfortunately, there seems to be only one such problem: numerical simulation of fluid dynamics. Okay, I’m exaggerating, but certainly parallelization (especially automatic parallelization) has not proven applicable to a wide variety of domains. Load balancing, too, has proven to be harder than it seems. True, it’s easy enough to deploy extra Web servers to handle HTTP requests, but session-state and database issues have proven more difficult. I would venture to guess that 95% of Internet web applications are still dependent on a single working database cluster and/or networked storage array.

A couple of months ago, I got to sit in on a talk by Paul Maritz. I knew Paul at Microsoft and had talked to some of his developers at Pi Corporation. From Paul’s talk, it seems like Pi has done some interesting work with regard to distributed data. They’ve done some clever things, allowing data to exist in multiple places, while also providing local caches available off-line. Paul is a smart guy and I’m sure he’s part of the reason why Pi was acquired by EMC. Nevertheless, Pi seems to have focused primarily on storage and not on general cloud computing issues.

Microsoft, of course, is making noise about cloud computing. This is not surprising considering their late arrival to the party. Something is supposed to be announced by the end of the year, but even Ray Ozzie seems to be underpromising what it will be.

What would I like to see in a cloud computing architecture? I think it needs to accomplish several things:

  1. Address storage, computation and bandwidth.
  2. It can restrict itself to specific application domains, but it has to be more general purpose than MapReduce. It should certainly cover HTTP/XML Web service applications and computationally intensive problems such as the SETI@home and Folding@home projects.
  3. It needs to be adaptive to changing needs and available resources.
  4. It should eliminate single points of failure.

A final thing that I’d like to see, though it’s not strictly a requirement, is that the system be self-organizing and not based on centralized control. I want BitTorrent, not Napster.

I have some ideas about how I’d personally design such a system, but nothing worth discussing just yet.