Archive for June, 2008

Virtual Directories

Saturday, June 21st, 2008

I’ve talked to a couple of companies now that sell virtual directory products. Most recently, I talked to Identyx, a company recently bought by Red Hat to enhance its directory product offerings. A virtual directory is software that looks like a directory (typically, an LDAP one) but doesn’t actually store any data. Whenever a request comes in, the virtual directory retrieves the requested data from one of several configured data sources. A virtual directory can “front” an existing LDAP directory, but it can also make data in relational databases, flat files or other sources appear to exist in an LDAP directory.
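To make the routing idea concrete, here’s a minimal sketch in C. The backends, DNs and helper functions are invented for illustration; real products drive this from rich mapping configurations rather than a hard-coded table.

/* A minimal sketch of virtual-directory request routing. */
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *base_dn;                 /* subtree this backend serves */
    void (*search)(const char *filter);  /* backend-specific lookup */
} backend_t;

static void search_ldap(const char *filter) {
    printf("LDAP backend: search %s\n", filter);
}

static void search_sql(const char *filter) {
    printf("SQL backend: SELECT ... WHERE %s\n", filter);
}

static backend_t backends[] = {
    { "ou=people,dc=example,dc=com",  search_ldap },  /* native LDAP store */
    { "ou=billing,dc=example,dc=com", search_sql  },  /* rows in an RDBMS  */
};

/* Route an incoming LDAP search to whichever store owns the base DN. */
static void virtual_search(const char *base_dn, const char *filter) {
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (strcmp(base_dn, backends[i].base_dn) == 0) {
            backends[i].search(filter);
            return;
        }
    }
    printf("no backend serves %s\n", base_dn);
}

int main(void) {
    virtual_search("ou=people,dc=example,dc=com",  "(uid=jdoe)");
    virtual_search("ou=billing,dc=example,dc=com", "(customerId=42)");
    return 0;
}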

I think this is a pretty cool concept. Implementing a single, comprehensive directory is at best difficult and, at worst, impossible. Companies frequently have data in multiple repositories. A virtual directory allows this data to appear to be in one comprehensive directory while actually remaining in its native stores. A virtual directory can also simplify synchronization of data across repositories. Adding an object to a virtual directory can implicitly require the addition of data to the constituent repositories. A modification to a datum might actually result in modifications to multiple repositories that contain the duplicated datum.

The biggest challenge with virtual directories is trying to retrofit them to applications that are currently wired directly to the constituent data sources. If an application reads database information by using JDBC/ODBC, it might be impossible to change it to use LDAP for its data access. Note, however, that some virtual directories can provide multiple access interfaces, for example, both LDAP and SQL. Even in the case of applications that already use LDAP, it can be a challenge for a virtual directory to completely mimic a constituent LDAP repository. If the application analyzes the directory schema, for example (something that’s possible with Microsoft Active Directory), the virtual directory either has to synthesize a comprehensive schema that includes data from the other sources or it has to “lie” and deliver only the schema elements in the constituent repository. The former approach can confuse applications that expect a specific schema (for example, “Microsoft schema revision 31”) while the latter can confuse applications that use schema information to drive their operation.

LDAP security can also be difficult to emulate/synthesize. If an Active Directory-aware program controls security by manipulating AD access control lists (ACLs), the virtual directory might need to synthesize objectSecurity attributes for objects that lie in repositories that don’t normally support ACLs and then reflect any changes back into the constituent stores. This might be difficult. Placing record-level ACLs on database rows, for example, might not be something that is supported by a constituent data store. In this case, the virtual directory might need to store its own parallel information.

Virtual directories can also be slow. The whole point of LDAP is to be fast (unlike the original X.500 directories that no one actually uses). If, however, data is actually coming from a slower store, fulfilling an LDAP request will be slow, too. For this reason, virtual directories need to perform intelligent caching.
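Here’s a toy sketch of the caching idea: a fixed-size, time-to-live cache keyed by entry DN. The slot count and TTL are arbitrary numbers I made up; real products use much smarter invalidation and memory management.

/* A toy sketch of per-entry result caching in a virtual directory. */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64
#define TTL_SECONDS 30            /* assumed freshness window */

typedef struct {
    char   dn[256];               /* entry name used as the cache key */
    char   value[256];            /* flattened attribute data */
    time_t fetched;               /* when the backend last answered */
} cache_entry_t;

static cache_entry_t cache[CACHE_SLOTS];

/* Return cached data if it is still fresh, NULL on a miss. */
static const char *cache_get(const char *dn)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (strcmp(cache[i].dn, dn) == 0 &&
            time(NULL) - cache[i].fetched < TTL_SECONDS)
            return cache[i].value;
    return NULL;
}

/* Store a backend result, overwriting the stalest slot. */
static void cache_put(const char *dn, const char *value)
{
    int oldest = 0;
    for (int i = 1; i < CACHE_SLOTS; i++)
        if (cache[i].fetched < cache[oldest].fetched)
            oldest = i;
    snprintf(cache[oldest].dn,    sizeof cache[oldest].dn,    "%s", dn);
    snprintf(cache[oldest].value, sizeof cache[oldest].value, "%s", value);
    cache[oldest].fetched = time(NULL);
}

int main(void)
{
    const char *dn = "uid=jdoe,ou=people,dc=example,dc=com";
    if (!cache_get(dn))                /* miss: pretend we queried the */
        cache_put(dn, "cn=John Doe");  /* slow backend, then cache it  */
    printf("cached: %s\n", cache_get(dn));
    return 0;
}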

Once virtual directories start storing their caches, they become a hybrid of sorts. They’re virtual directories, but they can behave like meta directories, too. I’ll write about those in some other post.

Why (OS) Architecture Matters

Friday, June 20th, 2008

In previous posts (for example, What Linux Needs to Learn From Microsoft) I’ve complained about the lack of management APIs in Linux and UNIX. Microsoft Windows has a comprehensive set of management APIs for a variety of OS features:

  • Networking
  • File/printer sharing
  • Event logs
  • Registry I/O
  • Service control
  • etc.

Moreover, each of these APIs typically works locally (on the machine where the call is made) or remotely (on another machine). How is it that such a comprehensive set of APIs came to be, and that these APIs should so consistently support remote management? The answer to both questions is: by design.

Let’s understand what this means by going back in time to the late ’80s and early ’90s. I had just arrived at Microsoft back then (from HP). I started working on the Windows 1.04 SDK and later moved to the OS/2 project.

In the MS DOS (pre-Windows) days, the concept of an API, let alone a network-aware one, was very crude. MS DOS programs accessed the operating system by performing “software interrupts”. An application would set up the x86 registers and would then perform an “Int 21”. The OS would field the interrupt and interpret the registers to determine what to do.

Programming language libraries, for example, the “C run-time library”, added the first primitive API to MS DOS. Instead of performing an Int 21, you could call “open()” or “fopen()” to open a file.
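To illustrate the difference, here is a sketch contrasting the two styles. It assumes a 16-bit DOS compiler of the era, whose <dos.h> supplied intdos(); it won’t build on a modern toolchain, and the file name is just an example.

/* Raw Int 21 versus the C run-time library, on a 16-bit DOS compiler. */
#include <stdio.h>
#include <dos.h>

int main(void)
{
    /* The raw route: load the registers and fire interrupt 21h.
       AH=3Dh means "open existing file", AL=0 means read-only,
       and DS:DX points at the ASCIIZ path. */
    union REGS regs;
    const char *path = "CONFIG.SYS";
    regs.h.ah = 0x3D;
    regs.h.al = 0x00;
    regs.x.dx = (unsigned)path;   /* small model: DS already covers data */
    intdos(&regs, &regs);
    if (!regs.x.cflag)            /* carry clear: AX holds the handle */
        printf("Int 21h opened handle %u\n", regs.x.ax);

    /* The library route: the C run-time hides the interrupt. */
    FILE *fp = fopen("CONFIG.SYS", "r");
    if (fp) {
        puts("fopen() opened the same file");
        fclose(fp);
    }
    return 0;
}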

With Windows 1.0, the number of APIs greatly increased. If I recall, Windows added about 300 APIs covering its “kernel”, windowing and graphics features. This number would eventually grow to some 3,000 in the Windows NT days.

When OS/2 arrived, something else arrived with it: the local-area network. Networks had been available for some time (Novell NetWare, Windows for Workgroups, MS DOS LAN Manager). OS/2, however, introduced the concept of an operating system that was fundamentally network-aware and could provide network services. Unlike NetWare, OS/2 was a general-purpose operating system that could behave as a centralized server, too. OS/2 enabled the development of client-server applications.

The presence of networks and of network services necessitated the development of remote management APIs. For example, when a user typed:

net view \\fileserve1

the net utility had to be able to “talk” to fileserve1 and “ask it” what resources it was sharing. At the API level, the operating system needed to provide a NetShareEnum function with a servername parameter indicating which server was being queried.
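The call survives nearly unchanged in Win32, so here is a minimal sketch of using it; build with a Windows toolchain and link against netapi32.lib. The server name is the one from the example above.

/* A minimal sketch of enumerating a remote server's shares. */
#include <windows.h>
#include <lm.h>
#include <stdio.h>

int main(void)
{
    SHARE_INFO_1 *buf = NULL;
    DWORD entries = 0, total = 0;

    /* NULL would mean the local machine; a UNC name asks a remote one. */
    NET_API_STATUS rc = NetShareEnum(L"\\\\fileserve1", 1,
                                     (LPBYTE *)&buf, MAX_PREFERRED_LENGTH,
                                     &entries, &total, NULL);
    if (rc == NERR_Success) {
        for (DWORD i = 0; i < entries; i++)
            wprintf(L"share: %ls\n", buf[i].shi1_netname);
        NetApiBufferFree(buf);
    }
    return 0;
}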

How did OS/2 “talk” to servers? Did it have some special protocol on some dedicated port just for this purpose? No. OS/2 was built on top of a basic RPC (remote procedure call) protocol that worked “on top” of named pipes.
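The same idea survives in the modern Windows RPC run-time. As a sketch, a client can compose a binding that explicitly selects the named-pipe transport (“ncacn_np”); the server and pipe names below are illustrative. Link against rpcrt4.lib.

/* A minimal sketch of an RPC client binding over named pipes. */
#include <windows.h>
#include <rpc.h>
#include <stdio.h>

int main(void)
{
    RPC_WSTR binding_str = NULL;
    RPC_BINDING_HANDLE binding = NULL;

    /* "ncacn_np" selects the named-pipe transport. */
    RPC_STATUS rc = RpcStringBindingComposeW(
        NULL,                           /* no object UUID        */
        (RPC_WSTR)L"ncacn_np",          /* protocol sequence     */
        (RPC_WSTR)L"\\\\fileserve1",    /* server                */
        (RPC_WSTR)L"\\pipe\\srvsvc",    /* endpoint (pipe name)  */
        NULL, &binding_str);
    if (rc == RPC_S_OK)
        rc = RpcBindingFromStringBindingW(binding_str, &binding);
    if (rc == RPC_S_OK)
        wprintf(L"bound: %ls\n", (wchar_t *)binding_str);

    if (binding)     RpcBindingFree(&binding);
    if (binding_str) RpcStringFreeW(&binding_str);
    return 0;
}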

Now, I can’t provide all the details and history about how RPC progressed from OS/2 to today. My colleague at Likewise Software, Krishna Ganugapati, however, can. I have a link to his blog on my blogroll.

In Windows NT (later, Windows 2000, Windows XP and now Windows Server and Vista), the RPC mechanism became highly formalized. Just about every OS API was written atop RPC. You would define your API in IDL (the interface definition language) and compile it using the MIDL compiler. The compiler would generate a series of client- and server-side stubs that completely hid the details of RPC. You didn’t have to worry about marshaling arguments or about handling the transport layer. The RPC run-time would take care of communications over TCP/IP or over named pipes (maybe even NetBIOS, too).
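As a sketch, an interface definition for the MIDL compiler looks something like this; the UUID, interface name, and function are invented purely for illustration.

[
    uuid(01234567-89ab-cdef-0123-456789abcdef),  /* invented UUID */
    version(1.0)
]
interface HypotheticalShares
{
    /* MIDL turns this into client- and server-side C stubs; the
       caller just sees an ordinary function while the RPC run-time
       handles marshaling and transport. */
    long EnumShares(
        [in, string, unique] wchar_t *servername,  /* machine to ask */
        [out]                long    *share_count
    );
}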

Local calls would skip the networking layers, but the RPC mechanism would further serve to marshal between user- and kernel-level code.

The key lesson here is that Windows NT was built from the ground up with an RPC, distributed-system architecture. UNIX was not (although Sun was an early pioneer in RPC technology and is still a key contributor). Linux was not. Neither was Mac OS X/FreeBSD.

Admittedly, the RPC libraries might be available for these platforms (albeit in a neglected state; we’ve made several fixes/improvements to them). The operating systems themselves are not designed from a distributed perspective.

Thus, it’s not surprising that management APIs, especially remote management APIs, do not exist for these platforms. The respective vendors are all working on WS-Management or WBEM or some other protocols to help with the situation, but they are still, fundamentally, in a weaker position than the Windows NT progeny.

At Likewise, we develop interoperability software that allows UNIX, Linux and Mac OS X computers to work well on Windows networks. In order to provide this software, we have to work with services on Windows that expect to be invoked over RPC. Although, at first, we “hand-rolled” these RPCs, our most recent software is based on a full DCE RPC implementation. This has greatly facilitated our work and is allowing us to provide further interoperability features between Windows and non-Windows systems. Sometime soon, we will be releasing this software with both our open source and proprietary software products.

The Future of Linux

Thursday, June 19th, 2008

As a software developer, I get to see aspects of Windows, Linux, UNIX and Mac OS X to which end-users are oblivious. In previous posts (for example, What Linux Needs to Learn from Windows) I have bemoaned the lack of standards between Linux distributions and the lack of system APIs. I’ve also praised Microsoft for delivering useful functionality in .NET (Programming is Fun [Again]).

Is Linux doomed to fail? Will Microsoft continue its hegemony?

If you read my series The Decline and Fall of Microsoft you know that I think Microsoft is facing some huge structural challenges. Nevertheless, here’s what I think is going to happen over the next 5 years:

  1. Microsoft will lose a large percentage of the general-purpose PC business to Apple. It will keep 50% or so, but mostly for dedicated workstations. Apple will take a majority of the laptop business.
  2. IT departments will continue to replace proprietary UNIX servers with Linux, especially given the move towards more virtualization.
  3. Windows will see an increased share of the server business, driven by .NET and SharePoint applications.
  4. Linux on the desktop will see minimal gains.
  5. Linux will dominate the special-purpose sub-notebook business (such as the Eee PC on which I am currently typing). Linux will also see increased use in some specific scenarios that require limited application availability.

All in all, I think there will be growth for Linux but it will actually lose overall composite share. I believe this for several reasons.

Although Linux is growing in the server business, I think that Microsoft will soon surpass its growth rate, if it hasn’t already (I don’t have my IDC numbers handy). Linux server growth is driven by UNIX-to-Linux conversions (e.g., new Oracle servers running Linux instead of Solaris) and by Apache server use. I think the former is a limited business, and I think the latter is already limited by overall market growth. To grow share, Linux has to take away Windows server business, especially on the intranet, and I don’t see this happening. Why? I think that Microsoft is winning the war for the hearts and minds of developers. Microsoft is an API company. Red Hat (except for JBoss) and Novell are not. Innovations in Linux APIs are mostly made by other companies (MySQL, Eclipse, Sugar CRM, etc.) and are often OS-independent. Microsoft, having totally stumbled on Vista, seems not to have screwed up Windows Server 2008, SharePoint and SQL Server.

In the general-purpose desktop/laptop business, Linux is missing Microsoft Office. In spite of the things I dislike about Office 2007, it’s still the Swiss Army knife of productivity software. OpenOffice is a poor, poor substitute at any price. Even Office for the Mac is a poor substitute, but it’s good enough. Workers who have the freedom to buy what they want (e.g. executives, technoids) will buy Macs. Vista is an embarrassment. I wish Microsoft would just port WPF and the new shell to XP, throw out everything else in Vista and admit its mistakes.

The only segment in which I see Linux gaining market share is the specialty-use market. Sub-notebooks like my Eee PC need to keep costs very low. This means they need an inexpensive OS that doesn’t require a lot of resources to run. Vista won’t cut it. Windows CE is too limited. I’d love to run Mac OS X on my Eee PC but I can’t without riling Apple’s lawyers. Linux is the only choice. Similarly, we’ve run into one or two companies that want to install Linux on a computer to use it only as a Citrix terminal. What’s weird is that they’ll be using Linux to run Windows applications via remote terminal services.

On the whole, this is not a very exciting future for Linux. There is still money to be made in the Linux business, and Red Hat, Novell and others may be satisfied with the niches I’ve described above. Personally, I find my prognoses somewhat depressing. I like Mac OS X, but Apple’s walled-garden mentality is very 1980s. Microsoft still has a strong platform for software developers, but I think that, over time, their size and inability to execute will infect all areas of the company.

I wish that Linux were a better operating system than it is. I’d love there to be a credible alternative to the mess of Microsoft and the tyranny of Apple.