Virtual Directories

I’ve now talked to a couple of companies that sell virtual directory products. Most recently, I talked to Identyx, a company recently bought by Red Hat to enhance its directory product offerings. A virtual directory is software that looks like a directory (typically an LDAP one) but doesn’t actually store any data. Whenever a request comes in, the virtual directory retrieves the requested data from one of several configured data sources. A virtual directory can “front” an existing LDAP directory, but it can also make data in relational databases, flat files, or other sources appear to exist in an LDAP directory.
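To make the idea concrete, here is a minimal sketch of the routing step in Python. It is not how any particular product works. The class name, the DN layout ("uid=<id>,<suffix>"), and both data sources are made up for illustration, and a real virtual directory would speak the LDAP protocol rather than expose a Python method.

```python
from typing import Callable, Dict, Optional

# A backend is any callable that maps a user id to an attribute dict (or None).
Backend = Callable[[str], Optional[Dict[str, str]]]


class VirtualDirectory:
    """Stores no entries itself; routes each lookup to a configured source."""

    def __init__(self) -> None:
        self._sources: Dict[str, Backend] = {}  # DN suffix -> backend

    def mount(self, suffix: str, backend: Backend) -> None:
        self._sources[suffix.lower()] = backend

    def lookup(self, dn: str) -> Optional[Dict[str, str]]:
        # Assumes DNs of the form "uid=<id>,<suffix>".
        rdn, _, suffix = dn.partition(",")
        backend = self._sources.get(suffix.lower())
        if backend is None or not rdn.lower().startswith("uid="):
            return None
        return backend(rdn[4:])


# Hypothetical sources: a dict standing in for an HR database table,
# and a colon-separated string standing in for a flat file.
HR_ROWS = {"alice": {"cn": "Alice Smith", "mail": "alice@example.com"}}
FLAT_FILE = "bob:Bob Jones:bob@example.com\n"


def hr_backend(uid: str) -> Optional[Dict[str, str]]:
    return HR_ROWS.get(uid)


def flat_file_backend(uid: str) -> Optional[Dict[str, str]]:
    for line in FLAT_FILE.splitlines():
        name, cn, mail = line.split(":")
        if name == uid:
            return {"cn": cn, "mail": mail}
    return None


vdir = VirtualDirectory()
vdir.mount("ou=staff,dc=example,dc=com", hr_backend)
vdir.mount("ou=contractors,dc=example,dc=com", flat_file_backend)
```

A client asking for "uid=alice,ou=staff,dc=example,dc=com" gets an answer pulled from the HR rows, while a contractor lookup is answered from the flat file. Neither entry is ever stored in the directory itself.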

I think this is a pretty cool concept. Implementing a single, comprehensive directory is at best difficult and, at worst, impossible. Companies frequently have data in multiple repositories. A virtual directory allows this data to appear to be in one comprehensive directory while actually remaining in its native stores. A virtual directory can also simplify synchronization of data across repositories. Adding an object to a virtual directory can implicitly require the addition of data to the constituent repositories. Modifying a datum might actually result in modifications to every repository that contains a copy of it.
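The write side can be sketched in the same spirit. This is only an illustration under simplifying assumptions: the Repository class is an in-memory stand-in, each store declares which attributes it holds, and there is no rollback if one of the writes fails (something a real product would have to handle).

```python
from typing import Dict, List


class Repository:
    """Hypothetical in-memory stand-in for one constituent data store."""

    def __init__(self, name: str, attributes: List[str]) -> None:
        self.name = name
        self.attributes = set(attributes)  # attributes this store holds
        self.records: Dict[str, Dict[str, str]] = {}

    def upsert(self, uid: str, values: Dict[str, str]) -> None:
        self.records.setdefault(uid, {}).update(values)


def modify(repos: List[Repository], uid: str, changes: Dict[str, str]) -> List[str]:
    """Apply changes to every repository that holds a changed attribute.

    Returns the names of the repositories that were touched.
    """
    touched = []
    for repo in repos:
        relevant = {k: v for k, v in changes.items() if k in repo.attributes}
        if relevant:
            repo.upsert(uid, relevant)
            touched.append(repo.name)
    return touched


hr = Repository("hr_db", ["cn", "mail", "title"])
phone = Repository("phone_list", ["cn", "telephoneNumber"])

# "cn" is duplicated in both stores, so one change lands in both.
touched = modify([hr, phone], "alice", {"cn": "Alice Smith-Lee"})
```

A change to "title" would touch only the HR store, while the "cn" change above is written to both, which is the implicit synchronization described in the paragraph.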

The biggest challenge with virtual directories is trying to retrofit them to applications that are currently wired directly to the constituent data sources. In the case of an application that reads database information by using JDBC/ODBC, it might be impossible to change it to use LDAP for its data access. Note, however, that some virtual directories can provide multiple access interfaces, for example, both LDAP and SQL. Even in the case of applications that currently use LDAP, it can be a challenge for a virtual directory to completely mimic a constituent LDAP repository. If the application analyzes the directory schema, for example (something that’s possible with Microsoft Active Directory), the virtual directory either has to synthesize a comprehensive schema that includes data from other sources or it has to “lie” and deliver only the schema elements in the constituent repository. The former approach can confuse applications that expect a specific schema (for example, “Microsoft schema revision 31”) while the latter approach can confuse applications that use schema information to drive their operation.
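The two schema choices can be shown with a toy example. The attribute sets below are invented for illustration, and a real schema also carries object classes, syntaxes, and OIDs that this sketch ignores.

```python
from typing import Dict, Set

# Hypothetical attribute sets per source.
SOURCE_SCHEMAS: Dict[str, Set[str]] = {
    "active_directory": {"cn", "sAMAccountName", "memberOf"},
    "hr_db": {"cn", "employeeNumber", "costCenter"},
}


def synthesized_schema(schemas: Dict[str, Set[str]]) -> Set[str]:
    """Option 1: advertise the union of every source's attributes."""
    merged: Set[str] = set()
    for attrs in schemas.values():
        merged |= attrs
    return merged


def passthrough_schema(schemas: Dict[str, Set[str]], primary: str) -> Set[str]:
    """Option 2: advertise only the primary repository's attributes."""
    return set(schemas[primary])


full = synthesized_schema(SOURCE_SCHEMAS)
narrow = passthrough_schema(SOURCE_SCHEMAS, "active_directory")

# Attributes the directory can serve but a schema-driven client never sees.
hidden = full - narrow
```

With the first option, a client that expects the stock AD schema sees extra attributes it doesn’t recognize. With the second, a client that builds its behavior from the schema never learns that the attributes in hidden exist.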

LDAP security can also be difficult to emulate or synthesize. If an Active Directory-aware program controls security by manipulating AD access control lists (ACLs), the virtual directory might need to synthesize security descriptor attributes (nTSecurityDescriptor, in AD’s case) for objects that live in repositories that don’t normally support ACLs and then reflect any changes back into the constituent stores. This might be difficult. A constituent data store might not support record-level ACLs on database rows, for example. In this case, the virtual directory might need to store its own parallel ACL information.
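Here is a rough sketch of what that parallel information could look like. Everything in it is hypothetical: the row store, the principal names, and the simple "read" right are placeholders, and real AD security descriptors are far richer than a set of strings.

```python
from typing import Dict, Optional, Set

# Hypothetical backing store with no per-row access control.
DB_ROWS: Dict[str, Dict[str, str]] = {
    "alice": {"cn": "Alice Smith", "salary": "100000"},
}


class AclOverlay:
    """Side table of ACLs kept by the virtual directory itself."""

    def __init__(self) -> None:
        # entry key -> principal -> set of granted rights
        self._acls: Dict[str, Dict[str, Set[str]]] = {}

    def grant(self, key: str, principal: str, right: str) -> None:
        self._acls.setdefault(key, {}).setdefault(principal, set()).add(right)

    def revoke(self, key: str, principal: str, right: str) -> None:
        self._acls.get(key, {}).get(principal, set()).discard(right)

    def allows(self, key: str, principal: str, right: str) -> bool:
        return right in self._acls.get(key, {}).get(principal, set())


def read_entry(acls: AclOverlay, principal: str, key: str) -> Optional[Dict[str, str]]:
    """Return the row only if the overlay grants 'read'; deny by default."""
    if not acls.allows(key, principal, "read"):
        return None
    return DB_ROWS.get(key)


acls = AclOverlay()
acls.grant("alice", "hr_admin", "read")
```

The database never learns about the ACL. The virtual directory consults its own side table before it touches the row, so the overlay has to be kept consistent with the underlying data by the directory itself.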

Virtual directories can also be slow. The whole point of LDAP is to be fast (unlike the original X.500 directories that no one actually uses). If, however, data is actually coming from a slower store, fulfilling an LDAP request will be slow, too. For this reason, virtual directories need to perform intelligent caching.
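A very simple form of caching is a time-to-live (TTL) cache in front of the slow lookup. The sketch below assumes a hypothetical slow_fetch function and an arbitrary 30-second TTL. The clock is injectable only so the expiry behavior can be exercised without actually waiting.

```python
import time
from typing import Callable, Dict, List, Tuple

Entry = Dict[str, str]


class CachingLookup:
    """Wraps a slow fetch function with a time-to-live cache."""

    def __init__(
        self,
        fetch: Callable[[str], Entry],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Entry]] = {}  # key -> (stored_at, value)

    def get(self, key: str) -> Entry:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the slow store
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value


calls: List[str] = []  # records every trip to the slow store


def slow_fetch(uid: str) -> Entry:
    # Hypothetical stand-in for a slow database or flat-file read.
    calls.append(uid)
    return {"uid": uid}


fake_now = [0.0]  # mutable fake clock for demonstration
cache = CachingLookup(slow_fetch, ttl_seconds=30.0, clock=lambda: fake_now[0])
cache.get("alice")
cache.get("alice")  # served from the cache; slow_fetch is not called again
```

The trade-off is staleness: until the TTL expires, a change made directly in the underlying store is invisible through the directory, so a real implementation would also need some way to invalidate entries on writes.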

Once virtual directories start storing their caches, they become a hybrid of sorts. They’re still virtual directories, but they can also behave like meta directories. I’ll write about those in some other post.