Archive for the ‘Technology’ Category

More on NAS at Home

Tuesday, July 8th, 2008

After re-reading my last post, I realize that some of you might have no clue what I’m talking about when I mention network attached storage (NAS). To use an oxymoron, this post is a follow-up primer.

The idea with a NAS is to centralize storage across multiple machines in a network. Instead of maintaining numerous independent disk drives on the individual machines, you place all key files in a central location and worry only about managing the NAS. This concept is frequently used with server computers but can also be used with workstations. Microsoft Active Directory, for example, supports the concept of a roaming profile that allows your personal files to be stored in one consistent place regardless of which computer you log in to. UNIX and kin can do something similar with automounts.

There are actually two main mechanisms for implementing centralized storage.

The storage area network (SAN) approach differs from that used by NAS. A SAN storage appliance provides low-level storage “blocks” to the computers connected to it. The SAN device has no concept of a “file,” only of an assortment of storage blocks assigned to a particular computer. SANs are frequently accessed over a separate, high-speed Fibre Channel network but can also be accessed over Ethernet using iSCSI and other protocols.

A NAS device, on the other hand, provides file-level operations. The device implements the SMB/CIFS protocol and/or the NFS protocol in order to provide file-oriented services to Windows or UNIXy computers (respectively).
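To make the file-level point concrete: once a NAS share is mounted on a client, programs just use ordinary file operations and the device takes care of the rest. Here is a minimal sketch; the share name and mount point are made up for illustration.

    # Illustrative only: assumes an SMB/CIFS or NFS share from a NAS has already
    # been mounted at /mnt/nas (a hypothetical path).
    from pathlib import Path

    share = Path("/mnt/nas/backups")          # looks like any local directory
    share.mkdir(parents=True, exist_ok=True)  # a file-level operation served by the NAS

    report = share / "inventory.txt"
    report.write_text("nightly backup completed\n")
    print(report.read_text())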

If you have used a traditional NetWare or Windows-based file server, you have used a NAS device. There are much cooler devices now, however. Isilon, for example, makes very clever clustered storage NAS devices that allow multiple NAS nodes to replicate data in a fashion that provides redundancy and high availability at much lower cost than SANs and many other NAS devices.

The Linksys NAS 200 device that I talked about in the last post is a dirt-cheap home NAS device. It is not particularly fast nor does it offer much sophisticated functionality. Its security model, for example, is very crude. I run a Windows domain controller at home but the NAS 200 does not integrate with AD-based security. To avoid authentication hassles, I simply allow the guest (any user) to have read/write access to all the shared folders. Fine for home (where things are protected with a perimeter firewall and with secure wireless access points) but not fine for a more public network.

I installed the Linksys appliance in order to provide a backup destination for the 6 computers that we have strewn throughout the house. Using the appliance means that I don’t have to dedicate a general-purpose computer to this task. Additionally, Linksys has figured out how to set up RAID and how to automatically perform various recovery operations, all through a simple Web interface. It would have been much more complicated for me to figure this out myself.

The one last piece of the backup puzzle that I’d like to implement would be to add some form of offsite storage. Ideally, the NAS 200 would, itself, back up files to some Web-based storage provider. Since it doesn’t, I might have to implement this myself with some type of periodic job that detects new files on the NAS and copies them to a service during off hours.
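If I end up rolling my own, the job would look something like the sketch below. It is only an outline under made-up assumptions: the NAS share is mounted at /mnt/nas, the “offsite” destination is just a directory that some other tool synchronizes to a Web-based provider, and the timestamp file is my own convention, not anything the NAS 200 provides.

    # Rough sketch of a periodic offsite-copy job (e.g. run from cron during off hours).
    # All paths and the timestamp convention are assumptions for illustration.
    import shutil
    from pathlib import Path

    NAS_ROOT = Path("/mnt/nas/backups")         # hypothetical mount point of the NAS share
    OFFSITE = Path("/mnt/offsite/backups")      # stand-in for the Web-based storage target
    STAMP = Path("/var/tmp/last_offsite_sync")  # records when the last sync finished

    last_run = STAMP.stat().st_mtime if STAMP.exists() else 0.0

    for src in NAS_ROOT.rglob("*"):
        if src.is_file() and src.stat().st_mtime > last_run:
            dest = OFFSITE / src.relative_to(NAS_ROOT)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)             # copy only files changed since the last run

    STAMP.touch()                               # mark this sync as complete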


NAS and Virtualization at Home

Monday, July 7th, 2008

In between eating hot dogs and blowing up fireworks this weekend, I worked on a couple of home IT projects that I’d been planning for a while. My goals were straightforward. First, I wanted to implement a more robust backup solution. Second, I wanted to get rid of a Fedora Core 5 server and replace it with something newer. The two projects were related since the FC5 server was being used solely as a Samba file server to host my backup drives. Here’s what I ended up doing to accomplish both tasks.

I didn’t like the dedicated Fedora server for two reasons. First, I was stuck using a computer for a very narrow purpose. I have only two computers in my “server room” (my den), and the other one is my AD domain controller. I’ve been installing lots of Windows application software on the AD machine because I can’t run it on Linux, and installing random software on a DC is not a good idea. The second reason I wanted to get rid of the Fedora machine is that I wanted to run a more current distro. I worried about replacing FC5 with Ubuntu, however, because FC5 uses a funky logical volume manager; if the change failed, I might have to scramble to recover my data.

To get rid of the FC5 file server, I spent $150 on a Linksys NAS 200 device and $200 on two 500GB SATA hard disks. The Linksys device is, essentially, a cheapie Linux box with an Ethernet port and two SATA drive bays. It can be configured to use the drives separately or in a RAID 0 (striping) or RAID 1 (mirroring) array. I chose the latter configuration, giving me 500GB of storage but with the security of knowing that I can lose a drive and still have my data.

The Linksys NAS 200 was pretty easy to install. I made one mistake, which was to start using it (copying over 100GB to it) before realizing that it was running very slowly. A look at the Linksys web site showed that there was a firmware update that allowed the use of a non-journaled file system. Without journaling, the Linksys device is much faster but will have to perform a “scandisk” (fsck) if it detects any disk errors. Installing the firmware upgrade and switching to the non-journaled file system required reformatting the disks and re-copying the 100GB.

With the NAS in place, I was able to go to my FC5 computer and copy over all the old backups. I then changed the key computers in the house to use the NAS instead of the FC5 machine for backups. Along the way, I also stopped using Windows backup software and started using NTI Shadow (dumb, cheap) instead.

Now that my FC5 computer was out of a job, I could repurpose it. I increased its RAM to 2GB, deleted its Linux file system partitions, and installed Windows XP on it. Deleting the partitions was necessary because, with them in place, Windows XP would get confused during installation.

The first thing I did after installing XP (well, the second, after waiting for SP2 and a million other updates to install) was to install VMware Workstation. VMware Server is free, but the Workstation version allows for multiple “snapshots,” which I find very useful.

With VMware installed, the first VM I created was an Ubuntu 8.04 Linux VM.

What’s the point of replacing a Linux machine with a Windows machine running Linux in a VM? Two reasons. First, I can run Windows software in the host operating system (actually, I will probably create a Windows XP VM and run the Windows software there instead of on the host OS). Second, if I get tired of one Linux distribution, I can always create another VM with a different one.

With VMware, I can keep my host OS in pristine condition. I won’t install any application software there. If a problem occurs in a guest VM, I can always use the VMware snapshot features to “undo” it. Worst case, I can blow away a VM and recreate it. What about data? Here’s the key: don’t keep your important data on virtual disks. Use virtual machines, but keep your data on real drives or on a NAS device. Windows file shares or the Linux mount.cifs command can help with this if you keep your data on Windows file servers; if you prefer, you can use NFS and store data on UNIX file servers instead. Use virtual disks only to store operating system files.
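As an illustration of that split, here is a minimal sketch of how a Linux guest VM might attach the NAS share and keep its working data there instead of on its virtual disk. The host name, share name, and mount point are made up, the script needs root, and it assumes mount.cifs is installed; guest access matches the wide-open security model I described in the earlier post.

    # Minimal sketch: mount a CIFS share from the NAS inside a Linux VM, then treat it
    # like local storage. The share name and mount point are hypothetical; run as root.
    import subprocess
    from pathlib import Path

    MOUNT_POINT = Path("/mnt/nas")
    MOUNT_POINT.mkdir(parents=True, exist_ok=True)

    # Guest access, matching the NAS 200 configuration described above.
    subprocess.run(
        ["mount.cifs", "//nas200/backups", str(MOUNT_POINT), "-o", "guest"],
        check=True,
    )

    # From here on, application data lives on the NAS, not on the VM's virtual disk.
    (MOUNT_POINT / "vm-data").mkdir(exist_ok=True)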

This is exactly the architecture used in large, virtualized enterprise IT departments. Application data is kept on attached storage accessed by one or more virtual machines. Deploying additional virtual server instances is easy because the data is centrally located. The same concept can be used at home, on a smaller scale.

Everything is up and running now. I’m happy running Ubuntu instead of FC5 and I’m happy knowing my data backups are mirrored. I’ll be tracking the performance of the NAS device over the next few weeks and months. Consumer NAS devices are a tricky tradeoff of simplicity vs. functionality and performance. Someday, I want to experiment with removing a drive from the array and validating that the RAID rebuild occurs properly. For now, I’m just hoping the NAS is doing the right thing.

Software and Socialism

Friday, July 4th, 2008

This post might more accurately refer to Totalitarianism instead of Socialism but what it lacks in precision, it makes up for with alliteration!

July 4th! Independence Day. In Seattle, we also refer to it as “the day before summer begins.” It always rains here on July 4th. It’s a tradition.

Beyond its meteorological implications, July 4th commemorates the signing of the Declaration of Independence. 232 years ago, representatives from the original 13 colonies formalized their desire to secede from the British Empire. What beef did they have with King George? Why all the fuss?

True, we know they were upset about “taxation without representation” and about the Stamp Act and import tariffs. Did you know, however, that Britain had conceded on all these points? Did you know that the 13 colonies had the highest standard of living in the world at the time and a very low tax rate?

Beyond any concrete economic or political issues, the colonies wanted to secede from Britain because they resented being told what to do by a remote sovereign that treated them as second-class citizens. If you read the Declaration of Independence and, even more so, the Constitution, you will easily detect a fundamental distrust of centralized Federal government. The forefathers went out of their way to delegate as few powers as possible to the Federal government – the rest were reserved for the states. Arguably, the 2nd Amendment is all about the right of the States to maintain militias so that they could fight the Federal government. Remember, the militias were formed to fight the British. The last thing the colonies wanted was to be powerless against another powerful central government that might turn out to be just as objectionable as the first.

This fundamental question of State vs. Federal rights is one that remains with us today. Battles over abortion, Medicaid payments, “No Child Left Behind” and other issues focus on what are state responsibilities and what are Federal responsibilities. States’ rights proponents argue that local government is more efficient and more responsive to local needs than a centralized Federal government.

Centralized vs. distributed control is also a frequent topic of discussion in Socialist vs. Capitalist systems. In Socialist systems, the State owns all the capital and decides how to allocate it. In Capitalist systems, individuals own capital and they decide how to invest it. Again, the question is one of tight, centralized, control vs. a loose distributed system.

This same tension exists in large-scale software architecture. To what degree should software be tightly controlled by some central program vs. loosely controlled and autonomous? Occasionally, I will review a design for a complex system and complain to the architect that the design is “too Communist.” What I mean by this is that it relies too heavily on central planning and control.

As with governments and economies, there is a strong argument that software architecture should avoid excessive dependence on centralized control. Centralized control can lead to very brittle software that breaks when an alternative design would simply bend. Centralization frequently translates to designs with single points of failure. A server goes offline or a process fails and the whole system collapses.

As one who dislikes both centralized designs and socialist governments, I like to see software architectures with:

  • Redundant, clustered storage (e.g. from Isilon)
  • Compute and database clusters based on active/active designs to minimize failover time
  • Pull-based models where autonomous components ask for what they need instead of push-based models where an executive tells the others what to do (see the sketch below)
  • Self-organizing systems rather than ones driven by rigid configuration

It is my premise that such systems are no harder to develop than centralized ones, but they do require more creative architects to design them. Because we tend to design systems in a hierarchical fashion, it’s easiest to develop control systems that also flow from top to bottom. Developing loosely coupled systems takes a different mind-set. Once you’ve identified the components in your design, you need to treat each as an independent entity: think about how it gets the information it needs, what it does with that information when it has finished processing, and what it should do when it detects an error condition. If you design your components to be independent and robust, your design might be better able to heal itself when a component has an intermittent failure.
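To make the pull-based idea from the list above concrete, here is a toy sketch. The queue and the task shape are invented stand-ins for whatever shared storage or messaging you actually use: each worker asks for work when it is ready and copes with its own failures, and no central executive pushes assignments or tracks worker state.

    # Toy sketch of pull-based, autonomous workers. Each worker pulls a task when it
    # is ready and handles its own errors; nothing pushes commands from the top.
    import queue
    import threading

    work_queue = queue.Queue()   # stand-in for shared storage, a message bus, etc.

    def worker(name: str) -> None:
        while True:
            try:
                task = work_queue.get(timeout=1)   # pull: ask for work when ready
            except queue.Empty:
                break                              # nothing left to do; shut down quietly
            try:
                print(f"{name} processed task {task}")
            except Exception as err:               # an intermittent failure stays local
                print(f"{name} failed on task {task}: {err}; requeueing")
                work_queue.put(task)

    for item in range(10):
        work_queue.put(item)

    threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()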

It is true that, in the physical world, there are economies of scale, but it is also true that past a certain point, inefficiency and bureaucracy increase with organizational size. In the software world, there are few benefits to centralization and size, and many clear deficiencies. As the computer and IT industry moves increasingly towards outsourcing and SaaS, we need to keep in mind the degree to which delegating responsibilities to service providers is a move towards centralization, with all of its flaws.