Wednesday, July 30, 2008
Data continues to grow at a frightening rate. According to an IDC study, there were about 281 exabytes of data stored on disk worldwide in 2007. This data is growing at a CAGR of about 70%. At that rate, in 3 years there will be about 1,400 exabytes of data sitting on disk.
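The math behind that projection is just simple compounding. A quick sketch (the 281 EB starting point and 70% CAGR come from the IDC figures above):

```python
# Project storage growth at a compound annual growth rate (CAGR).
def project_capacity(start_eb, cagr, years):
    """Return projected capacity (in exabytes) after compounding annually."""
    return start_eb * (1 + cagr) ** years

# IDC figures cited above: 281 exabytes in 2007, growing ~70% per year.
print(project_capacity(281, 0.70, 3))  # roughly 1380 EB, i.e. "about 1400"
```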
Now, a lot of this data is sitting on people's desktops, laptops, iPods, phones, digital cameras, etc. right now. However, things like cloud storage will change all of that. Heck, we are seeing some of the change right now with things like social networking sites, photo sharing sites, etc. IDC says that a corporate entity will be responsible for the protection and security of 85% of that data.
So, in the future, we are going to have to store a lot more data than we do today, a LOT more data. How are we going to do that? Just the physical aspect of getting exabytes of data on the floor is going to be a challenge. I don't even want to talk about protecting and managing that much data. But for now, I want to talk about the density of the hard disk drive since that's going to soon become the physical limit of what we can store on the floor of our data centers.
The bits are getting too small!
Enterprise disk drive capacity obeyed Moore's Law, doubling every 18 months, for quite a few years. However, this growth appears to have slowed over the last 5 years, and it now takes approximately 29-30 months to double the capacity of an enterprise disk drive.
This suggests that we are nearing the maximum areal density (and therefore maximum capacity) of current disk drive technology, a boundary called the superparamagnetic limit. Areal density, as it refers to disk drives, is measured as the number of bits per inch (bpi) times the number of tracks per inch (tpi).
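To put numbers on that definition, areal density is just the product of those two figures. A quick sketch with made-up but plausible values (these are illustrative, not specs of any real drive):

```python
# Areal density is bits-per-inch along the track (bpi) times tracks-per-inch (tpi).
def areal_density_gbpsi(bpi, tpi):
    """Areal density in gigabits per square inch."""
    return bpi * tpi / 1e9

# Illustrative numbers: 1.5 million bits/inch and 150,000 tracks/inch.
print(areal_density_gbpsi(1_500_000, 150_000))  # 225.0 Gb per square inch
```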
The areal density of disk storage devices has increased dramatically since IBM introduced the RAMAC in 1956. RAMAC had an areal density of two thousand bits per square inch, while current-day disks have reached 100 gigabits (100 billion bits) per square inch. Perpendicular recording is expected to increase storage capacity even more over time, but we do appear to be approaching the limit.
As the magnetic bits get smaller, at some point they can no longer hold their magnetization. Thermal fluctuations reduce the signal strength and render the bits unstable. However, this ultimate areal density keeps changing as researchers find new techniques for recording and sensing the bit. Years ago the limit was thought to be 20 gigabits per square inch. Today, the limit is several hundred gigabits per square inch, and more than a terabit is expected soon. But that's about all you can get out of the technology.
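The physics behind that instability can be sketched with a back-of-envelope calculation: a bit is stable only while its magnetic anisotropy energy dwarfs the thermal energy (a ratio of roughly 40-60 is the usual rule of thumb for multi-year retention). All of the numbers below are illustrative, not measurements of any real drive:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def stability_ratio(ku, grain_diameter_nm, thickness_nm, temp_k=300):
    """K_u * V / (k_B * T) for a cylindrical magnetic grain; ku in J/m^3."""
    r = grain_diameter_nm * 1e-9 / 2
    volume = math.pi * r**2 * (thickness_nm * 1e-9)
    return ku * volume / (K_B * temp_k)

# Illustrative values: K_u ~ 2e5 J/m^3, 8 nm grains, 10 nm thick layer.
# The result is well below the ~40 rule of thumb, so bits this small
# would lose their magnetization to thermal fluctuations.
print(round(stability_ratio(2e5, 8, 10)))  # 24
```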
Denser is faster.
Increasing the density of hard disk drives has a side benefit: it makes the drives faster as well. This is quite logical when you think about it. The closer together the bits are packed on the drive, the more data passes under a read/write head in the same period of time, making the drive faster.
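You can see the effect with a little arithmetic: sustained transfer rate is roughly linear bit density times the speed the track moves under the head. All figures here are illustrative, not specs of any particular drive:

```python
import math

def transfer_rate_mbps(bpi, rpm, track_diameter_in):
    """Rough sequential transfer rate in MB/s: (bits/inch * inches/sec) / 8."""
    inches_per_sec = math.pi * track_diameter_in * rpm / 60
    return bpi * inches_per_sec / 8 / 1e6

# Same 15k RPM spindle, same ~3 inch outer track: doubling the linear
# density doubles the throughput without spinning the platter any faster.
print(transfer_rate_mbps(1_000_000, 15000, 3.0))
print(transfer_rate_mbps(2_000_000, 15000, 3.0))
```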
Shorter term solution.
So, if the disk drive is not going to be able to continue to provide us with the kinds of capacities we are going to need in the future, what will? Well, there are a number of things being looked at by a lot of folks who are a lot smarter than me! In the short term, things like SSD look promising once we work out some of the kinks, specifically the write speed issue. Until write performance improves, I'm not sure how much general acceptance SSD technology is going to get. Price, I am convinced, will take care of itself as economies of scale kick in.
Holographic storage is another: some people have been working on it for a very long time, and it seems like such a promising technology, but it has yet to come to fruition. There is one company out there trying to ship a product, but they recently pushed their release date off until the end of this year. Still, if they can work out the kinks, it definitely has promise, especially for media applications. But what about beyond that? What technologies are the researchers looking at that sound really cool? I'll look at some of those next.
Sci-Fi data storage.
So, this is where it gets fun. Some of the technologies that researchers are currently looking into really do sound like something out of a Sci-Fi movie. Here are some examples of the stuff I'm talking about:
Nanodots - A nanodot has north and south poles like a tiny bar magnet and switches back and forth (or between 0 and 1) in response to a strong magnetic field. Generally, the smaller the dot, the stronger the field required to induce the switch. Until now, researchers have been unable to understand and control the wide variation in nanodot switching response. A NIST team significantly reduced the variation to less than 5 percent of the average switching field and also identified what is believed to be the key cause of the variability. Nanodots as small as 50 nanometers (nm) wide could be used to store data.
Arrays of magnetic snakes - According to a weekly digest from the American Physical Society (APS), physicists at Argonne National Laboratory (ANL) have found that under certain conditions, magnetic particles can form magnetic "snakes" able to control fluids. According to the researchers, this magnetic self-assembly phenomenon may be used to make the next generation of magnetic recording media, or transparent conductors based on self-assembled conducting networks of magnetic micro-particles.
Nanowires - According to researchers from the University of Pennsylvania, Drexel University and Harvard University, barium titanium oxide nanowires suspended in water could hold 12.8 million GB per square centimeter. If that memory density can be realized commercially, "a device the size of an iPod Nano could hold enough MP3 music to play for 300,000 years without repeating a song or enough DVD-quality video to play movies for 10,000 years without repetition," the University of Pennsylvania researchers said.
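You can sanity-check that playback claim with some quick arithmetic. The device area and MP3 bitrate below are my own assumptions, not the researchers' figures:

```python
# Back-of-envelope check of the nanowire playback claim.
density_gb_per_cm2 = 12.8e6    # figure quoted above
device_area_cm2 = 10           # assumed recording area for a Nano-sized device
mp3_bytes_per_sec = 16_000     # 128 kbps MP3

capacity_bytes = density_gb_per_cm2 * device_area_cm2 * 1e9
seconds = capacity_bytes / mp3_bytes_per_sec
years = seconds / (365.25 * 24 * 3600)
print(f"{years:,.0f} years of MP3 playback")  # on the order of the quoted claim
```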
Is the disk drive dead?
So, does this mean that the disk drive is dead in the future? I don't think so. I believe that the disk drive we know and love will simply move from one tier of storage to another. We are already seeing some of this movement with the implementation of backup to disk. Technologies such as data deduplication will continue to accelerate this process, and the addition of new primary data storage technologies will simply complete it by pushing hard disk drives from on-line primary storage to what will be considered near-line storage in the future. Long live the disk drive!
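Deduplication, at its core, is just content addressing: fingerprint each block and store any given block only once, no matter how many backups or VM images contain it. A toy sketch (nothing like a production dedupe engine, but it shows the idea):

```python
import hashlib

# Minimal block-level deduplication: identical blocks are stored once,
# and each write returns only a list of block fingerprints.
class DedupStore:
    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block data

    def write(self, data):
        """Store data, returning the list of block fingerprints."""
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)  # store each unique block once
            refs.append(fp)
        return refs

    def read(self, refs):
        return b"".join(self.blocks[fp] for fp in refs)

store = DedupStore()
os_image = b"windows" * 1024       # stand-in for a guest OS image
r1 = store.write(os_image)         # first backup
r2 = store.write(os_image)         # identical second backup
print(len(store.blocks))           # unique blocks stored: no growth on the repeat
```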
Monday, June 2, 2008
That's a lovely marketing phrase, but when it comes to storage, it does, and it doesn't. What you really need to understand is how VMware can affect your storage environment, as well as the effect that storage has on your VMware environment. Once you do, you'll realize that it's really just a slightly different take on what storage administrators have always battled. First, some background.
Some Server Virtualization Facts
- The trend of server virtualization is well under way, and it's moving rapidly from test/dev environments into production. Some people are implementing it in a very aggressive way. For example, I know one company whose basic philosophy is "it goes in a VM unless it absolutely can be proven it won't work, and even then we will try it there first."
- While a lot of people think that server consolidation is the primary motivating factor in the VMware trend, I have found that many companies are also driven by Disaster Recovery, since replicating VMs is so much easier than building duplicate servers at a DR site.
- 85% of all virtual environments are connected to a SAN; that's down from nearly 100% a short time ago. Why? Because NFS is making a lot of headway, and that makes a lot of sense, since it's easier to address some of the VMware storage challenges with NFS than it is with traditional Fibre Channel LUNs.
- VMware changes the way that servers talk to storage. For example, it forces the use of more advanced file systems like VMFS. VMFS is basically a clustered file system, and that's needed in order to perform some of the more attractive/advanced things you want to do with VMware, like VMotion.
Storage Challenges in a VMWare Environment
- Application performance is dependent on storage performance. This isn't news for most storage administrators. However, what's different is that VMware can combine a number of different workloads all talking through the same HBA(s), so the workload as seen by the storage array turns into a highly random, usually small-block I/O workload. These kinds of workloads are typically far more sensitive to latency than they are hungry for bandwidth. Therefore, the storage design in a VMware environment needs to provide for this type of workload across multiple servers. It's something that storage administrators have done in the past for Exchange servers, for example, but on a much larger scale.
- End-to-end visibility from VM to physical disk is very difficult for storage admins to obtain with current SRM software tools. These tools were typically designed with the assumption that there was a one-to-one correspondence between a server and the application that ran on it. Obviously this isn't the case with VMware, so reporting for things like chargeback becomes a challenge. This also affects troubleshooting and change management, since the clear lines of demarcation between server administration and storage administration are now blurred by things like VMFS, VMotion, etc.
- Storage utilization can be significantly decreased. This is due to a couple of factors, the first of which is that VMware requires more storage overhead to hold all of the memory state, etc. so that it can perform things like VMotion. The second is that VMware admins tend to want very large LUNs assigned to them to hold their VMFS file systems, and to have a pool of storage they can use to rapidly deploy a new VM. This means there is a large pool of unused storage sitting around on the VMware servers waiting to be allocated to a new VM. Finally, there is a ton of redundancy in the VMs. Think about how many copies of Windows are sitting around in all those VMs. This isn't new, but VMware sure shows it to be an issue.
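One practical consequence of that random small-block profile: you end up sizing disk pools by IOPS rather than by capacity. A rough sketch of the arithmetic (the per-VM and per-disk IOPS figures and the RAID write penalty are illustrative rules of thumb, not vendor numbers):

```python
import math

# Rough spindle-count sizing for a consolidated random small-block VM workload.
def spindles_needed(vm_count, iops_per_vm, iops_per_disk,
                    raid_write_penalty=2, write_fraction=0.3):
    """Disks needed to serve the aggregate IOPS, counting RAID write overhead."""
    reads = vm_count * iops_per_vm * (1 - write_fraction)
    writes = vm_count * iops_per_vm * write_fraction * raid_write_penalty
    return math.ceil((reads + writes) / iops_per_disk)

# 100 VMs at ~50 IOPS each on 15k disks (~180 IOPS each), RAID-10 writes:
print(spindles_needed(100, 50, 180))  # 37 spindles, regardless of capacity
```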
Some Solutions to these Challenges
As I see it there are three technical solutions to the challenges posed above.
- Advanced storage virtualization - Things like thin provisioning to help with the issue of empty storage pools on the VMware servers. Block storage virtualization to provide the flexibility to move VMware's underlying storage around to address issues of performance, storage array end of lease, etc. Data deduplication to reduce the redundancy inherent in the environment.
- Cross domain management tools - Tools that have the ability to view storage all the way from the VM to the physical disk and to correlate issues between the VM, server, network, SAN, and storage array are beginning to come onto the market and will be a necessary part of any successful large VMWare rollout.
- Virtual HBAs - These are beginning to make their way onto the market and will help existing tools to work in a VMWare environment.
Organizations need to realize that with added complexity comes added management challenges, and that cross-domain teams encompassing VMware admins, network admins, and SAN/storage admins will be necessary for any large VMware rollout to be successful. However, the promise of server virtualization to reduce hardware costs and make Disaster Recovery easier is just too attractive for many companies to ignore, and the move to server virtualization over the last year shows that a lot of folks are being drawn in. Unfortunately, unless they understand some of the challenges I outlined above, they may be in for some tough times and learn these lessons the hard way.
Saturday, May 24, 2008
I'll have some more detailed postings on what I think about some of the technology I saw at EMC World a little later. Right now I just wanted to talk a bit about the general trends and feelings I got from the convention.
First and foremost, EMC has finally awakened to the fact that people want de-duplicating products, and they want them now. EMC has really been behind the eight ball when it comes to dedupe. I don't know if it was because of their close relationship with FalconStor in the past, or what, but they really didn't have much of a story to tell when it came to dedupe, and start-ups like Data Domain were definitely eating EMC's lunch in that market. But the EMC giant has definitely awakened from its slumber and introduced some interesting new products.
Basically, the new products fall into two categories. The first is a software addition to the existing DL400 line which provides deduplication. The second is a new line of deduplication engines that provide much the same capabilities as Data Domain does. The main differences are that EMC's appliances give users a choice between in-line deduplication, post-processing, or no deduplication at all. They also have a well-designed VTL feature, which is an area that Data Domain has been struggling in.
The other area that EMC was emphasizing was "green computing". A lot of this was nothing more than marketing hype and spin on existing products. However, they did mention an upcoming feature that really is "green computing": spinning down drives that aren't currently in use. While EMC didn't introduce any specific products yet, they did suggest that we would see this technology first in the VTLs, but that it could make an appearance in the overall CLARiiON line in the not too distant future.
Overall, a lot of EMC marketing around "green", but some new technology and a good opportunity to talk with the folks at EMC about where they are going with some of their products. I got to spend a little time talking with the folks who work on StorageScope about reporting, and about support for AIX VIO in Control Center in general.
Finally, I took my wife along so she could have some fun as well, and I think she ended up having more fun than I did. Las Vegas is a great place for shopping, hanging out in the spa, and generally having a good time. All of which she did while she was there. We also went to see Phantom of the Opera, which was great. Overall, a good trip for both of us. More details in a later posting.
Tuesday, May 13, 2008
After re-reading yesterday's posting, I had one of those "well DUH" moments. It seems obvious now, but it hit me like a ton of bricks. Block Storage Virtualization (BSV) is creating a sea change for how people are going to buy their storage.
Once we are in a virtual world, then we no longer need "intelligent storage". All we really want is cheap storage, the intelligence will be elsewhere (i.e. in the virtualization engine). Of course, this is the reason that so many vendors (NetApp and HDS spring directly to mind) have put virtualization right into their array. They are really just trying to hold back the tide.
And holding back the tide is all it is. Why would I want to commit to a vendor like HDS as my front-end virtualization engine? Why wouldn't I want a completely independent engine? Well, at least one reason springs to mind: it might be easier to get there from here. What I mean is, if I have an existing vendor's product already in house, have processes built around it, have people trained, etc., then it makes some sense to leverage all of that knowledge and all those processes. However, if I'm not a NetApp or HDS shop, then why would I bring them in just to virtualize my existing storage? It's no easier from a training/process perspective to do that than to go with something that's a pure virtualization play like SVC, Invista, or Yadda Yadda.
The difficulties involved in virtualizing your existing storage/application are something you should seriously consider. Picking a virtualization engine that will allow you to "encapsulate" your existing LUNs, for example, might make the process of rolling out the virtualization engine a lot less painful for your users than allocating all net new storage that's been "virtualized" and then copying your data to the new "virtualized" LUNs.
So what does all this lead to? I suspect that what we will see from the storage vendors are more "dumb" array products and increased sales of arrays like EMC's CLARiiON AX lines of storage. Why pay for all of that expensive smarts in something like a Symmetrix when all you really need is something that can serve up LUNs that perform well. So the sea change I predict is coming is not the complete demise of the storage "big iron", no, it's more like they will go the way of the mainframe. There will still be a business there, it just won't be as big a business as it once was. Sure, the vendors will fight against it, just like IBM did with the mainframe, but in the end I think that the results will be the same, storage "big iron" will get marginalized.
Monday, May 12, 2008
Yes, that's right, with the economy getting tight, I suspect that IT budgets, even those for storage, are going to get slashed. So, how are storage managers going to do more with less? You don't think that with the budget cuts there will also be a reduction in the growth of storage/data, do you? Of course not! The business will simply expect the storage team to do more with less, that's all. Simple really, don't you think?
What this will mean is that storage managers are going to be looking for a way to drive the per-GB cost of storage down even more. For many I think that the answer will be block storage virtualization.
Why? Well, I think that there are a couple of answers to that. First off, one direct way to reduce CAPEX will be to drive down the cost of the arrays themselves. How? Easy: more competition. If I virtualize the storage, then the array becomes even more of a commodity than it is today, driving down the price. It's basic economics, really. The more vendors I allow to bid on my next 100TB storage purchase, the lower the price per GB should be, right?
Also, if the real "smarts" are in the virtualization controller, then I don't need them in the disk array, so I can save money on licensing the software in the array. I no longer need to buy replication software from each storage vendor; I have a single replication mechanism, which is probably in the virtualization controller itself. More on this in a later post; I think it's going to have a huge impact on the storage vendors going forward.
I also think that I can achieve some OPEX savings by having more efficient operations and fewer outages. Think about it: if all of my storage admins work with a single tool for provisioning, replication, etc., then I have more people with the same skill set, all working in the same interface. That's got to be more efficient and less error-prone than having a couple of folks who know the HDS stuff well, a couple more who know the EMC stuff well, etc.
We had this option before by just buying all of our storage from a single vendor; the trouble with that approach was vendor "lock-in". The vendor knew that they had me by the short hairs. Where this really showed up was not in the per-GB price of my storage, or my storage software. I mean, anyone with two brain cells to rub together knows that if you are going to get everything from a single vendor, you had better lock in your discount up front, and it had better be big. But trust me, the vendors made up for those big discounts via things you didn't have them locked in on. Professional Services, for example. At any rate, virtualization gets me out from under all of that, and makes provisioning something that anyone on the team can do at any time, following the exact same processes and procedures. You have to believe that will have a positive effect on your OPEX costs.
So, if 2008 is the year of block storage virtualization, what about file virtualization? We all still do NAS right? More on that next time.
I've been thinking about doing something like this for some time, but never got around to it with all of the pressing things going on at work and home, etc. But I just really need to get some things off my chest when it comes to this topic, so here I am!
What I plan to write about here is simple, it's what I know best, computers and storage. I work in the storage business, but I grew up in the Systems Administration side of things. I think that this gives me a bit of a bias, although I prefer to call it a perspective.
I've actually worked on both sides of the equation, so to speak. I've worked for manufacturers such as CDC (yes, you need to be old like me to recognize that they actually built computers back in the day) and EMC. I've also worked implementing systems and storage at companies in the healthcare industry as well as media companies.
This has been a long time coming, and I have a lot of pent up demand on topics, so I suspect that there will probably be a lot of entries really quick.
Climb in, strap in, and hang on, cause here we go!