In November, Google wrote on their official blog about an experiment in which they sorted 1 PB (1,000 TB) of data with MapReduce. The information about the sorting itself was impressive, but one thing that stuck in our minds was the following (emphasis added by us):
An interesting question came up while running experiments at such a scale: Where do you put 1PB of sorted data? We were writing it to 48,000 hard drives (we did not use the full capacity of these disks, though), and every time we ran our sort, at least one of our disks managed to break (this is not surprising at all given the duration of the test, the number of disks involved, and the expected lifetime of hard disks).
Each of the sorting runs Google did lasted six hours, and at least one drive broke during every run. Extrapolated over a full day, that works out to hard drives breaking at least four times a day for every 48,000 hard drives a data center is using.
Interesting, isn’t it? We have discussed this several times around the office here at Pingdom. Data centers are getting huge. How many hard drives are there in one of these new, extremely large data centers? 100,000? 200,000? More?
Add to this the “cloud computing” trend. Since we store more and more data online, data centers will have to keep adding storage capacity to be able to accommodate their customers.
As an example of how enormous some of these new data centers are, Microsoft has stated that a new data center it is building in Chicago will have 300,000 servers. We don’t know how many hard drives that will translate into for storage, but we imagine it will be a lot.
So, let’s assume we have one huge data center with 200,000 hard drives. At that failure rate, at least 16 hard drives would break every day. With 400,000 hard drives, a hard drive would break roughly every 45 minutes. (OK, perhaps we’re getting carried away here, but you get the idea.)
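For anyone who wants to play with the assumptions, here is a rough back-of-envelope sketch in Python. The 48,000-drive count and the “at least one failure per six-hour run” figure come straight from Google’s post; everything else (the larger drive counts, and treating the failure rate as linear in drive count) is our own extrapolation:

```python
# Back-of-envelope sketch of the failure math above. The inputs are Google's
# reported numbers; the larger data-center sizes are our own guesses.

RUN_HOURS = 6            # duration of one Google sorting run
DRIVES_IN_TEST = 48000   # drives used in the petabyte sort
FAILURES_PER_RUN = 1     # "at least one of our disks managed to break"

# Lower-bound failure rate implied by the test: failures per drive per hour
failures_per_drive_hour = FAILURES_PER_RUN / (DRIVES_IN_TEST * RUN_HOURS)

def daily_failures(total_drives):
    """Expected drive failures per day for a data center of a given size,
    assuming the failure rate scales linearly with the number of drives."""
    return failures_per_drive_hour * total_drives * 24

for drives in (48000, 200000, 400000):
    per_day = daily_failures(drives)
    print("{:>7,} drives: ~{:.1f} failures/day, one every ~{:.0f} minutes"
          .format(drives, per_day, 24 * 60 / per_day))
```

Running it gives roughly 4 failures a day for 48,000 drives, about 17 a day for 200,000, and about 33 a day (one every 43 minutes or so) for 400,000.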
Does this mean that these huge data centers will basically have a dedicated “hard drive fixer” running around replacing broken hard drives?
Is the “cloud computing era” ushering in a new data center profession? Hard drive boys? 🙂
Maybe this is already happening?
Questions for those in the know…
So, if this is already the situation, or will be in the near future, at least in “mega data centers”, what would be the best way to handle it? Would you organize your data center with this in mind, keeping all storage in close proximity so technicians don’t have to walk all over the place? And what about the containerized data centers that, for example, Microsoft is building? Would you have to visit each separate container to deal with problems as they arise?