On Jan 3, 2011, at 8:12 PM, Ross Walker rswwalker@gmail.com wrote:
On Jan 3, 2011, at 8:10 PM, Ross Walker rswwalker@gmail.com wrote:
On Jan 3, 2011, at 2:39 PM, Dave tdbtdb+centos@gmail.com wrote:
On Sat, Jan 1, 2011 at 10:06 PM, Gordon Messmer yinyang@eburg.com wrote:
On 01/01/2011 05:56 PM, Dave wrote:
Is there a best practice? People have to be doing something!
I think that's unlikely. If you don't "oversubscribe" your disk space as a matter of policy, you'll force upgrades earlier than most people would consider them necessary. Most users, I'd expect, will be well under quota most of the time. You'd commit all of your disk space to quota long before the space was actually used. In your scenario, you'd be required to expand the disk array whenever it was committed to quota, even if actual use was very low. Every site that I know of which uses quotas handles disk upgrades when utilization requires it, not when quota subscription does.
So, is it fair to rephrase that as "ignore quotas, pay attention to actual usage"?
I agree that some degree of oversubscription is probably desirable, and it would be much easier to just add storage whenever it looks to be getting fullish. My situation right now makes that difficult - budget is gone, so I can't add storage, and my users sometimes start up a big simulation that could potentially fill the disk right before the weekend. If the hoggy simulation crashes itself, that's okay, but if it brings down a lot of other jobs submitted by other users, I look bad. I guess even if there were some good tool support, this task is doomed to make everyone unhappy.
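For the "pay attention to actual usage" side of this, a check that could run from cron might look like the minimal sketch below. The 90% threshold and the /home path are assumptions chosen for illustration, not values from this thread.

```python
import shutil


def utilization(used_bytes, total_bytes):
    """Return used space as a fraction of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_attention(path, threshold=0.90):
    """True when the filesystem holding `path` is at or above `threshold`."""
    usage = shutil.disk_usage(path)
    return utilization(usage.used, usage.total) >= threshold


if __name__ == "__main__":
    path = "/home"  # assumed mount point, adjust for the real filesystem
    if needs_attention(path):
        print(f"WARNING: {path} is at or above 90% of capacity")
```

This only reports actual usage; it says nothing about how much space has been committed to quota.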
Maybe you can have the users run these in containers, such as OpenVZ, that are set to clean themselves up after they finish?
Or use Amazon's Elastic Compute Cloud (EC2) to provision simulation VMs that run the simulations, report results, then completely disappear.
I was just thinking: if you had a couple of base simulation disk images created for containers on LVM, you could create a clone (writable snapshot) for each container at the time it starts and have the simulation run in it. When the container stops, the snapshot can be destroyed, cleaning up the used disk space.
All of this can be scripted and kicked off via cron or 'at', or whatever.
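As a rough sketch of that snapshot-per-run idea, assuming a volume group named vg0 and a base image LV named simbase (both hypothetical names), and a 5G copy-on-write size picked only for illustration, the create/run/destroy cycle could be wrapped like this. Mounting the snapshot and starting the container are left to the `simulate` callback, since they depend on the container setup.

```python
import subprocess


def snapshot_create_cmd(vg, base_lv, snap_name, cow_size="5G"):
    """Build the lvcreate command for a snapshot of the base image."""
    return ["lvcreate", "--snapshot", "--size", cow_size,
            "--name", snap_name, f"/dev/{vg}/{base_lv}"]


def snapshot_remove_cmd(vg, snap_name):
    """Build the lvremove command that discards the snapshot."""
    return ["lvremove", "--force", f"/dev/{vg}/{snap_name}"]


def run_in_snapshot(vg, base_lv, job_id, simulate, run=subprocess.check_call):
    """Create a snapshot, hand its device path to `simulate`, then remove it.

    `run` executes a command list; it defaults to subprocess.check_call and
    can be swapped for a stub to dry-run the sequence.
    """
    snap = f"{base_lv}-{job_id}"
    run(snapshot_create_cmd(vg, base_lv, snap))
    try:
        return simulate(f"/dev/{vg}/{snap}")
    finally:
        # Remove the snapshot even if the simulation raised an exception.
        run(snapshot_remove_cmd(vg, snap))
```

Passing `run=print` shows the commands without touching LVM. With classic LVM snapshots the copy-on-write size also acts as a cap: a snapshot that fills up is invalidated, so a runaway simulation should lose its own scratch space before it can fill the rest of the volume group.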
-Ross