Linux and Unix Users Group at Virginia Tech Wiki

Infrastructure:Incident 2015-04-23

54 bytes added, 03:04, 24 April 2015
In the early morning of April 23, 2015, [[gp:Whittemore Hall|Whittemore Hall]] lost power, which brought down VTLUUG infrastructure. Bringing the hardware back up was complicated by:
* Maintenance deferred for far too long, which prevented some machines from booting on their own.
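* Failure to notice that the clocks on VMs were wrong because of a dead CMOS battery on wood. Kerberos rejects authentication when clocks disagree by more than its allowed skew (five minutes by default), so it could not work properly.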
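The clock problem is easy to miss because nothing fails until authentication is attempted. The following is a minimal sketch of the kind of check that would have flagged it, not a description of any tooling VTLUUG actually runs. The function names are hypothetical, and the 300-second limit assumes the MIT Kerberos default `clockskew` value; a site may configure a different tolerance.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 300 seconds is the MIT Kerberos default clock-skew tolerance
# (the krb5.conf "clockskew" setting). Adjust if the realm uses another value.
DEFAULT_MAX_SKEW = timedelta(seconds=300)


def clock_skew(host_time: datetime, reference_time: datetime) -> timedelta:
    """Return the absolute difference between two timezone-aware clock readings."""
    return abs(host_time - reference_time)


def within_kerberos_tolerance(
    host_time: datetime,
    reference_time: datetime,
    max_skew: timedelta = DEFAULT_MAX_SKEW,
) -> bool:
    """Return True if the host clock is close enough to the reference clock."""
    return clock_skew(host_time, reference_time) <= max_skew


if __name__ == "__main__":
    # Both readings are hypothetical values chosen for illustration only.
    reference = datetime(2015, 4, 23, 12, 0, 0, tzinfo=timezone.utc)
    stale_vm = datetime(2015, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    print(within_kerberos_tolerance(stale_vm, reference))
```

In practice the reference reading would come from a trusted time source, and the check would run at boot, before services that depend on Kerberos start.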
Some steps that should be taken to reduce the length and impact of future outages include:
* Rebuild VMs like [[Infrastructure:milton|milton]], which have had too much maintenance deferred, and move them to cyberdelia.
* Install NTP on all servers.
* Find a dedicated sysadmin.
* Create a disaster recovery plan.
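For the NTP item, it may help to see what a client actually measures. The sketch below uses the standard NTP offset and delay formulas over the four timestamps of one request/response exchange. The function name is hypothetical, and it does no networking: the timestamps would come from a real NTP client.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    All values are in seconds.
    """
    # Positive offset means the server clock is ahead of the client clock.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round-trip network delay, excluding the server's processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

A host whose computed offset exceeds the Kerberos tolerance needs its clock corrected before Kerberos-dependent services can be expected to work.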
Anonymous user