Infrastructure:Sysadmin Handbook

This page describes how to manage the infra. See rtfm.txt for a guide to build it from scratch.

This covers setup of a VM on meltdown or spectre, depending on whether the service is critical (service VMs go on meltdown; user VMs go on spectre).

Overview

LUUG infrastructure runs on, essentially, four key components:

  • Hosts
  • NFS
  • Auth

and

  • out-of-band Ansible & Docker manifests

Almost all of our services are hosted in Docker containers across various hosts: the LLM server on Infrastructure:Gibson, the web content on Infrastructure:Sczi. Every one of these Docker containers has its configuration detailed here.

The entire repository is cloned to /nfs/cistern/docker/apps, and the docker-compose.yml file for each service is brought up by running 'docker compose up -d' from inside the service folder.
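
In practice, that looks like this (the service name is a placeholder):

    cd /nfs/cistern/docker/apps/<service>
    docker compose up -d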

Note the path: /nfs/cistern/docker/apps. Looking at the docker-compose folders & configs, you will notice that the *data* for a container is **never** stored alongside the compose files themselves. Instead, it is stored at /nfs/cistern/docker/data/<insert-service-name>/<etc>. This is an NFS (Network File System) mounted path: it exists physically on our NFS server, Infrastructure:Dirtycow, and is mounted over the local network.
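
A minimal sketch of the pattern, using a hypothetical service named "example" (the image and paths are illustrative):

    # /nfs/cistern/docker/apps/example/docker-compose.yml
    services:
      example:
        image: nginx:stable
        volumes:
          # data lives on the NFS share, never next to the compose file
          - /nfs/cistern/docker/data/example/html:/usr/share/nginx/html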

The implications of this should be clear: *the host install does not actually matter*. If the operating system on, say, Infrastructure:Sczi blew up, all one would need to do to bring everything back up is re-create it, install Docker, mount the cistern NFS directory (with the data files still intact), set up auth, and start all the containers again. No data is ever lost, because nothing is stored on the host itself: it's all on the NFS share.

How do you easily set all that stuff back up again? Ansible. You can think of Ansible as a language designed for defining deployed servers. It uses YAML (.yml), and "roles" are specified for each server. In roles/<server role>/tasks there is a list of things needed to set up the server, and in /hosts.cfg there is a list of servers and the roles each one has. All you need to do to set a server up is run Ansible -- it will take care of the rest. You can run it twice, or a million times, to no ill effect: it's designed to be idempotent.
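
A minimal sketch of the shape of this (the role name, host names, and NFS export are illustrative, not our actual config):

    # hosts.cfg
    [docker_hosts]
    sczi
    gibson

    # roles/docker-host/tasks/main.yml
    - name: Install Docker
      ansible.builtin.package:
        name: docker
        state: present

    - name: Mount the cistern NFS share
      ansible.posix.mount:
        src: dirtycow:/cistern
        path: /nfs/cistern
        fstype: nfs
        state: mounted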

Knowing this much, you can re-create Infrastructure:Sczi and Infrastructure:Gibson, but there are a few remaining things: the VM hosts (Infrastructure:Meltdown, Infrastructure:Spectre) and the router (Infrastructure:Shellshock). Deploying the router is described in rtfm.txt, but VM deployment is entirely automated via Ansible, which is *sick as fuck*. It only works for Ubuntu Server and Red Hat Enterprise (Alma, Rocky, CentOS) distros, but for those it works brilliantly -- add a VM to this file and run the Ansible playbook, and the new VM will automagically be created.

Web traffic! We run DNS through Gandi. Ask an officer to add you to the VTLUUG org on that website (User:Rsk has access, if you're reading this in the far future and need it). Each host gets a direct A record pointing at its IP address, and web content *all* points to Infrastructure:Sczi via CNAME records. Sczi's Docker config has an nginx container that handles certificates and reverse proxying.
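
A quick way to sanity-check the scheme from any machine (the vtluug.org names here are examples; substitute real records):

    # hosts get A records pointing directly at them
    dig +short sczi.vtluug.org A
    # web content is a CNAME chain ending at sczi
    dig +short wiki.vtluug.org CNAME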

Acidburn is our singular "traditionally managed" server. It runs many services, mail among them, and all of them run as services on the VM itself, not a container in sight (sans the IRC <-> Matrix bridge, which runs in a container for IP whitelisting reasons). You can redeploy it from ansible, but it won't have the same soul. Try not to break it.

Auth

We run two authentication servers, Infrastructure:Chimera and Infrastructure:Sphinx. They're both on the same FreeIPA network and can be deployed via ansible.

FreeIPA is a full-stack authentication provider. Part of our ansible playbook for LUUG hosts runs ipa-client-install, which enrolls each host as a "client" of this FreeIPA network and allows users with FreeIPA accounts to log in via SSH, with FreeIPA user groups reflected onto the system.
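
Roughly what that boils down to on each host (the flags and domain here are illustrative; the playbook's actual invocation may differ):

    # enroll this host as a client of the FreeIPA network
    ipa-client-install --unattended --mkhomedir --domain vtluug.org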

Infrastructure:Spectre is notably *not* a FreeIPA client, because it's intended for use by non-LUUG entities (whether that be personal member VMs or ones loaned out to other student orgs).

The root account password is in the vtluug-admin private repository. Ask someone to add you to the officers group.


Networks

Further information: Network

We should have the following networks in place:

  • br0 (on eno1) on meltdown and spectre <--> enp4s0 on joey. This is the main LUUG network.
    • 10.98.0.0/16 for VTLUUG NAT
    • IPv6 via prefix delegation on 2607:b400:6:cc80::/64
    • Global IPv4s via ARP proxying (See https://github.com/vtluug/scripts). Gateway is 128.173.88.1/22.
  • Static hosts are on 10.98.0.0/24, and DHCP is enabled on 10.98.1.0/24. This is mainly useful for organization and for quickly finding new or existing hosts on the network.
    • Static host IPs are assigned via static DHCP leases for IPv4.
    • Since we can't do this with IPv6, physical host IPs are determined upon first boot and VMs are assigned a specific MAC to pre-determine the SLAAC IP.
  • "Internet" (a CNS portal) <--> enp2s0 on joey. LUUG only has one of these, and port security is probably enabled.

DNS/DHCP:

  • All DNS entries for services run by VTLUUG are hosted on Gandi. Ask an officer if you want to change something.
  • jkh and Roddy own ece.vt.edu. DNS updates don't happen. echarlie can add IPv6-only records if needed to wuvt.vt.edu so we have PTRs.
  • joey runs DHCP via dnsmasq on enp4s0 (that is, 10.98.0.0/16). To change anything, modify it in https://github.com/vtluug/scripts first, then pull that into root's homedir on joey (see the sketch after this list). Please don't just update it on a machine without pushing your updates.
  • By default, hosts are accessible via SSH on ports 22 and 2222.
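
A sketch of the relevant dnsmasq.conf pieces (the MAC, IP, and hostname are placeholders):

    # LAN interface dnsmasq serves DHCP on
    interface=enp4s0
    # dynamic pool -- new/unknown hosts land here
    dhcp-range=10.98.1.1,10.98.1.254,12h
    # static lease pinning a known host into 10.98.0.0/24
    dhcp-host=52:54:00:ab:cd:ef,newvm,10.98.0.42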

Adding a VTLUUG Service VM

VMs in this category are deployed to meltdown

Prerequisites:

Configure the network

  • Decide on a MAC address for the host and add it to SCRIPTS/router/lan/local_hosts
  • Add an entry to SCRIPTS/router/lan/dnsmasq.conf for static DHCP leases.
  • If a new IP in 128.173.88.1/22 is being added, also add it to SCRIPTS/router/proxy/arp_proxy.sh (see the sketch below).
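
For context, the ARP proxying boils down to this primitive (the real logic lives in SCRIPTS/router/proxy/arp_proxy.sh; the IP and interface here are placeholders):

    # answer ARP queries for a public IP on the router's upstream interface
    ip neigh add proxy 128.173.88.42 dev enp2s0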

Note: Don't do the following steps when nobody is on campus, in case something breaks.

Pull the latest changes into /root/scripts on joey, update the configuration files, and restart the services:

  • Dnsmasq configuration is at /etc/dnsmasq.conf
  • ARP Proxy configuration is in /usr/local/bin

Add the VM configuration to ansible

Edit ANSIBLE_PATH/roles/deploy-vms/defaults/main.yml and add a new entry, following the existing format.
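
The file itself defines the schema; a hypothetical entry might look something like this (every key below is illustrative -- copy an existing entry rather than this):

    # hypothetical entry -- follow the real format already in the file
    new_vms_meltdown:
      - name: newservice
        mac: "52:54:00:ab:cd:ef"
        distro: ubuntu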

Note: if there are any entries in this file that are not present on the VM host, they will also be created. Comment out entries that shouldn't be created. Existing hosts are skipped.

Run ansible-playbook -v deploy.yml -i hosts.cfg -u papatux -k -K -e @VTLUUG_ADMIN_REPO/accounts.yml, using the correct vtluug-admin repo path.

Testing

The new host should be accessible by papatux via SSH on ports 2222 and 22, over IPv6 and IPv4, from the internal network. If it shows up in 10.98.1.0/24 (the dynamic DHCP pool), the static lease didn't take -- check that the MAC is correct.
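
A quick check from a machine on the internal network (the address is a placeholder):

    ssh -p 2222 papatux@10.98.0.42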

Adding a User VM

VMs in this category are deployed to spectre

Prerequisites:

Configure the network

  • Decide on a MAC address for the host and add it to SCRIPTS/router/lan/local_hosts
  • Add an entry to SCRIPTS/router/lan/dnsmasq.conf for static DHCP leases. (If applicable; you might not care for a test/temp VM).

Note: Don't do the following steps when nobody is on campus, in case something breaks.

Pull the latest changes into /root/scripts on joey, update the configuration files, and restart the services:

  • Dnsmasq configuration is at /etc/dnsmasq.conf

Add the VM configuration to ansible

Edit ANSIBLE_PATH/roles/deploy-vms/defaults/main.yml and add a new entry under new_vms_spectre, following the existing format.

Note: if there are any entries in this file that are not present on the VM host, they will also be created. Comment out entries that shouldn't be created. Existing hosts are skipped.

Run ansible-playbook -v deploy.yml -i hosts.cfg -u papatux -k -K.

Important: A random root password is set during VM creation and printed to stdout. Record this!

Testing

The new host should be accessible by root via SSH on ports 2222 and 22, over IPv6 and IPv4, from the internal network. If it shows up in 10.98.1.0/24 (the dynamic DHCP pool), the static lease didn't take -- check that the MAC is correct.