Infrastructure:Sysadmin Handbook

This page describes how to manage the infra. See [https://vtluug.github.io/rtfm.txt rtfm.txt] for a guide to building the infrastructure from scratch, as well as managing it in general.
This covers setup of a VM on [[Infrastructure:Meltdown|meltdown]] or [[Infrastructure:Spectre|spectre]], depending on whether the service is critical or not.
== Infodump (i will clean this up later, promise) ==
LUUG infrastructure runs on, essentially, four key components:
* Hosts
* NFS
* Auth
* out-of-band Ansible & Docker manifests
Almost all of our services are hosted in Docker containers across various hosts: on [[Infrastructure:Gibson]], the LLM server; on [[Infrastructure:Sczi]], the web content. 100% of these docker containers have their configuration detailed [https://github.com/vtluug/docker-manifests here].

The entire repository is cloned to /nfs/cistern/docker/apps, and the docker-compose.yml files for each service are run with the command <code>docker compose up -d</code> while in the service folder.
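For example, bringing up (or refreshing) a single service looks like this; <code>someservice</code> is a placeholder name:

<nowiki>cd /nfs/cistern/docker/apps/someservice   # placeholder service folder
docker compose up -d                      # start/refresh the containers in the background
docker compose ps                         # check that they came up</nowiki>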
Note the path: /nfs/cistern/docker/apps. Looking at the docker-compose folders & configs, you will notice that the ''data'' for the containers is '''never''' stored alongside the compose files themselves. Instead, it is stored at /nfs/cistern/docker/data/&lt;insert-service-name&gt;/. This is an NFS (Network File System) mounted path: it exists physically on our NFS server, [[Infrastructure:Dirtycow]], and is mounted over the local network.

The implications of this should be clear: ''the host install does not actually matter''. If the operating system for e.g. [[Infrastructure:Sczi]] blew up, all one would need to do to bring everything back up is re-create it, install docker, mount the cistern NFS directory (with the data files still intact), set up auth, and start all the containers again. No data is ever lost, because nothing is stored on the host itself: it's all on the NFS share.
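A sketch of that recovery, assuming a Debian-ish host; the NFS export path and package names here are illustrative, so verify them against [[Infrastructure:Dirtycow]] and the ansible repo:

<nowiki>apt install docker.io nfs-common              # plus the compose plugin; or let ansible do it
mkdir -p /nfs/cistern
mount -t nfs dirtycow:/cistern /nfs/cistern   # export path illustrative
cd /nfs/cistern/docker/apps/someservice && docker compose up -d
# data is untouched in /nfs/cistern/docker/data/someservice/</nowiki>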
How do you easily set all that stuff back up again? [https://github.com/vtluug/ansible Ansible]. You can think of Ansible as a language designed for defining deployed servers. It uses YAML (.yml), and "roles" are specified for each server. In roles/&lt;server role&gt;/tasks there exists a list of things needed to set up the server, and in /hosts.cfg there exists a list of servers and which roles they have. All you need to do to set a new server up is run ansible -- it will take care of the rest. You can run it twice, or a million times, to no ill effect: it's designed to be idempotent.
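A minimal sketch of that layout (group name and contents are illustrative, not the real repo):

<nowiki># hosts.cfg -- inventory: which servers have which roles
[docker_hosts]
sczi.vtluug.org
gibson.vtluug.org

# roles/docker_hosts/tasks/main.yml -- the steps applied to every
# host in that group (install docker, mount NFS, enrol in FreeIPA, ...)</nowiki>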
Knowing this much, you can re-create [[Infrastructure:Sczi]] and [[Infrastructure:Gibson]], but there are a few remaining things: the VM hosts ([[Infrastructure:Meltdown]], [[Infrastructure:Spectre]]) and the router ([[Infrastructure:Shellshock]]). Deploying the router is described in [https://vtluug.org/rtfm.txt rtfm.txt], but VM deployment is entirely automated via ansible, which is ''sick as fuck''. It only works for ubuntu server and redhat enterprise (alma, rocky, centos) distros, but for those it works brilliantly -- add a VM to [https://github.com/vtluug/ansible/blob/master/roles/deploy-vms/defaults/main.yml this file] and run the ansible playbook -- the new VM will be automagically created.
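The shape of an entry looks roughly like this; the <code>new_vms_spectre</code> key is referenced later on this page, but the other field names are guesses, so copy an existing entry from the real file rather than this sketch:

<nowiki># roles/deploy-vms/defaults/main.yml -- illustrative only
new_vms_spectre:
  - name: newservice
    mac: "52:54:00:aa:bb:cc"
    memory_mb: 4096
    vcpus: 2</nowiki>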
'''Web traffic!''' We run DNS through Gandi. Ask an officer to add you to the VTLUUG org on that website ([[User:Rsk]] has access, if you're reading this in the far future and need it). Each host gets a direct A record pointing at its IP address, and web content ''all'' points to [[Infrastructure:Sczi]] via CNAME records. Sczi's docker config has an nginx container that handles certificates and reverse proxying.
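In zone-file notation the pattern is roughly this (names and IPs made up; the real records live in the Gandi web UI):

<nowiki>gibson  IN A      128.173.x.x        ; each host: a direct A record
www     IN CNAME  sczi.vtluug.org.   ; web content: CNAME to sczi</nowiki>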
Acidburn is our singular "traditionally managed" server. It runs many services, mail among them, and all are running as services on the VM itself, not a container in sight (sans the IRC <-> Matrix bridge, which is there for IP whitelisting reasons). You can redeploy it from ansible, but it won't have the same soul. Try not to break it.
'''Auth:''' We run two Authentication servers, [[Infrastructure:Chimera]] and [[Infrastructure:Sphinx]]. They're both on the same FreeIPA network and can be deployed via ansible.
FreeIPA is a full-stack authentication provider. Part of our ansible playbook for LUUG hosts runs ipa-client-install, which sets up the hosts as "clients" to this FreeIPA network and allows users with FreeIPA accounts to log in via ssh, reflecting usergroups onto the system.
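The enrolment itself is roughly the following (the playbook runs this for you; flags shown are illustrative):

<nowiki>ipa-client-install --mkhomedir   # enrol the host; prompts for domain/credentials if not given
id some-freeipa-user             # afterwards, FreeIPA users/groups resolve on the host</nowiki>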
[[Infrastructure:Spectre]] notably is ''not'' a FreeIPA client, because it's intended for use by non-LUUG entities (whether that be personal member VMs or ones loaned out to other student orgs).
The root account password is in the [https://git.vtluug.org/officers/vtluug-admin vtluug-admin] private repository. Ask someone to be added to the officers group.

== Networking ==
* Set up physical boxes based on the [[Infrastructure:Diagram|Diagram]]
* Determine the IP addresses based on [[Infrastructure:Network|Network]]

=== Router ===
Configure /etc/network/interfaces:

<nowiki># v6
iface $EXTERNAL_IF inet6 auto

iface $INTERNAL_IF inet6 static
    address $INTERNAL_IPv6
    netmask 128
    # Enable internal network to access router's external v6 address
    pre-up ip route add $EXTERNAL_IPv6 via $INTERNAL_IPv6
    # Enable NDP Proxying so internal boxes get SLAAC
    pre-up echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
    pre-up echo 2 > /proc/sys/net/ipv6/conf/all/accept_ra

# VTLUUG Private Network v4
iface $INTERNAL_IF inet static
    address $INTERNAL_IPv4
    netmask 255.255.255.0

# Additional IPs
iface $EXTERNAL_IF inet static
    address $EXTERNAL_IPv4
    gateway 128.173.88.1
    broadcast 128.173.91.255
    netmask 255.255.252.0
    # NAT settings
    # TODO this probably doesn't work
    pre-up tc action nat egress 10.99.0.0/24 $EXTERNAL_IP
    # Enable ARP Proxying so internal v4 addresses are accessible
    pre-up echo 1 > /proc/sys/net/ipv4/conf/all/proxy_arp
    pre-up echo 1 > /proc/sys/net/ipv4/ip_forward
    # Route internal v4 addresses
    up ip route add $JOEY_EXTERNAL_IPv4/24 dev $INTERNAL_IF
    up ip route add $CRASHANDBURN_EXTERNAL_IPv4/24 dev $INTERNAL_IF
    up ip route add $SCZI_EXTERNAL_IPv4/24 dev $INTERNAL_IF
    up ip route add $ACIDBURN_EXTERNAL_IPv4/24 dev $INTERNAL_IF
    up ip route add $ZEROCOOL_EXTERNAL_IPv4/24 dev $INTERNAL_IF
    up ip route add $MIRROR_EXTERNAL_IPv4/24 dev $INTERNAL_IF</nowiki>

Next, set up NDP proxying. Configure /etc/ndppd.conf (may not already exist):

<nowiki># Rather than only listening on each individual IPv6 address, we
# simply forward all solicitations. The main advantage is that we
# don't have to add any additional routing rules if a new internal
# device is added.
route-ttl 30000
address-ttl 30000

# External interface to listen on
proxy $EXTERNAL_IF {
    router yes
    timeout 500
    autowire no
    keepalive yes
    retries 3
    promiscuous no
    ttl 30000

    # Prefix to listen on
    rule ::/0 {
        # TODO might change prefix
        # Internal interface to forward everything to
        iface $INTERNAL_IF
        autovia no
    }
}</nowiki>

Now start '''and''' enable ndppd.service (e.g. <code>systemctl enable --now ndppd.service</code>).

=== Everything Else not under oVirt ===

==== Debian ====
Configure /etc/network/interfaces:

<nowiki># v6
iface $INTERFACE inet6 auto

auto $INTERFACE
iface $INTERFACE inet static
    address $INTERNAL_IPv4
    gateway 10.99.0.1
    netmask 255.255.255.0

# Additional IPs - Only do this if this box has an external IP
iface $INTERFACE inet static
    address $EXTERNAL_IPv4
    gateway 128.173.88.1
    netmask 255.255.252.0</nowiki>

==== Centos ====
Configure /etc/sysconfig/network-scripts/ifcfg-$INTERFACE:
<nowiki>
ONBOOT="yes"
NM_CONTROLLED="no"
BOOTPROTO="static"
IPADDR0="$INTERNAL_IPv4"
GATEWAY0="10.99.0.1"
NETMASK0="255.255.255.0"
# Additional IPs - Only do this if this box has an external IP
IPADDR1="$EXTERNAL_IPv4"
GATEWAY1="128.173.88.1"
NETMASK1="255.255.252.0"
</nowiki>
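To apply either config, restart networking; a sketch (exact commands depend on the distro and init system):

<nowiki># Debian (ifupdown)
ifdown $INTERFACE && ifup $INTERFACE

# CentOS (legacy network-scripts)
systemctl restart network</nowiki>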
== Networks ==
''Further information: [[Infrastructure:Network|Network]]''

We ''should'' have the following networks in place:
* [[Infrastructure:Meltdown|meltdown]] and [[Infrastructure:Spectre|spectre]] br0 on eno1 <--> enp4s0 on [[Infrastructure:Joey|joey]]. This is the main LUUG network.
** 10.98.0.0/16 for VTLUUG NAT
** IPv6 via prefix delegation on 607:b400:6:cc80/64
** Global IPv4s via ARP proxying (see https://github.com/vtluug/scripts). Gateway is 128.173.88.1/22.
* Static hosts are on 10.98.0.0/24, and DHCP is enabled on 10.98.1.0/24. This is mainly just useful for organization and quickly finding new hosts or other hosts on the network.
** Static host IPs are assigned via static DHCP leases for IPv4.
** Since we can't do this with IPv6, physical host IPs are determined upon first boot and VMs are assigned a specific MAC to pre-determine the SLAAC IP.
* "Internet" (a CNS portal) <--> enp2s0 on [[Infrastructure:Joey|joey]]. LUUG only has one of these, and port security is probably enabled.

'''DNS/DHCP:'''
* All DNS entries for services run by VTLUUG are hosted on [https://gandi.net Gandi]. Ask an officer if you want to change something.
* jkh and Roddy own ece.vt.edu. DNS updates don't happen. echarlie can add IPv6-only records if needed to wuvt.vt.edu so we have PTRs.
* [[Infrastructure:Joey|joey]] runs DHCP via dnsmasq on enp4s0 (that is, 10.98.0.0/16). To change anything, modify it on https://github.com/vtluug/scripts first, then pull that into root's homedir on [[Infrastructure:Joey|joey]]. Please don't just update it on a machine without pushing your updates.
* By default, hosts are accessible via SSH on ports 22 and 2222.

== Adding a VTLUUG Service VM ==
''VMs in this category are deployed to [[Infrastructure:Meltdown|meltdown]]''

Prerequisites:
* Clone <code>https://github.com/vtluug/scripts</code>. This is referred to as 'SCRIPTS' in this guide.
* Clone <code>https://github.com/vtluug/ansible</code> and install ansible. This repo is referred to as 'ANSIBLE' in this guide.
* Have access to [https://git.vtluug.org/officers/vtluug-admin officers/vtluug-admin] on [https://git.vtluug.org gitea].
* Understand the [[Infrastructure:Network|Network]] and [[Infrastructure]].
* Put your SSH key on [[Infrastructure:Meltdown|meltdown]]

=== Configure the network ===
* Decide on a MAC address for the host and add it to <code>SCRIPTS/router/lan/local_hosts</code>
* Add an entry to <code>SCRIPTS/router/lan/dnsmasq.conf</code> for static DHCP leases (example below).
* If a new IP in 128.173.88.1/22 is being added, also add it to <code>SCRIPTS/router/proxy/arp_proxy.sh</code>

'''Note:''' It is '''not''' recommended that you do the following steps if nobody is on campus, in case something breaks.

Pull the latest changes to <code>/root/scripts</code>, update the configuration files, and restart the services:
* Dnsmasq configuration is at <code>/etc/dnsmasq.conf</code>
* ARP Proxy configuration is in <code>/usr/local/bin</code>

=== Add the VM configuration to ansible ===
Edit <code>ANSIBLE_PATH/roles/deploy-vms/defaults/main.yml</code> and add a new entry, following the existing format.

'''Note:''' if there are any entries in this file that are '''not''' present on the VM host, they will also be created. Comment out entries that shouldn't be created. Existing hosts are skipped.

Run <code>ansible-playbook -v deploy.yml -i hosts.cfg -u papatux -k -K -e @VTLUUG_ADMIN_REPO/accounts.yml</code>, using the correct vtluug-admin repo path.
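For reference, a static-lease entry in <code>dnsmasq.conf</code> (the "Configure the network" step above) looks roughly like this; the MAC, hostname, and IP are made up:

<nowiki>dhcp-host=52:54:00:aa:bb:cc,newservice,10.98.0.42</nowiki>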
=== Testing ===
The new host should be accessible by papatux via SSH on port 2222 (and 22) over IPv6 and IPv4 from the internal network. Check 10.98.1.0/24 to see if it had any issues getting a static DHCP lease and if the MAC is correct.

== Adding a User VM ==
''VMs in this category are deployed to [[Infrastructure:Spectre|spectre]]''

Prerequisites:
* Clone <code>https://github.com/vtluug/scripts</code>. This is referred to as 'SCRIPTS' in this guide.
* Clone <code>https://github.com/vtluug/ansible</code> and install ansible. This repo is referred to as 'ANSIBLE' in this guide.
* Understand the [[Infrastructure:Network|Network]] and [[Infrastructure]].
* Have root on [[Infrastructure:Spectre|spectre]]
* Put your SSH key on [[Infrastructure:Spectre|spectre]]

=== Configure the network ===
* Decide on a MAC address for the host and add it to <code>SCRIPTS/router/lan/local_hosts</code>
* Add an entry to <code>SCRIPTS/router/lan/dnsmasq.conf</code> for static DHCP leases. (If applicable; you might not care for a test/temp VM.)

'''Note:''' It is '''not''' recommended that you do the following steps if nobody is on campus, in case something breaks.

Pull the latest changes to <code>/root/scripts</code>, update the configuration files, and restart the services:
* Dnsmasq configuration is at <code>/etc/dnsmasq.conf</code>

=== Add the VM configuration to ansible ===
Edit <code>ANSIBLE_PATH/roles/deploy-vms/defaults/main.yml</code> and add a new entry under <code>new_vms_spectre</code>, following the existing format.

'''Note:''' if there are any entries in this file that are '''not''' present on the VM host, they will also be created. Comment out entries that shouldn't be created. Existing hosts are skipped.

Run <code>ansible-playbook -v deploy.yml -i hosts.cfg -u papatux -k -K</code>, using the correct vtluug-admin repo path.

'''Important:''' A random root password is set during VM creation and printed to stdout. Record this!

=== Testing ===
The new host should be accessible by root via SSH on port 2222 (and 22) over IPv6 and IPv4 from the internal network. Check 10.98.1.0/24 to see if it had any issues getting a static DHCP lease and if the MAC is correct.

[[Category:Infrastructure]]
[[Category:Howtos]]
[[Category:Needs restoration]]