Infrastructure & Systems Engineer · Philadelphia / Remote

Christopher Rothmeier

Operations-first infrastructure engineer

Thirteen years in enterprise IT, spanning Windows, Active Directory, and VMware estates with strict uptime requirements, plus a self-hosted, high-availability Kubernetes and GPU datacenter. I operate and improve production infrastructure with an emphasis on reliability, recovery, observability, and change safety. This site is my technical portfolio and lab notebook.


Open to full-time W-2 infrastructure / systems engineering roles — Philadelphia area or remote. Not available for contract or freelance work.

Core Competencies

What I work on

Microsoft & Identity

Active Directory, Entra ID, Azure AD Connect, Conditional Access, Exchange Online, SharePoint / OneDrive, Intune.

Kubernetes & Automation

High-availability K3s with kube-vip, Terraform / Ansible provisioning, GitOps, and a drift-controlled, repeatable approach to infrastructure.

Endpoint Security & Compliance

Intune baselines, CrowdStrike, Defender for Endpoint, CIS benchmarks, and audit-ready controls, including in HIPAA-regulated environments.

Data Protection & Recovery

Veeam, Proxmox Backup Server, restore validation, RPO / RTO planning, and operational runbooks.

Virtualization & Storage

VMware vSphere / ESXi, Proxmox / KVM, GPU passthrough, ZFS-backed storage, and backup / recovery patterns.

GPU & Compute Infrastructure

A heterogeneous multi-GPU NVIDIA fleet for local inference and retrieval. I own the platform layer: scheduling, monitoring, and data-layer durability.

Experience

Enterprise depth, hands-on lab validation

Enterprise operations

Managed Windows / VMware estates for finance and trading environments with strict uptime requirements.
Planned and executed hybrid Azure migrations — Azure AD Connect, Conditional Access, M365 tenant configuration.
Implemented endpoint-security baselines, backup / recovery strategies, and compliance controls.
Practiced disciplined change control, incident response, and stakeholder communication in teams of varying size.

Self-hosted datacenter

Production-grade K3s cluster used to validate high-availability and recovery patterns, not a toy setup.
Recovered from a cluster-wide P1 boot incident with zero data loss; documented the root cause and follow-up controls.
Cluster-independent, restore-validated backups; ZFS storage; 10/25GbE networking; GPU passthrough for inference.
Full observability stack: Prometheus, Grafana, and alerting wired to real failure modes.

Lab Notes

Recent writing from the homelab

Featured · May 2026

Standing Up a 70B Inference Node — and Deciding Not to Ship It

Refitting a single 48 GB Blackwell node for 70B-class inference: the procurement pivot, a runtime trap that silently halved throughput, sustained-load measurements, and the discipline of building a capability and then not promoting it.


Capacity Planning · May 2026

What Quantization Actually Saves

Two controlled benchmarks settle it: storage precision buys disk and speed; only compute precision buys serving VRAM.


Incident Response · May 2026

A Backup That Reported Success — and Wasn't

The recorded root cause didn't survive contact with the live system. Fail-closed hardening and real alerting, with no data lost.


Migration · May 2025

The Great Escape: VMware to Proxmox/KVM

Migration planning for virtualization estates after Broadcom's acquisition of VMware, including GPU-aware workloads.



Let's talk

I'm looking for full-time W-2 infrastructure, systems, or platform engineering roles in the Philadelphia area or remote. The fastest way to reach me is LinkedIn or email.

LinkedIn profile · crothmeier@lazarus-labs.com