Lazarus Labs
Infrastructure & Systems Engineer · Philadelphia / Remote

Christopher Rothmeier

Operations-first infrastructure engineer

Thirteen years in enterprise IT — from Windows, Active Directory, and VMware estates with strict uptime requirements to a self-hosted, high-availability Kubernetes and GPU datacenter. I operate and improve production infrastructure with an emphasis on reliability, recovery, observability, and change safety. This site is my technical portfolio and lab notebook.

Open to full-time W-2 infrastructure / systems engineering roles — Philadelphia area or remote. Not available for contract or freelance work.

Core Competencies

What I work on

Microsoft & Identity

Active Directory, Entra ID, Azure AD Connect, Conditional Access, Exchange Online, SharePoint / OneDrive, Intune.

Kubernetes & Automation

High-availability K3s with kube-vip, Terraform / Ansible provisioning, GitOps, and a drift-controlled, repeatable approach to infrastructure.

Endpoint Security & Compliance

Intune baselines, CrowdStrike, Defender for Endpoint, CIS benchmarks, and audit-friendly controls including HIPAA contexts.

Data Protection & Recovery

Veeam, Proxmox Backup Server, restore validation, RPO / RTO planning, and operational runbooks.

Virtualization & Storage

VMware vSphere / ESXi, Proxmox / KVM, GPU passthrough, ZFS-backed storage, and backup / recovery patterns.

GPU & Compute Infrastructure

A heterogeneous multi-GPU NVIDIA fleet for local inference and retrieval. I own the platform layer: scheduling, monitoring, and data-layer durability.

Experience

Enterprise depth, hands-on lab validation

Enterprise operations

  • Managed Windows / VMware estates for finance and trading environments with strict uptime requirements.
  • Planned and executed hybrid Azure migrations — Azure AD Connect, Conditional Access, M365 tenant configuration.
  • Implemented endpoint-security baselines, backup / recovery strategies, and compliance controls.
  • Maintained strong change control, incident response, and stakeholder communication across teams of varying size.

Self-hosted datacenter

  • Production-grade K3s cluster for validating high-availability and recovery patterns — not a toy setup.
  • Recovered a cluster-wide P1 boot incident with zero data loss; documented root cause and follow-up controls.
  • Cluster-independent, restore-validated backups; ZFS storage; 10/25GbE networking; GPU passthrough for inference.
  • Full observability stack: Prometheus, Grafana, alerting wired to real failure modes.

Lab Notes

Recent writing from the homelab

Featured · May 2026

Standing Up a 70B Inference Node — and Deciding Not to Ship It

Refitting a single 48 GB Blackwell node for 70B-class inference: the procurement pivot, a silent runtime trap that halved throughput, sustained-load measurements, and the discipline of building a capability and then not promoting it.

Read the lab note →
All lab notes →

Let's talk

I'm looking for full-time W-2 infrastructure, systems, or platform engineering roles in the Philadelphia area or remote. The fastest way to reach me is LinkedIn or email.