My #HomeLab

Preamble

The time has finally come for an inaugural VP blog post, and what better way to kick this off than covering the #homelab? This quintessential tool has been the topic of many conversations (go check out thevpad.com if that isn’t already obvious) and is, in my opinion, table stakes to a successful career in this industry. I started out with a humble 2-node ROBO vSAN cluster on whitebox chassis. Growing pains led me to what I have today, but rest assured that everyone has a starting point and there’s no shame in that fact. Consider yourself “ahead of the crowd” just by taking the first step; initiative and persistence are key to mastering this industry. Some believe that this tool should be subsidized; I fundamentally disagree. Your career requires constant maintenance and investment in order to bear fruit. If you have arrived at this post, I assume you concur.

Enough of the opinion column; let’s jump into the real reason you’re here.

Overview

Covering the #homelab build in one post isn’t going to cut it. We’ll go over the high-level design in terms of Compute, Network and Storage components. I’ll also touch on 3rd party integrations, but expect deep dives into all of the above in future posts as context warrants.

Core Functions

  1. Support my day job – Technology is my hobby, but I do have a day job. A production-like sandbox keeps me up-to-date on all things VMware
  2. Family services – Collaboration and productivity services for my family are hosted in the compute stack
  3. Educational Tool – The best part of the lab is using it to show others what they can build for themselves

Core Characteristics

  1. Stability and reliability – anyone who submits maintenance windows to their spouse understands why this is important
  2. Speed and flexibility – Ample bandwidth and the ability to “reset” minimize prep time for exercises
  3. Efficiency and environment – Co-op power keeps the light bill low, and an insulated basement keeps the noise to whisper levels

Hardware Components

  1. Dell R720 Dual E5-2670 256GB DDR3 2.1TB Capacity Tier (3x)
  2. HPE 3500yl-24G-PoE Layer 3 Switch
  3. HPE 3500yl-48G-PoE Layer 3 Switch
  4. Brocade VDX 6720-24 Layer 3 10GbE Switch
  5. WD EX4100 NAS 10TB Capacity Tier
  6. Netgate SG-2440 Firewall Appliance
  7. APC SUA2200RM2U UPS w/ NMC
  8. APC AP7900 Switched 1U PDU

Details

Over the next few subsections, I’ll offer technical details, guidance and lessons learned as I constructed the #homelab v2. You’ll find that the common theme across each facet of technology is VMware. That is not by accident, as one of the many perks of employment at VMware is unfettered access to the breadth of technology it offers. YMMV, but for those on the “buy-side” or acting in partner roles, I strongly encourage you to join your local VMUG to take advantage of similar benefits.

The plan was to run a consolidated Compute/Management/Edge cluster with extensibility in the plumbing for future expansion. The technical requirements to achieve such an architecture were:

  1. Distributed Storage – This could have come in any iSCSI/NFS NAS or FCoE flavor, but I wasn’t too keen on running a dedicated storage host. This is a requirement I struggled with considerably, and it’s a topic I continue to revisit from time to time. However, vSAN seemed to be the path of least resistance to satisfying the collapsed storage needs. This is what led me to an N=3 equation for the server purchase, although I quickly began considering a 4th host to support FTT=1 with Rapid Rebuild (a quick sizing sketch follows this list).
  2. Logical Networking – I despise managing VLANs, plain and simple. You’ll find over time that I’m very sensitive to security architecture, and I choose, somewhat painfully, to run my lab with controls that arguably rival some of the most highly regulated industries. Put plainly, I have trust issues. NSX was an absolute must for both micro-segmentation and overlay networking, and an underlay that would support jumbo frames would be in order. One authenticated overlay and one unauthenticated DMZ needed to be established, which meant planning for separate physical uplinks.
  3. Massive Compute – This requirement is straightforward. I’m constantly running SQL performance tests for various work-related activities, so the ability to push bits at alarming rates was a must. Plus, the cost-per-core was negligible, so why not step up to the 2670s? Keep in mind that due to the Spectre mitigations, you’ll need to plan capacity with care.
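
For anyone weighing the same trade-off, here’s a minimal back-of-the-envelope sketch of how FTT drives host count and usable capacity with vSAN’s default RAID-1 mirroring. It’s plain Python and nothing lab-specific; the 30% slack figure is a common rule of thumb rather than a hard requirement.

```python
# Back-of-the-envelope vSAN sizing (RAID-1 mirroring assumptions only).
def min_hosts(ftt: int) -> int:
    """Minimum hosts for a given Failures To Tolerate with RAID-1 mirroring: 2*FTT + 1."""
    return 2 * ftt + 1

def usable_tb(raw_tb: float, ftt: int, slack: float = 0.30) -> float:
    """Rough usable capacity: every object is stored FTT+1 times,
    and ~25-30% slack is commonly reserved for rebuilds and maintenance."""
    return raw_tb / (ftt + 1) * (1 - slack)

print(min_hosts(1))                  # 3 -> the N=3 purchase above
print(round(usable_tb(5.7, 1), 2))   # ~2.0 TB comfortably usable at FTT=1
```

The 4th host doesn’t change the FTT math; it just gives vSAN somewhere to rebuild components immediately instead of waiting on a repaired host.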

Compute

The horsepower of the lab consists of 3 refurbished Dell R720s, acquired from the good people at Orange Computers. I’ve been buying most of my server gear from Orange, and I’m very happy with the product, customer service, and customizable options they offer. They also have an eBay store that constantly runs great discounts, so start creating your watchlists…

Each server was ~$1350 and included the following configuration:

  1. Processor – Dual Intel Xeon E5-2670 (Sandy Bridge)
  2. RAM – 256GB ECC DDR3 (16x 16GB DIMMs)
  3. NIC – Dell Broadcom 57800 (Dual 10GbE SFP+, Dual 1Gb RJ-45)
  4. Storage Controller – Dell PERC H710 512MB
  5. OOBM Interface – Dedicated iDRAC 7 w/ Enterprise License
  6. Hard Drive – 8x Dell 300GB SAS 10K 2.5″ 6G
  7. Power Supply – Dual 750W Dell Hot Swap Power Supplies

Processor options are paramount when building your #homelab. I was adamant about supporting vSphere 6.7, which deprecated support for Westmere-era (Dell 11g) processors and older. There are some documented workarounds, but they are absolutely not supported. Before you make a significant investment, check the release notes and the HCL.
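
If you already have hosts racked and want to double-check what they report before committing to an upgrade, a minimal pyVmomi sketch like the one below dumps the CPU model, core count and memory per host so you can cross-reference the HCL. The vCenter hostname and credentials are placeholders, and skipping certificate validation is a lab-only shortcut.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Lab-only shortcut: skip certificate validation.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        hw = host.summary.hardware
        print(f"{host.name}: {hw.cpuModel}, {hw.numCpuCores} cores, "
              f"{hw.memorySize // 1024**3} GiB RAM")
    view.Destroy()
finally:
    Disconnect(si)
```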

Based on the plan above, additional components needed to be acquired: PCIe NICs to achieve hardware segment isolation, as well as enterprise SSDs to establish a vSAN caching tier. Here is the BOM that wasn’t included in the server purchase:

  1. SSD – 3x STEC Zeus 800GB 2.5″ SAS ($138/drive)
  2. NIC – 3x Intel I350 PCIe 4 Port 1Gb RJ-45 ($60/NIC)
  3. Install Media – 3x SanDisk 32GB micro USB ($6/USB)

There are other considerations such as ReadyRails, bezels, etc., that will be case-by-case. If you would like additional details on items such as these, reach out directly. Also consider over-purchasing by a small percentage; spare parts such as disks and fan baskets are fantastic to have on hand when things go wrong.

The net result of the build above is 48 physical cores, 768GB of ECC RAM and 5.7TB of vSAN capacity across three hosts. Plenty o’ power for a #homelab. Each host idles around 168 watts, which on co-op power produces a run rate of roughly $45/month to keep everything “on”. Heat and noise are both negligible since the rack runs in a climate-controlled basement. ESXi 6.7u1 is deployed on each host and configured in one vCenter cluster with vSAN, fully automated DRS and vSphere HA. Future posts will cover the how-to for most of these functions.
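
For those pricing out a similar build, the run rate is easy to script. The sketch below roughly reproduces the figure above; the co-op rate and the extra draw I’ve assumed for switches, firewall, NAS and UPS are my guesses, so substitute your own numbers.

```python
# Rough monthly power-cost estimate for the rack (inputs are approximations).
HOSTS = 3
IDLE_WATTS_PER_HOST = 168      # measured idle draw per R720 (from above)
NETWORK_AND_UPS_WATTS = 120    # assumed combined draw for switches, firewall, NAS and UPS
RATE_PER_KWH = 0.10            # assumed co-op rate in $/kWh

total_watts = HOSTS * IDLE_WATTS_PER_HOST + NETWORK_AND_UPS_WATTS
kwh_per_month = total_watts * 24 * 30 / 1000
print(f"{kwh_per_month:.0f} kWh/month, ~${kwh_per_month * RATE_PER_KWH:.0f}/month")
# -> roughly 450 kWh and ~$45/month under these assumptions
```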

Networking

The networking backbone for the #homelab is split into three underlays and one global overlay. Most of the supporting hardware was acquired refurbished on eBay and provides a tremendous amount of enterprise capability across various networking protocols. Beyond the requirements mentioned above, I place emphasis on separating unauthenticated and authenticated topologies. This is largely irrelevant in the days of zero trust models, but a core tenet for the lab is that traffic sourced from unauthenticated zones doesn’t mix with authenticated traffic. This is the primary reason you will find so many uplinks within the fabric. Also, to comply with the characteristics above, each uplink is redundant in case of link failure.

Working from the internet down to the underlay, here is the BOM to meet my requirements:

  1. Netgate SG-2440 ($299) – This was purchased new, as I actively contribute to the pfSense community. It serves as a fantastic edge firewall/router behind AT&T GigaFiber, and facilitates all core routing between my primary and tertiary sites.
  2. HPE 3500yl-48G-PoE ($249) – This is the core switch for the primary site underlay and services the rack. It is peered with the Netgate appliance and the distribution switch over iBGP. PoE wasn’t important here, but a nice to have for the price.
  3. HPE 3500yl-24G-PoE ($139) – This is the distribution switch for the site underlay servicing the WLAN and surveillance infrastructure (details in a future post). It peers with the core switch above over eBGP, and PoE was a must to power the access points and cameras. Great value!
  4. Brocade VDX 6720-24 ($190) – This is the SAN switch supporting 10Gb fiber links for the vSAN network, the third underlay. It is not absolutely required for a hybrid vSAN deployment, but it certainly makes a positive difference in performance. All 24 ports were fully licensed as part of the price.

The 3500s are power hungry (unlike the VDX), but they’re great value purchases given the vast array of enterprise capability and overall stability. They also fall under lifetime support licensing, as the firmware has been migrated to something similar to the ArubaOS framework. This was key for me since the 48G is handling unauthenticated VLANs. Keep in mind that when buying enterprise equipment, you need to confirm whether the gear requires licensing (the VDX, for example); most resellers will include this as part of the sale if so. Lastly, fiber gear is sensitive to compatibility: when purchasing SFP modules, cables and other add-ons, it is critical to check the appropriate vendor HCL for full-feature support.

The physical underlay is structured into 3 parts: core connectivity, data center services and site services. Many single-purposed zones reside within each compartment, and cross-zone communication is enabled with layer 3 ACLs (a rough addressing sketch follows the zone lists below). Each of these zones can be generally described as:

  1. Infrastructure – the rack and all of its appliances (black)
  2. Management – software O/S just North of bare-metal (yellow)
  3. Monitoring – port mirrored DPI, syslog, etc. (yellow)
  4. Storage – iSCSI, NFS and vSAN (aqua)
  5. Transport – the underlay hosting NSX (green)
  6. Uplink – on-ramp/off-ramp to overlays and distribution switches (red)
  7. Net Services – Infrastructure RADIUS, DNS, NTP, TFTP, etc (not shown)

On the site services side:

  1. Management – hosting Ubiquiti access points and access switches
  2. Smart Devices – any and all things “smart” in the home IoT
  3. Surveillance – hosting Ubiquiti security cameras
  4. Managed Users – devices that have enrolled into MDM
  5. Guest Users – devices that are unmanaged, mostly my in-laws
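
To make the zoning a bit more concrete, here’s a purely hypothetical sketch of carving a lab supernet into per-zone /24s with Python’s ipaddress module. The supernet, zone names and the sample ACL line are made up for illustration and are not my actual addressing plan.

```python
import ipaddress

# Hypothetical addressing plan: one /24 per zone out of a lab supernet.
supernet = ipaddress.ip_network("10.20.0.0/16")
zones = ["infrastructure", "management", "monitoring", "storage", "transport",
         "uplink", "net-services", "wlan", "smart-devices", "surveillance",
         "managed-users", "guest-users"]

plan = dict(zip(zones, supernet.subnets(new_prefix=24)))
for name, subnet in plan.items():
    print(f"{name:15} {subnet}")

# Cross-zone communication then becomes a handful of L3 ACL entries keyed on these prefixes,
# e.g. guests may resolve DNS against net-services and reach nothing else internal:
print(f"permit udp {plan['guest-users']} {plan['net-services']} eq 53")
```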

Lastly, it’s worth mentioning that the overlay architecture becomes far more functional and tuned to the resident workloads compared to the above. There will be much more on the logical network in a future post, as a brief mention here doesn’t remotely do the topic justice.

Storage

This is admittedly an area where I’m not as skilled as others, and certainly not a focus of this blog. For grade-A blogging on all things VMware storage, head over to Duncan Epping, Cormac Hogan and Chris Colotti.

For the lab, shared storage was a “must” to properly architect the cluster. Distributed storage made sense given that I was reluctant to run a dedicated array over any of the mainstream protocols, but I was hyper-focused on FTT and reasonable IOPS. Enter vSAN!

Each host has the following disk group configuration (a quick capacity sanity check follows the list):

  1. Cache Tier – 1x 800GB STEC Zeus SAS SSD
  2. Capacity Tier – 7x 300GB Dell 10K SAS HDD
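
As a sanity check on the capacity figure below, the raw math lines up once you account for drives being marketed in decimal TB while vSphere reports binary TiB; a quick sketch:

```python
# Raw vSAN capacity-tier math for 3 hosts x 7 capacity drives (illustrative only).
DRIVES = 3 * 7
DRIVE_BYTES = 300 * 10**9        # 300 "marketing" gigabytes per 10K SAS drive

raw_tib = DRIVES * DRIVE_BYTES / 2**40
print(f"{raw_tib:.2f} TiB raw")  # ~5.73 TiB, which the UI rounds to ~5.7 "TB"
# The reported 5.72TB is a touch lower due to formatted drive capacity and on-disk overhead.
```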

Total vSAN capacity yields 5.72TB. I haven’t run a proper performance benchmark, but I haven’t noticed any issues with day-to-day functions. vSphere 6.7 makes setting up and maintaining vSAN clusters a breeze, and I’m able to full-clone templates in under 3 minutes. A few considerations for implementing vSAN in your lab:

  1. HCL is key – I can’t stress this enough. Unsupported firmware, controllers, disks, etc. will destroy performance
  2. RAID controllers must pass disks through – the hosts above contain a PERC H710, which doesn’t offer a true HBA/passthrough mode, so I configured each disk as an individual RAID 0 to pass it through to the kernel. Do not present a RAID array to vSAN
  3. Fabric bandwidth is case-by-case – Each of the hosts is on a 10Gb fabric for vMotion, vSAN and other storage protocols. YMMV here, as I was originally successful running hybrid vSAN on a 1Gb network. After upgrading to fiber, I did notice a measurable improvement. For all-flash configurations, 10Gb is an absolute requirement. A 2-node ROBO deployment can get to 10Gb inexpensively by using SFP+ DAC cables (a quick end-to-end MTU check follows this list).
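
Since the underlay requirement earlier calls for jumbo frames, it’s worth validating end-to-end MTU on the storage segment after any switch or NIC change. Below is a minimal sketch run from a Linux box on that segment; the target VMkernel address is hypothetical, and 8972 is the 9000-byte MTU minus 28 bytes of IP/ICMP headers. From an ESXi host itself, "vmkping -d -s 8972 <target>" accomplishes the same thing.

```python
import subprocess

# Send do-not-fragment pings sized for a 9000-byte MTU (8972 payload + 28 bytes of headers).
# Assumes a Linux host on the storage segment; the target VMkernel IP is hypothetical.
TARGET = "10.20.3.11"
result = subprocess.run(
    ["ping", "-M", "do", "-s", "8972", "-c", "3", TARGET],
    capture_output=True, text=True,
)
print("jumbo frames OK" if result.returncode == 0 else "MTU problem somewhere in the path")
print(result.stdout or result.stderr)
```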

Third Party Services

Some notable mentions that provide critical lab services:

  1. AWS EC2, Route53 and S3 – Top-level DNS, backup object storage, and compute to power this blog
  2. Azure AD and Office365 – This is one of the largest parts of my day job, and provides collaboration/productivity to the lab
  3. Okta – The identity source of truth for all things, including AD
  4. GitLab – All of the lab task backlog is documented here and put into stories
  5. F5 – LTM and GTM are deployed for load balancing, although the not-so-far-in-the-future plan is to collapse this back into NSX

Future Plans

As mentioned earlier, expanding to a fourth host is first and foremost; a resilient vSAN environment that supports rolling maintenance mode should really have a standby host. Patching a host in a 3-node cluster leaves you exposed to data loss if another failure occurs while that host is down. There is also a plan to upgrade the environment to 6.7U2; however, due to a known bug with the F5 VE, I’ve postponed that activity. Lastly, I’ve upgraded the NSX-T install base from 2.3 to 2.4, but I was sorely disappointed that none of the NSX Manager UI components migrated to the new Simplified UI. It’s on the agenda to manually recreate the environment in the new Corfu-backed structure to take advantage of the new features in 2.4.
