1. President's Corner: Lessons Learned
I have been thinking a lot lately about a series of major system outages this year in my datacenter at the school district where I work.  They are all related to the switching infrastructure between my ESXi cluster and my SAN. What could I have done to avoid the outages and what lessons can I learn from the outages?

My setup is a cluster of 4 switches, of which 2 are redundant (1GB) while the other 2 are not (1 fiber and 1 10GB).  When connected together, they become one switching fabric.  The fiber switch is connected to storage and the 10GB copper switch to the ESXi servers.  Over the past four months I have had four incidents that crashed the datacenter because the servers had no access to their storage:

Incident #1: Cable installers cut power to fiber switch
Incident #2: Switch VLAN reconfiguration causes switches to go offline
Incident #3: Fiber switch completely failed
Incident #4: Adding in replacement fiber switch caused SAN to have problems

Lessons learned from all of these incidents are:

  Redundancy is necessary in a modern datacenter.  Incidents #1 and #3 could have been avoided if we had redundant fiber switches.  Our previous switches from this vendor never had any downtime in over 4 years so we were lulled into a false sense of safety.
 Make small changes and test them first.  For Incident #2 we thought that the new VLAN configuration would work fine, so we pushed it out to all ports on the non-redundant fiber switch.  This summer when we try again, we are going to test it on a few of the redundant 1GB ports with a test ESXi server.  Once the new configuration is fully tested, we will push it out to 1 of the fiber switches and then, if it works, to the redundant fiber switch.
 Make sure you have 7x24 support on all parts of your infrastructure. During Incident #4 we found out that while we had 7x24 support on the ESXi servers and the switches, we only had 5x9 support for the SAN.  So we had to wait overnight before we could verify the SAN was the problem (the symptom was iSCSI sessions being dropped).
 Just because something has failed in the past doesn't mean it has failed again, and if something has not failed before, there is always a first time.  For Incident #4 we were sure it was a switch problem again, as that is what had failed the previous 3 times.  It took hours before we really looked at the SAN, which had never failed before, as the possible cause of the failure.
 Make sure 7x24 actually means that there is someone who is available 7x24. For the switch vendor, when we called their support no one would pick up. This happened twice to us during Incident #4 and we are now discussing with them what happened and why.
 Rebuilding your datacenter while in production is asking for trouble.  It is just too easy for the installers, etc. to bump a power cord, patch cable, etc. and cause a problem.  I am actually a bit surprised we did not have more issues because of this.
 A supportive team is very helpful.  Many times as system admins we are responsible for systems that fail due to things out of our control.  In these cases, it is really nice to have a team around you that helps out to get services running without pointing fingers, but rather looking for ways to speed the recovery process and then figure out what could have been done to avoid the failure.
 That said, spend a few hours and document the risks you see in your datacenter so your management is aware of the risks.  This makes it a lot easier when a failure does happen as they will not feel totally blindsided by the failure.
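
The "make small changes and test them first" lesson can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage names are hypothetical labels for the rollout order described above, and `apply_change` and `verify` are placeholders for whatever tooling pushes the configuration and checks that servers can still reach storage.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    stages: Iterable[str],
    apply_change: Callable[[str], None],
    verify: Callable[[str], bool],
) -> List[str]:
    """Apply a change one stage at a time, stopping at the first failed check.

    Returns the stages that were changed and then verified successfully.
    """
    completed: List[str] = []
    for stage in stages:
        apply_change(stage)
        if not verify(stage):
            # Stop here so the remaining stages keep their old configuration.
            print(f"Verification failed at {stage!r}; halting rollout.")
            break
        completed.append(stage)
    return completed


# Hypothetical stage names mirroring the planned order: test ports first,
# then one fiber switch, then its redundant partner.
STAGES = [
    "test ports on redundant 1GB switches",
    "first fiber switch",
    "redundant fiber switch",
]

if __name__ == "__main__":
    pushed: List[str] = []
    done = staged_rollout(
        STAGES,
        apply_change=pushed.append,   # stand-in for pushing the new config
        verify=lambda stage: True,    # stand-in for a storage connectivity check
    )
    print("Completed stages:", done)
```

The point of the structure is that a failed check at an early, low-risk stage leaves every later stage untouched.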
I would love to hear from you about some of your lessons learned.  Email me at ski@lopsa.org and I will put the best ones into next month's column.

2. LOPSA Board Elections - Vote now!
The Leadership Committee is pleased to announce that they have opened the 2016 LOPSA board member elections.

Before voting, please take a moment to review the first[1] and second[2] LOPSALive sessions, where the candidates answered LOPSA member questions. Finally, review all the candidate statements[3].

Brian Globerman - Candidate Statement
George Beech (Incumbent) - Candidate Statement
Scott Suehle - Candidate Statement
Steven VanDevender (Incumbent) - Candidate Statement
Thomas Uphill (Incumbent) - Candidate Statement
Trevor Thorpe - Candidate Statement
Vote[4] before the LC closes the polls on June 23 at 12:00AM Eastern.

[1] https://lopsa.org/blog/4025835
[2] https://lopsa.org/blog/4041805
[3] https://lopsa.org/blog/4014528
[4] https://election.lopsa.org/lopsavote/

4. Member Tech Blog Highlight - Vertical Sysadmin
Member Mario Obejas provides a guest post on member Aleksey Tsalolikhin's blog Vertical Sysadmin on “a replacement for bash?” and on writing production-grade code in any language.


Aleksey Tsalolikhin recently presented to educators from the California community college network at the Digital Media Educators Conference (http://ict-dm.net/tracks-2016/data-representation/item/infrastructure-administration-server-management-at-scale) on June 10th on what a sysadmin is and how to make one, with examples of working at scale from the world of CFEngine.  Many thanks to Ski Kacoroski for the presentation slides.

Vertical Sysadmin is offering a 15% discount to LOPSA members on professional Git training http://www.verticalsysadmin.com/git/.   "I have got by on minimum understanding of git for a couple of years now--this really brought up my confidence. After I learned the git internals, the esoteric commands really started falling into place."
-- Nicholas Santucci, Systems Administrator

Do you have a technical blog you'd like featured in the LOPSAgram? Email: board@lopsa.org

Don't have a blog yet? You can always use your LOPSA blog at https://blogs.lopsa.org. Choose sysadmin-news as the category if it is a recent news item for the https://lopsa.org front page.

6. LISA Conversations - Unleashing the Power of the Unikernel
Every month Lee Damon and Tom Limoncelli chat with a LISA conference presenter in LISA Conversations. In June they will be chatting with Russell Pavlicek about Unleashing the Power of the Unikernel.

The conversation will be live at:


on Tuesday, 28 June at 3:30pm PDT, 6:30PM EDT.

Past LISA Conversations can be found at:


7. Locals

 SASAG: Seattle Area System Administrators Guild

 In June, new officers were elected:

 President: Thomas Uphill
 Vice-President: Brian Globermann
 Secretary: Curtis Elgin
 In July we'll hear from Brian Globermann on "Implementing Zabbix in Azure for network and server monitoring."

 Dinner will be sponsored by Silicon Mechanics.

 CBUS: Columbus

 LOPSA Columbus met last on May 19 at CoverMyMeds. Rob Kinyon gave a presentation titled "Devs are from Mars, Ops are from Venus" where he empathized with and shared the perspectives of different disciplines of technology professionals. The next meeting will be on June 21 at 6:00PM. RSVP here: https://lopsacbus201606.eventbrite.com

LOPSA-NJ: New Jersey

 The June Talk was given by Sujit Pal on IoT.

 History and Future of IoT (Commercial and Research)
How is IoT going to impact us
 Some Example Applications of IoT
 Components of IoT infrastructure
 Challenges to be overcome for IoT to go mainstream
 Some sample solutions
 There were 25 people present, and Sujit's talk sparked discussions on how the use of RFID chips has expanded as the devices have shrunk in size. LOPSA-NJ will be on its summer break, but Joe Youn and Mike Stoppay are the new organizers for the group and will be learning the ropes of managing it with help from John Boris, a member of the Board of Directors and of LOPSA-NJ. The next meetup will be September 1st at the Mercer County Library Branch in Lawrenceville, NJ. The group is actively looking for speakers for September and October. If interested, contact John Boris at jboris@lopsa.org.

LOPSA-LA / UUASC: Los Angeles

 Thirty people attended the LOPSA-LA meeting on Thursday, June 2, where Mike Weilgart presented on Git Basics.

 Mike Weilgart is going to repeat his popular "Taming the Git Filesystem" talk at the LOPSA-LA meeting in Burbank, CA at Coding Dojo on Thursday, June 23rd, 7-9 PM.  RSVP: http://www.meetup.com/lopsala/events/23171720

 Keep up with the LOPSA-LA community and upcoming meetings via the email list and website: http://www.lopsala.org/

