1. President's Corner - Lessons Learned
2. LOPSA Board Elections - Vote now!
3. Welcome New Sponsor Datadog
4. Member Tech Blog Highlight
6. LISA Conversations
8. Thank you to our sponsors
9. Comments or suggestions?
June 2016 LOPSAgram: Vote!
1. President's Corner: Lessons Learned
I have been thinking a lot lately about a series of major system outages this year in my datacenter at the school district where I work. They are all related to the switching infrastructure between my ESXi cluster and my SAN. What could I have done to avoid the outages and what lessons can I learn from the outages?
My setup is a cluster of 4 switches, of which 2 are redundant (1Gb) while the other 2 are not (1 fiber and 1 10Gb copper). When connected together, they become one switching fabric. The fiber switch is connected to storage and the 10Gb copper switch to the ESXi servers. Over the past four months I have had four incidents that crashed the datacenter because the servers had no access to their storage:
Incident #1: Cable installers cut power to fiber switch
Incident #2: Switch VLAN reconfiguration causes switches to go offline
Incident #3: Fiber switch completely failed
Incident #4: Adding in replacement fiber switch caused SAN to have problems
Lessons learned from all of these incidents are:
Redundancy is necessary in a modern datacenter. Incidents #1 and #3 could have been avoided if we had redundant fiber switches. Our previous switches from this vendor never had any downtime in over 4 years so we were lulled into a false sense of safety.
Make small changes and test them first. For Incident #2 we thought that the new VLAN configuration would work fine, so we pushed it out to all ports on the non-redundant fiber switch. This summer when we try again, we are going to test it on a few of the redundant 1Gb ports with a test ESXi server. Once the new configuration is fully tested, we will push it out to one of the fiber switches and then, if it works, to the redundant fiber switch.
Make sure you have 7x24 support on all parts of your infrastructure. During Incident #4 we found out that while we had 7x24 support on the ESXi servers and the switches, we only had 5x9 support for the SAN. So we had to wait overnight before we could verify the SAN was the problem (the symptom was iSCSI sessions being dropped).
Just because something has failed in the past doesn't mean it is what failed this time, and if something has not failed before, there is always a first time. For Incident #4 we were sure it was a switch problem again, as that is what failed the previous 3 times. It took hours before we really looked at the SAN, which had never failed before, as the possible cause of the failure.
Make sure 7x24 actually means that there is someone who is available 7x24. For the switch vendor, when we called their support no one would pick up. This happened twice to us during Incident #4 and we are now discussing with them what happened and why.
Rebuilding your datacenter while in production is asking for trouble. It is just too easy for the installers, etc. to bump a power cord, patch cable, etc. and cause a problem. I am actually a bit surprised we did not have more issues because of this.
A supportive team is very helpful. Many times as system admins we are responsible for systems that fail due to things out of our control. In these cases, it is really nice to have a team around you that helps out to get services running without pointing fingers, but rather looking for ways to speed the recovery process and then figure out what could have been done to avoid the failure.
That said, spend a few hours and document the risks you see in your datacenter so your management is aware of the risks. This makes it a lot easier when a failure does happen as they will not feel totally blindsided by the failure.
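The "make small changes and test them first" lesson above can be sketched as a staged rollout: apply the change to one stage, verify it, and only then move on. This is a minimal illustration, not the tooling used in the incidents described. The function name, the stage names, and the apply/verify callables are all hypothetical placeholders for whatever your own switches and checks require.

```python
from typing import Callable, List, Tuple

# Each stage is (name, apply_change, verify). All three are hypothetical
# placeholders; real apply/verify steps depend on your switch vendor.
Stage = Tuple[str, Callable[[], None], Callable[[], bool]]


def staged_rollout(stages: List[Stage]) -> List[str]:
    """Apply a change one stage at a time, halting at the first failed check.

    Returns the names of the stages that were applied and verified.
    """
    completed: List[str] = []
    for name, apply_change, verify in stages:
        apply_change()
        if not verify():
            # Stop here so the failure stays confined to this stage.
            print(f"Verification failed at stage: {name}; halting rollout")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    applied: List[str] = []  # records which stages had the change applied

    example_stages: List[Stage] = [
        ("test ports on redundant 1Gb switch",
         lambda: applied.append("test ports"), lambda: True),
        ("first fiber switch",
         lambda: applied.append("fiber 1"), lambda: False),  # simulated failure
        ("redundant fiber switch",
         lambda: applied.append("fiber 2"), lambda: True),
    ]
    print(staged_rollout(example_stages))
```

In this simulated run the second stage fails its check, so the third stage is never touched. That is the point of the approach: a bad configuration stops at the smallest possible blast radius.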
I would love to hear from you about some of your lessons learned. Email me at firstname.lastname@example.org and I will put the best ones into next month's column.
2. LOPSA Board Elections - Vote now!
The Leadership Committee is pleased to announce that they have opened the 2016 LOPSA board member elections.
Before voting, please take a moment to review the first and second LOPSALive sessions, where the candidates answered LOPSA member questions. Finally, review all the candidate statements.
Brian Globerman - Candidate Statement
George Beech (Incumbent) - Candidate Statement
Scott Suehle - Candidate Statement
Steven VanDevender (Incumbent) - Candidate Statement
Thomas Uphill (Incumbent) - Candidate Statement
Trevor Thorpe - Candidate Statement
Vote before the LC closes the polls on June 23 at 12:00AM Eastern.
3. Welcome New Sponsor Datadog
See metrics from all your apps, tools, and services in one place. Begin your trial today, install the agent, and Datadog will send you a free t-shirt!
"Datadog takes care of the complex task of managing a metrics back-end. Instead of figuring out how and where to store data, we get to focus on actually using the data to make better decisions." - Arup Chakrabarti, Head of Operations Engineering, PagerDuty
4. Member Tech Blog Highlight - Vertical Sysadmin
Member Mario Obejas provides a guest post on member Aleksey Tsalolikhin's blog Vertical Sysadmin on “a replacement for bash?” and on writing production-grade code in any language.
Aleksey Tsalolikhin recently presented to educators from the California community college network at the Digital Media Educators Conference (http://ict-dm.net/tracks-2016/data-representation/item/infrastructure-administration-server-management-at-scale) on June 10th. The talk covered what a sysadmin is and how to make one, with examples of working at scale from the world of CFEngine. Many thanks to Ski Kacoroski for the presentation slides.
Vertical Sysadmin is offering a 15% discount to LOPSA members on professional Git training http://www.verticalsysadmin.com/git/. "I have got by on minimum understanding of git for a couple of years now--this really brought up my confidence. After I learned the git internals, the esoteric commands really started falling into place."
-- Nicholas Santucci, Systems Administrator
Do you have a technical blog you'd like featured in the LOPSAgram? Email: email@example.com
Don't have a blog yet? You can always use your LOPSA blog at https://blogs.lopsa.org. Choose sysadmin-news as the category if it is a recent news item for the https://lopsa.org front page.
6. LISA Conversations - Unleashing the Power of the Unikernel
Every month Lee Damon and Tom Limoncelli chat with a LISA conference presenter in LISA Conversations. In June they will be chatting with Russell Pavlicek about Unleashing the Power of the Unikernel.
The conversation will be live on Tuesday, 28 June at 3:30 PM PDT (6:30 PM EDT).
Past LISA Conversations can be found at:
SASAG: Seattle Area System Administrators Guild
In June, new officers were elected:
President: Thomas Uphill
Vice-President: Brian Globerman
Secretary: Curtis Elgin
In July we'll hear from Brian Globerman on "Implementing Zabbix in Azure for network and server monitoring."
Dinner will be sponsored by Silicon Mechanics.
LOPSA Columbus last met on May 19 at CoverMyMeds. Rob Kinyon gave a presentation titled "Devs are from Mars, Ops are from Venus," where he empathized with and shared the perspectives of different disciplines of technology professionals. The next meeting will be on June 21 at 6:00 PM. RSVP here: https://lopsacbus201606.eventbrite.com
LOPSA-NJ: New Jersey
The June talk was given by Sujit Pal on IoT and covered:
History and Future of IoT (Commercial and Research)
How is IoT going to impact us
Some Example Applications of IoT
Components of IoT infrastructure
Challenges to be overcome for IoT to go mainstream
Some sample solutions
There were 25 people present, and Sujit's talk sparked discussions on how the use of RFID chips has expanded as the devices have shrunk in size. LOPSA-NJ will be on summer break, but Joe Youn and Mike Stoppay are the new organizers for the group. They will be learning the ropes of managing the group with help from John Boris, a member of both the Board of Directors and LOPSA-NJ. The next meetup will be September 1st at the Mercer County Library Branch in Lawrenceville, NJ. They are actively looking for a speaker for September and October. If interested, you can contact John Boris at firstname.lastname@example.org.
LOPSA-LA / UUASC: Los Angeles
Thirty people attended the LOPSA-LA meeting on Thursday, June 2, where Mike Weilgart presented on Git Basics.
Mike Weilgart is going to repeat his popular "Taming the Git Filesystem" talk at the LOPSA-LA meeting at Coding Dojo in Burbank, CA on Thursday, June 23rd, 7-9 PM. RSVP: http://www.meetup.com/lopsala/events/23171720
Keep up with the LOPSA-LA community and upcoming meetings via the email list and website: http://www.lopsala.org/
8. Thank You To Our Sponsors
We'd like to thank our sponsors. We're deeply grateful for their continuing support of LOPSA. More information on how to become a sponsor.
Thanks to our individual sponsors:
Platinum: Jennine Townsend, Dan Rich
Gold: Ski Kacoroski
Silver: Matt Disney, Lee Damon, Scott Murphy, Ian Viemeister
Bronze: Gary Studwell
Gold Sponsor Paessler AG
Bronze Sponsor Edgestream Partners is a small group of scientists and engineers with a unique approach to trading in the financial markets. Our company designs, builds and runs a global trading software platform. We take pride in our software craftsmanship and use Python, Cython and C on Linux to run our global trading operations. We also use open-source tools as much as possible - Python, PostgreSQL, numpy, git, Cobbler, Puppet and Ansible are all crucial to our business.
Bronze Sponsor O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O'Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurring their adoption by amplifying "faint signals" from the alpha geeks who are creating the future. An active participant in the technology community, the company has a long history of advocacy, meme-making, and evangelism. Check them out.
Some of LOPSA's web content is hosted by ServerBeach.
9. Comments or suggestions?
As we close out this month's LOPSAgram, we want to make sure we're giving you the information you want or need. If you have any comments or suggestions, please feel free to send them to email@example.com
Office: +1 (202) 567-7201, Fax: 609-219-6787, Address: PO Box 5161, Trenton, NJ 08638-0161