Submissions for Releng 2016: due by July 1, 2016

>> Friday, June 03, 2016

The CFP for Releng 2016 is open!  The workshop will be held November 18, 2016 in Seattle.  It will be held in conjunction with FSE 2016.  (Foundations of Software Engineering ACM conference)

Picture by howardignatius- Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)
https://www.flickr.com/photos/howardignatius/14482954049/sizes/l
If you've done something like
  • Migrated to a new build or continuous integration system
  • Implemented a new release or deployment pipeline
  • Implemented tooling to simplify managing your apps in a mobile store
  • Significantly reduced build time with parallelization or some other interesting optimization!
  • Moved your build and test system to containers
  • Refactored your infrastructure code for a live production environment
  • ... we'd love to see your submission to the workshop

We'd like to encourage people new to speaking to apply, as well as those from underrepresented groups in tech. We'd love to hear from some new voices and new companies ! 


Submissions are due July 1, 2016. If you have questions on of the submission process, topics to submit, or anything else, I'm happy to help!  I'm kmoir and I work at mozilla.com or contact me on twitter. Submit early and often!

Read more...

DevOpsDays Toronto recap

>> Thursday, June 02, 2016

Last week I attended DevOpsDays Toronto.  It was my first time attending a DevOpsDays event and it was quite interesting.  It was held at CBC's Glenn Gould studios which is a quick walk from the Toronto Island airport where I landed after an hour flight from Ottawa.  This blog post is an overview of some of the talks at the conference.
 

Glenn Gould Studios, CBC, Toronto.  

Statue of Glenn Gould outside the CBC studios that bear his name.

Day 1


The day started out with an introduction from the organizers and a brief overview of history of DevOps days. They also made a point about reminding everyone that they had agreed to the code of conduct when they bought their ticket. I found this explicit mention of the code of conduct quite refreshing.


The first talk of the day was John Willis,  evangelist at Docker.  He gave an overview of the state of enterprise devops. I found this a fresh perspective because I really don't know what happens in enterprises with respect to DevOps since I have been working in open source communities for so long.  John providing an overview of what DevOps encompasses.


DevOps is a continuous feedback loop.


He talked a lot about how empathy is so important in our jobs.  He mentions that at Netflix has a slide deck that describes company culture.  He doesn't know if this is still the case, but it he had heard that if you hadn't read the company culture deck and show up for an interview at Netflix, you would be automatically disqualified for further interviews.  Etsy and Spotify have similar open documents describing their culture.

Here he discusses the research by Christina Maslach on the six sources of burnout.
Christina Maslach
Christina Maslach

He gave us some reading to do.  I've read the "Release It!" book which is excellent and has some fascinating stories of software failure in it, I've added the other books to my already long reading list.

The rugged manifesto and realizing that the code you write will always be under attack by malicious authors.  ICE stands for Inclusivity, Complexity and Empathy.

He stated that it's a long standing mantra that you can have two of either fast, cheap or good but recent research shows that today we can many changes quickly, and if there is a failure the mean time to recovery is short.

He left us with some more books to read.

The second talk was a really interesting talk by Hany Fahim, CEO of VM Farms.  It was a short mystery novella describing how VM Farms servers suddenly experienced a huge traffic spike when the Brazilian government banned Whatsapp  as a result of a legal order. I love a good war story.

 Hany discussed one day VMfarms suddenly saw a huge increase in traffic. 

This was a really important point.  When your system is failing to scale, it's important to decide if it's a valid increase in traffic or malicious.


Looking on twitter, they found that a court case in Brazil had recently ruled that Whatsup would be blocked for 48 hours.  Users started circumventing this block via VPN.  Looking at their logs, they determined that most of the traffic was resolving to ip addresses from Brazil and  that there was a large connection time during SSL handshakes.
  

The government of Brazil encouraged the use of open source software versus Windows, and thus the users became more technically literate, and able to circumvent blocks via VPN.


In conclusion, making changes to use multi-core HAProxy fixed a lot of issues. Also, twitter was and continues to be a great source of information on activity that is happening in other countries. Whatsapp was returned to service and then banned a second time, and their servers were able to keep up with the demand.

After lunch, we were back to to more talks.  The organizers came on stage for a while to discuss the afternoon's agenda.  They also remarked that one individual had violated the code of conduct and had been removed from the conference.  So, the conference had a code of conduct and steps were taken if it was violated.

Next up, Bridget Kromhout from Pivotal gave a talk entitled Containers will not Fix your Broken Culture.
I first saw Bridget speak at Beyond the Code in Ottawa in 2014 about scaling the streaming services for Drama Fever on AWS.  At the time, I was moving our mobile test infrastructure to AWS so I was quite enthralled with her talk because 1) it was excellent 2) I had never seen another woman give a talk about scaling services on AWS.  Representation matters.

The summary of the talk last week was that no matter what tools you adopt, you need to communicate with each other about the cultural changes are required to implement new services.  A new microservices architecture is great, but if these teams that are implementing these services are not talking to each other, the implementation will not succeed.

Bridget pointing out that the technology we choose to implement is often about what is fashionable.


Shoutout to Jennifer Davis' and Katherine Daniel's Effective DevOps book. (note -  I've read it on Safari online and it is excellent.  The chapter on hiring is especially good)

Loved this poster about the wall of confusion between development and operations.  

In the afternoon, there were were lightning talks and then open spaces. Open spaces are free flowing discussions where the topic is voted upon ahead of time.  I attended ones on infrastructure automation, CI/CD at scale and my personal favourite, horror stories.  I do love hearing how distributed system can go down and how to recover.  I found that the conversations were useful but it seemed like some of them were dominated by a few voices.  I think it would be better if the person that suggested to topic for the open space also volunteered to moderate the discussion.

Day 2

The second day started out with a fantastic talk by John Arthorne of Shopify speaking on scaling their deployment pipeline.  As a side note, John and I worked together for more than a decade on Eclipse while we both worked at IBM so it was great to catch up with him after the talk. 



He started by giving some key platform characteristics.  Stores on Shopify have flash sales that have traffic spikes so they need to be able to scale for these bursts of traffic. 

From commit to deploy in 10 minutes.  Everyone can deploy. This has two purposes: Make sure the developer stays involved in the deploy process.  If it only takes 10 minutes, they can watch to make sure that their deploy succeeds. If it takes longer, they might move on to another task.  Another advantage of this quick deploy process is that it can delight customers with the speed of deployment.  They also deploy in small batches to ensure that the mean time to recover is small if the change needs to be rolled back.
 
BuildKite is a third party build and test orchestration service.  They wrote a tool called Scrooge that monitors the number of EC2 nodes based on current demand to reduce their AWS bills.  (Similar to what Mozilla releng does with cloud-tools)


Shopify uses a open source orchestration tool called ShipIt.  I was sitting next to my colleague Armen at the conference and he started chuckling at this point because at Mozilla we also wrote an application called ship-it which release management uses to kick off Firefox releases.   Shopify also has a overall view of the ship it deployment process which allows developers to see the percentages of nodes where their change has been deployed. One of the questions after the talk was why they use AWS for their deployment pipeline when they have use machines in data centres for their actual customers. Answer: They use AWS where resilency is not an issue. 
 
Building containers is computationally expensive. He noted that a lot of engineering resources went into optimizing the layers in the Docker containers. To isolate changes to the smallest layer.  They build service called Locutus to build the containers on commit, and push to a registry. It employs caching to make the builds smaller. 

One key point that John also mentioned is that they had a team dedicated to optimizing their deployment pipeline.  It is unreasonable to expect that developers working on the core Shopify platform to also optimize the pipeline.

In the afternoon , there were a series of lightning talks. Roderick Randolph from Capital One gave an amazing talk about Supporting Developers through DevOps.


It was an interesting perspective.  I've seen quite a few talks about bringing devops culture and practices to the operations side of the house, but the perspective of teaching developers about it is discussed less often.


'

He emphasized the need to empower developers to use DevOp practices by giving them tools, and showing them how to use them.  For instance, if they needed to run docker to test something, walk them through it so they will know how to do it next time. 





The final talk I'll mention is by Will Weaver.  He talks about how it is hard to show prospective clients how he had CI and tests experience when that experience is not open to the public.  So he implemented tests and CI for his dotfiles on github. 


He had excellent advice on how to work on projects outside of work to showcase skills for future employers.




Diversity and Inclusion


As an aside, whenever I'm at a conference I note the number of people in the "not a white guy" group. This conference had an all men organizing committee but not all white men.  (I recognize the fact that not all diversity is visible i.e. mental health, gender identity, sexual orientation, immigration status etc) They was only one woman speaker, but there were a few non-white speakers.  There were very few women attendees. I'm not sure what the process was to reach out to potential speakers other than the CFP. 



 There were slides that showed diverse developers which was refreshing.



Loved Roderick's ops vs dev slide.

I learned a lot at the conference and am thankful for all the time that the speakers took to prepare their talks.  I enjoyed all the conversations I had learning about the challenges people face in the organizations implementing continuous integration and deployment. It also made me appreciate the culture of relentless automation, continuous integration and deployment that we have at Mozilla.

I don't know who said this during the conference but I really liked it

Shipping is the heartbeat of your company

It was interesting to learn how all these people are making their companies heart beat stronger via DevOps practices and tools.

Read more...

Welcome Mozilla Releng summer interns

>> Friday, May 13, 2016

We're delighted to have Francis Kang and Connor Sheehan join the Mozilla release engineering team as summer interns.  Francis is studying at the University of Toronto while Connor attends McMaster University in Hamilton, Ontario.  We'll have another intern (Anthony) join us later on in the summer who will be working from our San Francisco office.

Francis and Connor will be working on implementing some new features in release promotion as well as  migrating some builds to taskcluster.  I'll be mentoring Francis,  while Rail will be mentoring Connor.  If you are in the Toronto office, please drop by to say hi to them.  Or welcome them on irc as fkang or sheehan. 

Kim, Francis, Connor and Rail
They are both already off to a great start and have pull requests merged into production that fixed some release promotion issues.  Their code was used in the Firefox 47.0 beta 5 release promotion that we ran last night so their first week was quite productive.


Mentoring an intern provides an opportunity to see the systems we run from a fresh perspective.  They both have lots of great questions which makes us revisit why design decisions were made, could we do things better?   Like all teaching roles, I always find that I learn a tremendous amount from the experience, and hope they have fun learning real world software engineering concepts with respect to running large distributed systems.

Welcome to Mozilla!

Read more...

RelEng & RelOps Weekly highlights - March 4, 2016

>> Monday, March 07, 2016

It was a busy week with many releases in flight, as well as preparation for running beta 1 with release promotion next week.  We also are in the process of adding more capacity to certain test platform pools to lower wait times given all the new e10s tests that have been enabled.

Improve Release Pipeline:

  • Nick ran a staging release for 46.0b1 to check for issues before the merge, preventing some bustage for Fennec and ensuring we can fall back to the old system if any unexpected issues show up with release promotion
  • Varun improved Balrog’s detection of certain types of bad data.
  • Ben finished most of the work involved with preparing Balrog to move to CloudOps infrastructure, including automatically building Docker images.
  • We’re hoping to do our first beta with the new release build promotion pipeline next week for 47.0b1. Stay tuned!
Everyone gets a release promotion!  Source: http://i.imgur.com/WMmqSDI.jpg

Improve CI Pipeline:
  • Dustin deployed a new version of the TaskCluster tools/login system with much improved UI for handling signing in and out and editing clients and roles.  He also simplified the existing roles, with the result that the set of roles now fits on one screen, and is entirely composed of human-readable names.  All of this works toward two important goals: building a sign-in system that is useful and usable by all mozillians; and configuring the access-control system to give everyone their appropriate permissions and no more.

Release:

The releases calendar is getting busier as we get closer to the end of the cycle. Many releases were shipped or are still in-flight:
  • Firefox 45.0b10
  • Fennec 45.0b11
  • Fennec 45.0 (in-progress)
  • Firefox 45.0 (in-progress) - we shipped the RC to the beta channel
  • Firefox 45.0esr (in-progress)
  • Firefox 38.7.0esr (in-progress)
As always, you can find more specific release details in our post-mortem minutes:
https://wiki.mozilla.org/Releases:Release_Post_Mortem:2016-03-02 
https://wiki.mozilla.org/Releases:Release_Post_Mortem:2016-03-09

Operational:
  • Vlad, Alin and Amy reallocated 28 Windows test machines to the w7 pool to help with backlog and e10s testing.
  • Jake deployed new OpenSSL packages to protect our infrastructure from Drown Attack and various other recent OpenSSL vulnerabilities

Until next time!

Read more...

RelEng & RelOps Weekly highlights - February 26, 2016

>> Monday, February 29, 2016

It was a busy week for release engineering as several team members travelled to the Vancouver office to sprint on the release promotion project. The goal of the release promotion project is to promote continuous integration builds to release channels, allowing us to ship releases much more quickly.



Improve Release Pipeline:
  • Chris, Jordan, Callek (remotely), Kim, Mihai and Rail had a sprint on Release Promotion. We made so much progress on this project that we decided to use the new process for Firefox 46.0b1. https://bugzil.la/1118794 So many green jobs!


Improve CI Pipeline:

Release:
  • Ben, Mihai, Nick, Rail and Callek Shipped Firefox 45.0b6, Firefox 45.0b7, Firefox 45.0b9, Fennec 45.0b6, Thunderbird 45.0b2 

Operational:
  • Alin landed changes to run mochitest-push-e10s tests on Windows 7 https://bugzil.la/1248729. This is another step toward completing the enabling of e10s tests. 

Read more...

Tips from a resume nerd

>> Friday, January 08, 2016

Before I begin this post a few caveats:

  • I don't work in HR
  • I'm not a manager (but I interview Mozilla releng candidates)
  • I'm not looking for a new job.
  • These are just my observations after working in the tech industry for a long time.
I'm kind of a resume and interview nerd.  I like helping friends fix their resumes and write amazing cover letters. In the past year I've helped a few (non-Mozilla) friends fix up their resumes, write cover letters, prepare for interviews as they search for new jobs.  This post will discuss some things I've found to be helpful in this process.

Picture by GotCredit - Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)
https://www.flickr.com/photos/jakerust/16223669794/sizes/l

Preparation
Everyone tends to jump into looking at job descriptions and making their resume look pretty. Another scenario is that people have a sudden realization that they need to get out of their current position and find a new job NOW and frantically start applying for anything that matches their qualifications.  Before you do that, take a step back and make a list of things that are important to you.  For example, when I applied at Mozilla, my list was something like this

  • learn release engineering at scale + associated tools/languages
  • open source
  • no relocation
  • work on a team of release engineers (not be the only one)
  • good team dynamics - people happy to share knowledge and like to ship
  • work in an organization where release engineering is valued for increasing the productivity of the organization as a whole and is funded (hardware/software/services/training) accordingly
  • support to attend and present at conferences
People spend a lot of time at work. Life is too short to be unhappy every day.  Writing a list of what is important serves as a checklist to when you are looking at job descriptions and immediately weed out the ones that don't match your list.  

Picture by Mufidah Kassalias - Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)
https://www.flickr.com/photos/mufidahkassalias/10519774073/sizes/o/
 
People tend focus a lot on the technical skills they want to use or new ones you want to learn.  You should also think about what kind of culture where you want to work.  Do the goals and ethics of the organization align with your own? Who will you be working with? Will you enjoy working with this team?  Are you interested in remote work or do you want to work in an office? How will a long commute impact or relocation your quality of life? What is the typical career progression of someone in this role? Are there both management and technical tracks for advancement?


Picture by mugley - Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) https://www.flickr.com/photos/mugley/4221455156/sizes/o/


To summarize, itemize the skills you'd like to use or learn, the culture of the company and the team and why you want to work there.

Cover letter

Your cover letter should succinctly map your existing skills to the role you are applying for and convey enthusiasm and interest.  You don't need to have a long story about how you worked on a project at your current job that has no relevance to your potential new employer.  Teams that are looking to hire have problems to solve.  Your cover letter needs to paint a picture that your have the skills to solve them.

Picture by Jim Bauer - Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) https://www.flickr.com/photos/lens-cap/10320891856/sizes/l


Refactoring your resume

Developers have a lot of opportunities these days, but if you intend to move from another industry, into a tech company, it can be more tricky.  The important thing is to convey the skills you have in a a way that people can see they can be applied to the problems they want to hire you to fix. 

Many people describe their skills and accomplishments in a way that is too company specific.  They may have a list of acronyms and product names on their resume that are unlikely to be known by people outside the company.  When describing the work you did in a particular role, describe the work that you did in a that is measurable way that highlights the skills you have.  An excellent example of a resume that describes the skills that without going into company specific detail is here. (Julie Pagano also has a terrific post about how she approached her new job search.)

Another tip is to leave out general skills that are very common.  For instance, if you are a technical writer, omit the fact that you know how to use Windows and Word and focus on highlighting your skills and accomplishments. 


Non-technical interview preparation

Every job has different technical requirements and there are many books and blog posts on how to prepare for this aspect of the interview process. So I'm going to just cover the non-technical aspects.

When I interview someone, I like to hear lots of questions.  Questions about the work we do and upcoming projects.  This indicates that have taken the time to research the team, company and work that we do.  It also shows that enthusiasm and interest.

Here is a list suggestions to prepare for interviews

1.  Research the company make a list of relevant questions
Not every company is open about the work that they do, but most will be have some public information that you can use to formulate questions during the interviews.  Do you know anyone you can have coffee or skype with to who works for the company and can provide insight? What products/services do the company produce? Is the product nearing end of life?  If so, what will it be replaced by? What is the companies market share, is it declining, stable or experiencing growth? Who are their main competitors? What are some of the challenges they face going forward? How will this team help address these challenges?

2.  Prepare a list of questions for every person that interviews you ahead of time
Many companies will give you the list of names of people who will interview you.
Have they recently given talks? Watch the videos online or read the slides.
Does the team have github or other open repositories?  What are recent projects are they working on? Do they have a blog or are active on twitter? If so, read it and formulate some questions to bring to the interview.
Do they use open bug tracking tools?  If so, look at the bugs that have recent activity and add them to the list of questions for your interview. 
A friend of mine read the book of a person that interviewed him had written and asked questions about the book in the interview.  That's serious interview preparation!

Photo by https://www.flickr.com/photos/wocintechchat/ https://www.flickr.com/photos/wocintechchat/22506109386/sizes/l


3. Team dynamics and tools
Is the team growing or are you hiring to replace somebody who left?
What's the onboarding process like? Will you have a mentor?
How is this group viewed by the rest of the company? You want to be in a role where you can make a valuable contribution.  Joining a team where their role is not valued by the company or not funded adequately is a recipe for disappointment.
What does a typical day look like?  What hours do people usually work?
What tools do people use? Are there prescribed tools or are you free to use what you'd like?

4.  Diversity and Inclusion
If you're a member of an underrepresented group in tech, the numbers are lousy in this industry with some notable exceptions. And I say that while recognizing that I'm personally in the group that is the lowest common denominator for diversity in tech. 

The entire thread on this tweet is excellent  https://twitter.com/radiomorillo/status/589158122108932096


I don't really have good advice for this area other than do your research to ensure you're not entering a toxic environment.  If you look around the office where you're being interviewed and nobody looks like you, it's time for further investigation.   Look at the company's website - is the management team page white guys all the way down?  Does the company support diverse conferences, scholarships or internships? Ask on a mailing list like devchix if others have experience working at this company and what it's like for underrepresented groups. If you ask in the interview why there aren't more diverse people in the office and they say something like "well, we only hire on merit" this is a giant red flag. If the answer is along the lines of "yes, we realize this and these are the steps we are taking to rectify this situation",  this is a more encouraging response.

A final piece of advice, ensure that you meet with your manager that you're going to report to as part of your hiring process.  You want to ensure that you have rapport with them and can envision a productive working relationship. 

What advice do you have for people preparing to find a new job?

Further reading

Katherine Daniels gave at really great talk at Beyond the Code 2014 about how to effectively start a new job.  Press start: Beginning a New Adventure Job
She is also the co-author of Effective Devops which has fantastic chapter on hiring.
Erica Joy writes amazing articles about the tech industry and diversity.
Cate Huston has some beautiful posts on how to conduct technical interviews and how to be a better interviewer
Camille Fournier's blog is excellent reading on career progression and engineering management.
Mozilla is hiring!

Read more...

USENIX Release Engineering Summit 2015 recap

>> Tuesday, November 24, 2015

November 13th, I attended the USENIX Release Engineering Summit in Washington, DC.  This summit was along side the larger LISA conference at the same venue. Thanks to Dinah McNutt, Gareth Bowles, Chris Cooper,  Dan Tehranian and John O'Duinn for organizing.



I gave two talks at the summit.  One was a long talk on how we have scaled our Android testing infrastructure on AWS, as well as a look back at how it evolved over the years.

Picture by Tim Norris - Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0)
https://www.flickr.com/photos/tim_norris/2600844073/sizes/o/

Scaling mobile testing on AWS: Emulators all the way down from Kim Moir

I gave a second lightning talk in the afternoon on the problems we face with our large distributed continuous integration, build and release pipeline, and how we are working to address the issues. The theme of this talk was that managing a large distributed system is like being the caretaker for the water, or some days, the sewer system for a city.  We are constantly looking system leaks and implementing system monitoring. And probably will have to replace it with something new while keeping the existing one running.

Picture by Korona Lacasse - Creative Commons 2.0 Attribution 2.0 Generic https://www.flickr.com/photos/korona4reel/14107877324/sizes/l



In preparation for this talk, I did a lot of reading on complex systems design and designing for recovery from failure in distributed systems.  In particular, I read Donatella Meadows' book Thinking in Systems. (Cate Huston reviewed the book here). I also watched several talks by people who talked about the challenges they face managing their distributed systems including the following:
I'd also like to thank all the members of Mozilla releng/ateam who reviewed my slides and provided feedback before I gave the presentations.
The attendees of the summit attended the same keynote as the LISA attendees.  Jez Humble, well known for his Continuous Delivery and Lean Enterprise books provided a keynote on Lean Configuration Management which I really enjoyed. (Older version of slides from another conference, are available here and here.)



In particular, I enjoyed his discussion of the cultural aspects of devops. I especially like that he stated that "You should not have to have planned downtime or people working outside business hours to release".  He also talked a bit about how many of the leaders that are looked up to as visionaries in the tech industry are known for not treating people very well and this is not a good example to set for others who believe this to be the key to their success.  For instance, he said something like "what more could Steve Jobs have accomplished had he treated his employees less harshly".

Another concept he discussed which I found interesting was that of the strangler application. When moving from a large monolithic application, the goal is to split out the existing functionality into services until the originally application is left with nothing.  Exactly what Mozilla releng is doing as we migrate from Buildbot to taskcluster.


http://www.slideshare.net/jezhumble/architecting-for-continuous-delivery-54192503


At the release engineering summit itself,   Lukas Blakk from Pinterest gave a fantastic talk Stop Releasing off Your Laptop—Implementing a Mobile App Release Management Process from Scratch in a Startup or Small Company.  This included grumpy cat picture to depict how Lukas thought the rest of the company felt when that a more structured release process was implemented.


Lukas also included a timeline of the tasks that implemented in her first six months working at Pinterest. Very impressive to see the transition!


Another talk I enjoyed was Chaos Patterns - Architecting for Failure in Distributed Systems by Jos Boumans of Krux. (Similar slides from an earlier conference here). He talked about some high profile distributed systems that failed and how chaos engineering can help illuminate these issues before they hit you in production.


For instance, it is impossible for Netflix to model their entire system outside of production given that they consume around one third of nightly downstream bandwidth consumption in the US. 

Evan Willey and Dave Liebreich from Pivotal Cloud Foundry gave a talk entitled "Pivotal Cloud Foundry Release Engineering: Moving Integration Upstream Where It Belongs". I found this talk interesting because they talked about how the built Concourse, a CI system that is more scaleable and natively builds pipelines.   Travis and Jenkins are good for small projects but they simply don't scale for large numbers of commits, platforms to test or complicated pipelines. We followed a similar path that led us to develop Taskcluster

There were many more great talks, hopefully more slides will be up soon!

Read more...

  © Blogger template Simple n' Sweet by Ourblogtemplates.com 2009

Back to TOP