Monday, October 28, 2013
Over the last few months, I migrated the test infrastructure that we run for Android 4.0 on Panda devices to use mozharness and mozpool. Before I describe what this entailed, here is some background for those not familiar with our release engineering infrastructure. We use Buildbot as our continuous integration engine. We have 50+ Buildbot masters that are designated for specific purposes such as scheduling, building, try servers and testing. Test masters are further subdivided into ones specifically allocated to running tests on Mac, Linux, Windows, Tegras (Android 2.2 devices) and Pandas (Android 4.0 devices). Many devices are allocated to each master to handle the volume of tests and builds we run each day.
We have over 800 Panda devices in production that are used to run unit tests and talos (performance) tests for Firefox for Android 4.0. Panda devices are rack mounted in a chassis. One of the issues with these devices is that they are development boards and are inherently quite unstable when running a large volume of tests 24/7. Not many organizations run the volume of automated tests on mobile devices that we do. Dealing with their issues in a non-automated fashion does not scale.
|Pandas fall down|
Mozpool is software that is used to mitigate that inherent instability and ensure that the devices available to run tests are in a verified state and have the correct Android image. If Mozpool determines that a device has problems that make it unsuitable to run tests, for instance if it doesn't boot correctly or the requisite image cannot be applied, it changes the device's state so it won't be in the pool of devices running tests. This reduces the number of jobs that fail due to infrastructure issues. Mozpool also provides logging of the actions on devices and a web page for looking at the state of your devices. There is an API implemented via REST and HTTP for simple actions on your devices, such as requesting a device and returning it to the pool. Another advantage of Mozpool is that it's easy to reimage the devices with a new image, either via the Mozpool UI or by simply specifying a new image in your Mozpool client code.
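To make the pool behaviour described above a bit more concrete, here is a minimal in-memory sketch in Python. It is not Mozpool's code and does not talk to its REST API; the class names, state names (`ready`, `in_use`, `failed`) and device names are all made up for illustration. It only models the idea that a failed device drops out of rotation and that a request can specify the image it wants.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical state names for illustration; not Mozpool's real state machine.
READY = "ready"
IN_USE = "in_use"
FAILED = "failed"


@dataclass
class Device:
    name: str
    image: str
    state: str = READY


@dataclass
class DevicePool:
    devices: Dict[str, Device] = field(default_factory=dict)

    def add(self, name: str, image: str) -> None:
        self.devices[name] = Device(name, image)

    def mark_failed(self, name: str) -> None:
        """Take a device out of rotation, e.g. after a failed boot or imaging."""
        self.devices[name].state = FAILED

    def request(self, image: str) -> Optional[Device]:
        """Hand out the first ready device, setting the requested image on it."""
        for device in self.devices.values():
            if device.state == READY:
                device.image = image  # stands in for reimaging the board
                device.state = IN_USE
                return device
        return None  # nothing healthy is available

    def release(self, name: str) -> None:
        """Return a device to the pool; failed devices stay out of rotation."""
        device = self.devices[name]
        if device.state == IN_USE:
            device.state = READY


pool = DevicePool()
pool.add("panda-0001", "android-4.0")
pool.add("panda-0002", "android-4.0")
pool.mark_failed("panda-0001")          # e.g. it did not boot correctly
device = pool.request("android-4.0")    # skips the failed board
print(device.name)                      # panda-0002
pool.release(device.name)
```

The point of keeping the failed state sticky in `release` is the same as in the text: a job should never be handed a board that has already been judged unsuitable.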
There are four main projects that are used to manage our infrastructure: buildbotcustom, buildbot-configs, tools and mozharness. From the mozharness FAQ in Aki's words "
Traditionally, the scripts that define the buildbot actions for our tests have been defined in the buildbotcustom project. The code is very convoluted and it is difficult for new people to parse which bits of code apply to each platform. Also, every time we want to deploy changes to them we have to do a reconfig. A reconfig is an operation where we upgrade our buildbot masters to the latest version of the code in the buildbotcustom, buildbot-configs and tools repositories. A reconfig is usually initiated by the releng person on buildduty, and only a few times a week. When you use mozharness, the buildbot config scripts point to the production branch of the mozharness repo. New changes can be deployed to production with a simple merge to the production branch of mozharness. No reconfig.
|Closeup of zip line harness aka visual representation of some buildbotcustom code|
With mozharness scripts, you have a discrete script and config file that applies to each platform. For instance, for panda android unittests, the config file is here, and the corresponding script is here.
There are actions in mozharness that serve as boilerplate code that can be reused for common actions when running tests, such as installing Python packages into a virtual environment, cloning repositories, downloading and extracting files and so on. For instance, here are the default actions when running the android unit tests on Pandas:
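The sketch below is not the Panda script itself and does not import mozharness. It is a minimal, self-contained illustration of the action pattern, loosely modeled on the idea that each named action maps to a method on the script. The class name and the action names are made up, based only on the examples mentioned above, and should not be read as the real default action list.

```python
class ActionScript:
    """Tiny stand-in for an action-based script: named actions map to methods."""

    # Hypothetical action names, in the order they should run.
    all_actions = [
        "create-virtualenv",
        "clone-repositories",
        "download-and-extract",
        "run-tests",
    ]
    default_actions = list(all_actions)

    def __init__(self, actions=None):
        # Use the defaults unless the caller selects a subset of actions.
        self.actions = list(self.default_actions if actions is None else actions)
        unknown = [a for a in self.actions if a not in self.all_actions]
        if unknown:
            raise ValueError("unknown actions: " + ", ".join(unknown))
        self.log = []

    def run(self):
        # Always walk all_actions in order, skipping the ones not selected.
        for action in self.all_actions:
            if action in self.actions:
                getattr(self, action.replace("-", "_"))()
        return self.log

    # Each method would do real work in a real script; here it only records.
    def create_virtualenv(self):
        self.log.append("create-virtualenv")

    def clone_repositories(self):
        self.log.append("clone-repositories")

    def download_and_extract(self):
        self.log.append("download-and-extract")

    def run_tests(self):
        self.log.append("run-tests")


print(ActionScript().run())
print(ActionScript(["run-tests"]).run())
```

The useful property of this shape is that a developer can rerun a single step, such as only the tests, without touching the rest of the script.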
If you have questions regarding mozharness, please join the mozharness channel on irc.mozilla.org :-)
Dustin Mitchell's description of Mozpool
Mozpool API documentation
Aki Sasaki's Mozharness FAQ
Aki Sasaki's first blog post on why mozharness and more