
The ARCS model of motivational design

Motivation

The ARCS model is an instructional design method developed by John Keller that focuses on motivation. ARCS is based on research into best practices and successful teachers and gives you tactics for evaluating your lessons in order to build motivation right into them.

I have conducted and overseen quite a few trainings and I have not been impressed with their success rate, so this topic is very dear to me. For me, success is measured by the ability to complete the training, learn the basics of a technical topic and then gather the initial momentum to continue developing your skills within the chosen field. This is what I've been doing for myself and this is what I'd like to see my students do.

In his paper (I have a year 2000 printed copy from Cuba) Keller argues that motivation is a product of four factors: Attention, Relevance, Confidence and Satisfaction. You need all of them incorporated in your lessons and learning materials for them to be motivational. I could argue that you need the same characteristics at work in order to motivate people to do their job as you wish.

Once you start a lesson you need to grab the audience's Attention so they will listen to you. Then the topic needs to be relevant to the audience so they will keep listening to the end. This makes for a good start but is not enough. Confidence means the audience feels they can perform all the necessary tasks on their own, that they have what it takes to learn (and you have to build that). If they think they can't make it from the start then it is a lost battle. And Satisfaction means the person feels their achievements are due to their own abilities and hard work, not due to external factors (work not demanding enough, luck, etc.).

If all four of the above factors are present then the audience should feel personally motivated to learn because they can clearly see the benefit for themselves and they realize that everything depends on them.

ARCS gives you a model to evaluate your target audience and lesson properties and figure out tactics by which to address any shortcomings in the above 4 areas.

Last Friday I hosted 2 training sessions: a Python and Selenium workshop at HackConf and then a lecture about test case management and a demo of Kiwi TCMS for students at the Pragmatic IT academy. For both of them I used the simplified ARCS evaluation matrix.

In this matrix the columns map to the ARCS areas while the rows map to different parts of the lesson: audience, presentation media, exercises, etc. Here's how I used it (I've mostly analyzed the audience).

Python & Selenium workshop

  • Attention
    • (+) this is an elective workshop
    • (+) the topic is clear and the curriculum is on GitHub
    • (+) the title is catchy (Learn Python & Selenium in 6 hours)
    • (+) I am well known in the industry
  • Relevance
    • (+) Basic Python practical skills, being able to write small programs, knowing the basic building blocks
    • (+) Basic Selenium skills: finding and using elements
    • (+) Basic Python test automation skills: writing simple tests and asserts
  • Confidence
    • (+) each task has tests which need to report PASS at the end
    • (-) need to use PyCharm IDE, unfamiliar with IDEs
    • (-) not enough experience with programming or Linux
    • (-) not enough experience with (automation) testing
    • (-) all materials and exercises are in English
  • Satisfaction
    • (-) not being able to create a simple program

From the above it was clear that I didn't need to spend much time on building attention or relevance. The topic itself and the fact that these are skills which can be immediately applied at work gave the workshop a huge boost. During the opening part of my workshop I stated "this training takes around 2 months, I've seen some of you forking my GitHub repo so I know you are prepared. Let's see how much you can do in 6 hours", which set the challenge and was my attention-building moment. Then I reiterated that all skills are directly applicable in daily work, confirming the relevance part.

I did need a confidence-building strategy though. Having all the tests ready meant evaluation was quick and easy. Anton (my assistant) and I promised to help with the IDE and all other questions to counter the other items on the list. During the course of the workshop I did a quick code review for every participant who managed to complete their tasks within the hour, giving them quick tips or highlighting pieces of code and approaches that were different from mine or that I found elegant or interesting. This was my confidence-building strategy. Code review and verbal praise also touch on the satisfaction area, i.e. the participant gets the feeling they are doing well.

My satisfaction-building strategy was kind of mixed. Before I read about ARCS I wanted to give penalty points to participants who didn't complete on time and then send them home after 3 fails. In the end I only said I would do this but never did.

Instead I used the challenge statement from the attention phase and turned it into a competition. The first 3 participants to complete their module tasks on time were rewarded with chocolates. With the agreement of the entire group the grand prize was set to be a small box of the same chocolates, awarded to the person with the most chocolates (i.e. the one who had been in the top 3 the most times).

I don't know if ARCS had anything to do with it but this workshop was the most successful training I've ever done. 40% of the participants managed to get at least one chocolate and at least 50% completed all of their tasks within the hour. Normally the passing rate on such a training is around 10 to 20%.

During the workshop we had 5 different modules, each consisting of a 10-15 minute explanation of Python basics (e.g. loops or if conditions), a quick Q&A session and around 30 minutes for working alone and code review. I don't think I was following ARCS for each of the separate modules because I didn't have time to analyze them individually. I gambled all my money on the introductory 10 minutes!

TCMS lecture

My second lecture of the day was about test case management. The audience was students aspiring to become software testers who are attending the Software Testing training at Pragmatic. In my lecture (around 1 hour) I wanted to explain what test management is, why it is important and also demo the tool I'm working on - Kiwi TCMS. The analysis looks like:

  • Attention
    • (+) the entire training was elective but
    • (-) that particular lecture was mandatory. Students were not able to select what they were going to study
  • Relevance
    • (-) it may not be clear what TCMS is and why we need it
    • (+) however students may sense that this is something work related since the entire training is
  • Confidence
    • (-) unknown UI, generally unfamiliar workflow
    • (-) not enough knowledge how to write a Test Plan document or test cases
  • Satisfaction
    • (-) how to make sure new skills can be applied in practice

So I was in medium need of a strategy to build attention. I opened by introducing myself to establish my professional level and introducing Kiwi TCMS by saying it is the best open source test case management system, of which I'm one of the core maintainers.

Then I had a medium need of a relevance-building strategy. I did this by explaining what test management is and why it is important. I talked briefly about QA managers, trying to indirectly inspire the audience to aim for this position. I finished this part by telling the students how a TCMS helps the ordinary person in their daily work - namely by giving you a dashboard where you can monitor all the work you need to do, check your progress, etc.

I was in strong need of a confidence-building strategy. I did a 20-30 minute demonstration where I wrote a Test Plan and test cases and then pretended to execute them, marking bugs and test results in the system. I told the students "you are my boss for today, tell me what I need to test". So they instructed me to test the login functionality of the system and we agreed on 5 different test cases. I described all of these in Kiwi TCMS and began executing them. During execution I opened another browser window and did exactly what the test case steps were asking for. There were some bugs, so I promptly marked them as such and promised I would fix them.

To build satisfaction I was planning on having the students write one test plan and some test cases but we didn't have time for this. Their instructor promised they will be doing more exercises and using Kiwi TCMS in the next 2 months but this remains to be seen. I wrapped up my lecture by advising them to use Kiwi TCMS as a portfolio-building tool. Since these students are newcomers to the QA industry their next priority will be looking for a job. I advised them to document their test plans and test cases in Kiwi TCMS and then present these artifacts to future employers. I also told them they are more than welcome to test and report bugs against Kiwi TCMS on GitHub and add those bugs to their portfolio!

This is how I've applied ARCS for the first time. I like it and will continue to use it for my trainings and workshops. I will try harder to make the application process more iterative and apply the method not only to my opening speech but for all submodules as well!

One thing that still bothers me is whether I can apply the ARCS principles when doing a technical presentation and how they play together or clash with storytelling, communication style and rhetoric (all topics I'm exploring for my public speaking). If you have more experience with these please share it in the comments below.

Thanks for reading and happy testing!


Storytelling for test professionals

This is a very condensed brief of an 8-hour workshop held by Huib Schoots which I attended earlier this year. You can find the slides here.

Storytelling is the form in which people naturally communicate. Understanding the building blocks of a story will help us understand other people's motivations, serve as a map for actions and emotions, help uncover unknown perspectives and serve as a source of inspiration.

Stories stand on their own and have a beginning, middle and an end. There is a main character and a storyline with development. Stories are authentic and personal and often provocative and evoke emotions.

7 basic story plots

  1. Overcoming the Monster
  2. Rags to Riches
  3. The Quest
  4. Voyage and return
  5. Comedy
  6. Tragedy
  7. Rebirth

From these we can derive the following types of stories.

6 types of stories

  1. Who am I (identity stories)
  2. Why am I here (motive and mission stories)
  3. Vision stories (the big picture)
  4. Future scenarios (imagining the future)
  5. Product stories (branding)
  6. Culture stories (a sum of other stories)

12 Common Archetypes

Each story needs a hero and there are 12 common archetypes of heroes. More importantly you can also find these archetypes within your team and organization. Read the link above to find out what their motto, core desire, goals, fears and motives are. The 12 types are

  1. Innocent
  2. Everyman
  3. Hero
  4. Caregiver
  5. Explorer
  6. Rebel
  7. Lover
  8. Creator
  9. Jester
  10. Sage
  11. Magician
  12. Ruler

6 key elements of a story

  1. Who's the hero?
  2. What is their desire?
  3. What is stopping them?
  4. What is the turning point?
  5. What are their insights?
  6. What is the solution?

Dramatic structure and Freytag's pyramid

"Freytag's pyramid"

One of the most commonly used storytelling structures is Freytag's Pyramid. According to it each story has an exposition, rising action, climax, falling action and resolution. I think this can be applied directly when preparing presentations, even technical ones.

The Hero's journey

Successful stories follow the 12 steps of the hero's journey

  1. Ordinary world
  2. Call to adventure
  3. Refusal of the Call
  4. Meeting the mentor
  5. Crossing the threshold (after which the hero enters the Special world)
  6. Tests, allies and enemies
  7. Approach
  8. Ordeal, death & rebirth
  9. Rewards, seizing the sword
  10. The road back (to the ordinary world)
  11. Resurrection
  12. Return with elixir

As part of the workshop we worked in groups and created a completely made-up story. Every person in the group contributed a couple of sentences from their own experiences, trying to describe a particular step in the hero's journey. At the end we told a story from the point of view of a single hero which was a complete mash-up of moments that had nothing to do with each other. Still it sounded very realistic and plausible.

Storytelling techniques

SUCCESS means Simple, Unexpected, Concrete, Credible, Emotional, Stories. To use this technique find the core of your idea, grab people's attention by surprising them and make sure the idea can be understood and remembered later. Find a way to make people believe in the idea so they can test it for themselves, and make them feel something so they understand why the idea is important. Tell stories and empower people to use an idea through narrative.

STAR means Something They will Always Remember. A STAR Moment should be Simple, Transferable, Audience-centered, Repeatable, and Meaningful. There are 5 types of STAR moments: memorable dramatization, repeatable sound bites, evocative visuals, emotive storytelling, shocking statistics.

To enhance our stories and presentations we should appeal to senses (smell, sounds, sight, touch, taste) and make it visual.

I will be using some of these techniques combined with others in my future presentations and workshops. I'd love to be able to summarize all of them into a short guide targeted at IT professionals but I don't know if this is even possible.

Anyway if you do try some of these techniques in your public speaking please let me know how it goes. I want to hear what works for you and your audience and what doesn't.

Thanks for reading and happy testing!


More tests for login forms

"Telenor's login form"

By now I have probably documented more test cases for login forms than anyone else. You can check out my previous posts on the topic here and here. Here are a few more examples.

Test 01 and 02: First of all let's start by saying that a "Remember me" checkbox should actually remember the user and log them in automatically on the next visit if checked, and not do so if unchecked. I don't think this has been mentioned previously!

Test 03: When there is a "Remember me" checkbox it should be selectable both with the mouse and the keyboard. On my.telenor.bg the checkbox changes its image only when clicked with the mouse. Also clicking the login button with Space doesn't work!

Interestingly enough, when I don't select "Remember me" at all and close and then revisit the page, I am still able to access the internal pages of my account! At this point I'm not quite sure what this checkbox does!
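
Test 03 in particular is easy to automate. Here is a minimal Selenium sketch of it (Selenium 3 style API), assuming hypothetical element IDs - username, password, remember-me, login-button - which you'd replace with the real locators of the page under test:

# minimal sketch of test 03 - keyboard interaction with the login form;
# all element IDs and URLs are assumptions for illustration only
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
try:
    driver.get('https://example.com/login')

    driver.find_element_by_id('username').send_keys('test-user')
    driver.find_element_by_id('password').send_keys('s3cr3t')

    # the checkbox must be selectable from the keyboard, not only with the mouse
    checkbox = driver.find_element_by_id('remember-me')
    checkbox.send_keys(Keys.SPACE)
    assert checkbox.is_selected(), 'Space did not select the Remember me checkbox'

    # the login button should react to Space as well; a real test would add
    # an explicit wait for the redirect here
    driver.find_element_by_id('login-button').send_keys(Keys.SPACE)
    assert 'login' not in driver.current_url, 'Space did not submit the form'
finally:
    driver.quit()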

Test 04: Testing two-factor authentication. I had a case where a GitHub SMS didn't arrive for over 24 hrs and I wasn't able to log in. After requesting a new code you can see the UI updating but I didn't receive another message. In this particular case I received only one message with an already invalid code. So test for:

  • how long does it take for the codes to expire
  • is there a visual feedback indicating how many codes have been requested
  • does the latest code invalidate all previous ones, or do all unused codes still work
  • what happens if I'm already logged in and somebody tries to access my account requesting additional codes which may or may not invalidate my login session?
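
As a rough illustration, the first two bullets could be written as pytest-style checks like the sketch below. Everything here is hypothetical - the api_client fixture and its request_2fa_code()/login() helpers stand in for whatever test interface your application actually provides:

# hypothetical sketch - api_client, fake_clock and their helpers do not belong
# to any real library; substitute your application's own test interface
def test_latest_code_invalidates_previous_ones(api_client):
    first_code = api_client.request_2fa_code(username='test-user')
    second_code = api_client.request_2fa_code(username='test-user')

    # assert whichever behaviour your specification defines; here we assume
    # only the most recently issued code is valid
    assert api_client.login(username='test-user', code=second_code)
    assert not api_client.login(username='test-user', code=first_code)


def test_codes_expire_after_configured_time(api_client, fake_clock):
    code = api_client.request_2fa_code(username='test-user')
    fake_clock.advance(minutes=api_client.code_lifetime_minutes + 1)
    assert not api_client.login(username='test-user', code=code)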

Test 05: Check that confirmation codes, links, etc. will actually expire after their configured time. Kiwi TCMS had this problem, which has been fixed in version 3.32.

Test 06: Is this a social-network-login only site? Then which of my profiles did I use? Check that there is a working social auth provider reminder.

Test 07: Check that there is an error message visible (e.g. wrong login credentials). After the redesign Kiwi TCMS stopped displaying this message and instead presented the user with the login form again!

Also check out these testing challenges by Claudiu Draghia where you can see many cases related to input field validation! For example an empty field, a value that is too long, special characters in the field, etc. All of these can lead to issues depending on how login is implemented.

Thanks for reading and happy testing!


Xiaomi's selfie bug

Recently I've been exploring the user interface of a Xiaomi Redmi Note 4X phone and noticed a peculiar bug, adding to my collection of obscure phone bugs. Sometimes when taking selfies the images are not saved in the correct orientation. Instead they are saved as if looking in a mirror and this is a bug!

"Samsung S5 front screen"

While taking the selfie the display correctly acts as a mirror, see my personal Samsung S5 (black) and the Xiaomi device (white).

"Xiaomi front screen"

However when the image is saved and then viewed through the gallery application there is a difference. The image below is taken with the Xiaomi device and there have been no effects added to it except scaling and cropping. As you can see the letters on the cereal box are mirrored!

"Xiaomi mirrored image"

The symptoms of the bug are not quite clear yet. I've managed to reproduce it at around a 50% rate so far. I've tried taking pictures during the day in direct sunlight and in the shade, also in the evening under bad artificial lighting. Taking a photo of a child's face, then the child plus a varying number of adults, then a photo of only one or more adults - heck, I even took a picture of myself. I thought that lighting or the number of faces and their age had something to do with this bug but so far I'm not getting consistent results. Sometimes the images turn out OK and other times they don't, regardless of what I take a picture of.

I also took a picture of the same cereal box, under the same conditions as above but not capturing the child's face and the image came out not mirrored. The only clue that seems to hold true so far is that you need to have people's faces in the picture for this bug to reproduce but that isn't an edge case when taking selfies, right?

I've also compared the results with my Samsung S5 (Android version 6.0.1) and BlackBerry Z10 devices and both work as expected: while taking the picture the display acts as a mirror but when viewing the saved image it appears in normal orientation. On S5 there is also a clearly visible "Processing" progress bar while the picture is being saved!

For reference the system information is below:

Model number: Redmi Note 4X
Android version: 6.0 MRA58K
Android security patch level: 2017-03-01
Kernel version: 3.18.22+

I'd love it if somebody from Xiaomi's engineering department looked into this and sent me a root cause analysis of the problem.

Thanks for reading and happy testing! Oh and btw this is my breakfast, not hers!


Speeding up Rust builds inside Docker

Currently it is not possible to instruct cargo, the Rust package manager, to build only the dependencies of the software you are compiling! This means you can't easily pre-install build dependencies. Luckily you can work around this with cargo build -p! I've been using this Python script to parse Cargo.toml:

parse-cargo-toml.py
#!/usr/bin/env python

from __future__ import print_function

import os
import toml

_pwd = os.path.dirname(os.path.abspath(__file__))
cargo = toml.loads(open(os.path.join(_pwd, 'Cargo.toml'), 'r').read())

for section in ['dependencies', 'dev-dependencies']:
    # guard against sections which may be missing from Cargo.toml
    for dep in cargo.get(section, {}):
        print('cargo build -p %s' % dep)

and then inside my Dockerfile:

RUN mkdir /bdcs-api-rs/
COPY parse-cargo-toml.py /bdcs-api-rs/

# Manually install cargo dependencies before building
# so we can have a reusable intermediate container.
# This workaround is needed until cargo can do this by itself:
# https://github.com/rust-lang/cargo/issues/2644
# https://github.com/rust-lang/cargo/pull/3567
COPY Cargo.toml /bdcs-api-rs/
WORKDIR /bdcs-api-rs/
RUN python ./parse-cargo-toml.py | while read cmd; do \
        $cmd;                                    \
    done

It doesn't take into account the version constraints specified in Cargo.toml but is still able to produce an intermediate Docker layer which I can use to speed up my tests by caching the dependency compilation part.

As seen in the build log, lines 1173-1182, when doing cargo build it downloads and compiles chrono v0.3.0 and toml v0.3.2. The rest of the dependencies are already available. The logs also show that after Job #285 the build times dropped from 16 minutes down to 3-4 minutes due to Docker caching. This would be even less if the cache is kept locally!

Thanks for reading and happy testing!


Code coverage from Nightmare.js tests

In this article I'm going to walk you through the steps required to collect code coverage when running an end-to-end test suite against a React.js application.

The application under test looks like this

<!doctype html>
<html lang="en-us" class="layout-pf layout-pf-fixed">
  <head>
    <!-- js dependencies skipped -->
  </head>
  <body>
    <div id="main"></div>
    <script src="./dist/main.js?0ca4cedf3884d3943762"></script>
  </body>
</html>

It is served as an index.html file and a main.js file which intercepts all interactions from the user and sends requests to the backend API when needed.

There is an existing unit-test suite which loads the individual components and tests them in isolation. Apparently people do this!

There is also an end-to-end test suite which does the majority of the testing. It fires up a browser instance and interacts with the application. Everything runs inside Docker containers providing a full-blown production-like environment. They look like this

test('should switch to Edit Recipe page - recipe creation success', (done) => {
  const nightmare = new Nightmare();
  nightmare
    .goto(recipesPage.url)
    .wait(recipesPage.btnCreateRecipe)
    .click(recipesPage.btnCreateRecipe)
    .wait(page => document.querySelector(page.dialogRootElement).style.display === 'block'
      , createRecipePage)
    .insert(createRecipePage.inputName, createRecipePage.varRecName)
    .insert(createRecipePage.inputDescription, createRecipePage.varRecDesc)
    .click(createRecipePage.btnSave)
    .wait(editRecipePage.componentListItemRootElement)
    .exists(editRecipePage.componentListItemRootElement)
    .end() // remove this!
    .then((element) => {
      expect(element).toBe(true);
      // here goes coverage collection helper
      done(); // remove this!
    });
}, timeout);

The browser interaction is handled by Nightmare.js (sort of like Selenium) and the test runner is Jest.

Code instrumentation

The first thing we need is to instrument the application code to provide coverage statistics. This is done via babel-plugin-istanbul. Because unit-tests are executed a bit differently we want to enable conditional instrumentation. In reality for unit tests we use jest --coverage which enables istanbul on the fly and having the code already instrumented breaks this. So I have the following in webpack.config.js

if (process.argv.includes('--with-coverage')) {
  babelConfig.plugins.push('istanbul');
}

and then build my application with node run build --with-coverage.

You can execute node run start --with-coverage, open the JavaScript console in your browser and inspect the window.__coverage__ variable. If this is defined then the application is instrumented correctly.

Fetching coverage information from within the tests

Remember that main.js from the beginning of this post? It lives inside index.html which means everything gets downloaded to the client side and executed there. When running the end-to-end test suite that client is the browser instance controlled via Nightmare. You have to pass window.__coverage__ from the browser scope back to the nodejs scope via nightmare.evaluate()! I opted to directly save the coverage data on the file system and make it available to the coverage reporting tools later!

My coverage collecting snippet looks like this

nightmare
  .evaluate(() => window.__coverage__) // this executes in browser scope
  .end() // terminate the Electron (browser) process
  .then((cov) => {
    // this executes in Node scope
    // handle the data passed back to us from browser scope
    const strCoverage = JSON.stringify(cov);
    const hash = require('crypto').createHmac('sha256', '')
      .update(strCoverage)
      .digest('hex');
    const fileName = `/tmp/coverage-${hash}.json`;
    require('fs').writeFileSync(fileName, strCoverage);

    done(); // the callback from the test
  })
.catch(err => console.log(err));

Nightmare returns window.__coverage__ from browser scope back to nodejs scope and we save it under /tmp using a hash value of the coverage data as the file name.

Side note: I do have about 40% less coverage files than number of test cases. This means some test scenarios exercise the same code paths. Storing the individual coverage reports under a hashed file name makes this very easy to see!

Note that in my coverage handling code I also call .end(), which will terminate the browser instance, and also execute the done() callback which is passed as a parameter to the test above! This is important because it means we had to update the way tests were written. In particular the Nightmare method sequence must not call .end() and done() anywhere except in the coverage handling code. The coverage helper must be the last code executed inside the body of the last .then() method. This is usually after all assertions (expectations) have been met!

Now this coverage helper needs to be part of every single test case so I wanted it to be a one-line function, easy to copy & paste! All my attempts to move this code inside a module have been futile. I can get the module loaded but it kept failing with Unhandled promise rejection (rejection id: 1): cov_23rlop1885 is not defined.

In the end I resorted to this simple hack

eval(fs.readFileSync('utils/coverage.js').toString());

Shout-out to Krasimir Tsonev who joined me for a two-day pairing session to figure this stuff out. Too bad we couldn't quite figure it out. If you do, please send me a pull request!

Reporting the results

All of these coverage-*.json files are directly consumable by nyc - the coverage reporting tool that comes with the Istanbul suite! I mounted .nyc_output/ directly under /tmp inside my Docker container so I could

nyc report
nyc report --reporter=lcov | codecov

We can also modify the unit-test command to jest --coverage --coverageReporters json --coverageDirectory .nyc_output so it produces a coverage-final.json file for nyc. Use this if you want to combine the coverage reports from both test suites.

Because I'm using Travis CI the two test suites are executed independently and there is no easy way to share information between them. Instead I've switched from Coveralls to CodeCov which is smart enough to merge coverage submissions coming from multiple jobs on the same git commits. You can compare the commit submitting only unit-test results with the one submitting coverage from both test suites.

All of the above steps are put into practice in PR #136 if you want to check them out!

Thanks for reading and happy testing!


Faster Travis CI tests with Docker cache

For a while now I've been running tests on Travis CI using Docker containers to build the project and execute the tests inside. In this post I will explain how to speed up execution times.

A Docker image is a filesystem snapshot similar to a virtual machine image. From these images we build containers (e.g. we run container X from image Y). The construction of Docker images is controlled via a Dockerfile which contains a set of instructions on how to build the image. For example:

FROM welder/web-nodejs:latest
MAINTAINER Brian C. Lane <bcl@redhat.com>
RUN dnf install -y nginx

CMD nginx -g "daemon off;"
EXPOSE 3000

## Do the things more likely to change below here. ##

COPY ./docker/nginx.conf /etc/nginx/

# Update node dependencies only if they have changed
COPY ./package.json /welder/package.json
RUN cd /welder/ && npm install

# Copy the rest of the UI files over and compile them
COPY . /welder/
RUN cd /welder/ && node run build

COPY entrypoint.sh /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]

docker build is smart enough to build intermediate layers for each command and store them on your computer. Each command is hashed and it is rebuilt only if it has changed. Thus the stuff which doesn't change often goes first (like setting up a web server or a DB) and the stuff that changes (like the project source code) goes at the end. All of this is beautifully explained by Stefan Kanev in this video (in Bulgarian).

Travis and Docker

While intermediate layer caching is a standard feature for Docker it is disabled by default in Travis CI and any other CI service I was able to find. To be frank, Circle CI offers this as a premium feature but their pricing plans on that aren't clear at all.

However you can enable the use of caching following a few simple steps:

  1. Make your Docker images publicly available (e.g. Docker Hub or Amazon EC2 Container Service)
  2. Before starting the test job do a docker pull my/image:latest
  3. When building your Docker images in Travis add --cache-from my/image:latest to docker build
  4. After successful execution docker tag the latest image with the build job number and docker push it again to the hub!

NOTES:

  • Everything you do will become public so take care not to expose internal code. Alternatively you may configure a private docker registry (e.g. Amazon EC2 CS) and use encrypted passwords for Travis to access your images;
  • docker pull will download all layers that it needs. If your hosting is slow this will negatively impact execution times;
  • docker push will upload only the layers that have been changed;
  • I only push images coming from the master branch which are not from a pull request build job. This prevents me from accidentally messing something up.

If you examine the logs of Job #247.4 and Job #254.4 you will notice that almost all intermediate layers were re-used from cache:

Step 3/12 : RUN dnf install -y nginx
 ---> Using cache
 ---> 25311f052381
Step 4/12 : CMD nginx -g "daemon off;"
 ---> Using cache
 ---> 858606811c85
Step 5/12 : EXPOSE 3000
 ---> Using cache
 ---> d778cbbe0758
Step 6/12 : COPY ./docker/nginx.conf /etc/nginx/
 ---> Using cache
 ---> 56bfa3fa4741
Step 7/12 : COPY ./package.json /welder/package.json
 ---> Using cache
 ---> 929f20da0fc1
Step 8/12 : RUN cd /welder/ && npm install
 ---> Using cache
 ---> 68a30a4aa5c6

Here the slowest operations are dnf install and npm install which on normal execution will take around 5 minutes.

You can check out my .travis.yml for more info.

First time cache

It is important to note that you need to have your docker images available in the registry before you execute the first docker pull from CI. I do this by manually building the images on my computer and uploading them before configuring CI integration. Afterwards the CI system takes care of updating the images for me.

Initially you may not notice a significant improvement as seen in Job #262, Step 18/22. The initial image available on Docker Hub has all the build dependencies installed and the code has not been changed when job #262 was executed.

The COPY command copies the entire contents of the directory, including filesystem metadata! Things like uid/gid (file ownership), timestamps (not sure if taken into account) and/or extended attributes (e.g. SELinux) will cause the intermediate layers' checksums to differ even though the actual source code didn't change. This will resolve itself once your CI system starts automatically pushing the latest images to the registry.

Thanks for reading and happy testing!


TransactionManagementError during testing with Django 1.10

During the past 3 weeks I've been debugging a weird error which started happening after I migrated KiwiTestPad to Django 1.10.7. Here is the reason why this happened.

Symptoms

After migrating to Django 1.10 all tests appeared to be working locally on SQLite however they failed on MySQL with

TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

The exact same test cases failed on PostgreSQL with:

InterfaceError: connection already closed

Since version 1.10 Django executes all tests inside transactions so my first thoughts were related to the auto-commit mode. However upon closer inspection we can see that the line which triggers the failure is

self.assertTrue(users.exists())

which is essentially a SELECT query aka User.objects.filter(username=username).exists()!

My tests were failing on a SELECT query!

Reading the numerous posts about TransactionManagementError I discovered it may be caused by a runaway cursor. The application did use raw SQL statements, which I promptly converted to ORM queries - that took me some time. Then I also fixed a couple of places where it used transaction.atomic() as well. No luck!

Then, after numerous experiments and tons of logging inside Django's own code I was able to figure out when the failure occurred and what events were in place. The test code looked like this:

response = self.client.get('/confirm/')

user = User.objects.get(username=self.new_user.username)
self.assertTrue(user.is_active)

The failure was happening after the view had been rendered, upon the first SELECT I did against the database!

The problem was that the connection to the database had been closed midway during the transaction!

In particular (after more debugging of course) the sequence of events was:

  1. execute django/test/client.py::Client::get()
  2. execute django/test/client.py::ClientHandler::__call__(), which takes care to disconnect/connect signals.request_started and signals.request_finished which are responsible for tearing down the DB connection, so the problem is not here
  3. execute django/core/handlers/base.py::BaseHandler::get_response()
  4. execute django/core/handlers/base.py::BaseHandler::_get_response() which goes through the middleware (needless to say I did inspect all of it as well since there have been some changes in Django 1.10)
  5. execute response = wrapped_callback() while still inside BaseHandler._get_response()
  6. execute django/http/response.py::HttpResponseBase::close() which looks like

    # These methods partially implement the file-like object interface.
    # See https://docs.python.org/3/library/io.html#io.IOBase
     
    # The WSGI server must call this method upon completion of the request.
    # See http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html
    def close(self):
        for closable in self._closable_objects:
            try:
                closable.close()
            except Exception:
                pass
        self.closed = True
        signals.request_finished.send(sender=self._handler_class)
    
  7. signals.request_finished is fired

  8. django/db/__init__.py::close_old_connections() closes the connection!

IMPORTANT: On MySQL setting AUTOCOMMIT=False and CONN_MAX_AGE=None helps work around this problem but is not the solution for me because it didn't help on PostgreSQL.
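
For reference, this is roughly how that workaround looks in settings.py (a sketch only; database name and credentials below are placeholders):

# settings.py - sketch of the MySQL-only workaround mentioned above;
# NAME/USER/PASSWORD are placeholders
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'kiwitestpad',
        'USER': 'kiwi',
        'PASSWORD': 'secret',
        'AUTOCOMMIT': False,    # disable Django managed auto-commit mode
        'CONN_MAX_AGE': None,   # keep connections alive instead of closing old ones
    }
}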

Going back to HttpResponseBase::close() I started wondering who calls this method. The answer: it is called by the @content.setter method at django/http/response.py::HttpResponse::content(), which is even weirder because we assign to self.content inside HttpResponse::__init__().

Root cause

The root cause of my problem was precisely this HttpResponse::__init__() method or rather the way we arrive at it inside the application.

The offending view's last line was

return HttpResponse(Prompt.render(
     request=request,
     info_type=Prompt.Info,
     info=msg,
     next=request.GET.get('next', reverse('core-views-index'))
))

and the Prompt class looks like this

from django.shortcuts import render

class Prompt(object):
    @classmethod
    def render(cls, request, info_type=None, info=None, next=None):
        return render(request, 'prompt.html', {
            'type': info_type,
            'info': info,
            'next': next
        })

Looking back at the internals of HttpResponse we see that

  • if content is a string we call self.make_bytes()
  • if the content is an iterator then we assign it and if the object has a close method then it is executed.

HttpResponse itself is an iterator - it inherits from six.Iterator - so when we initialize HttpResponse with another HttpResponse object (aka the content) we execute content.close(), which unfortunately happens to close the database connection as well.

IMPORTANT: note that from the point of view of a person using the application the HTML content is exactly the same regardless of whether we have nested HttpResponse objects or not. Also during normal execution the code doesn't run inside a transaction so we never notice the problem in production.

The fix of course is very simple, just return Prompt.render()!
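
In other words, the last line of the offending view becomes:

# return the result of render() directly instead of wrapping it
# in another HttpResponse object
return Prompt.render(
    request=request,
    info_type=Prompt.Info,
    info=msg,
    next=request.GET.get('next', reverse('core-views-index'))
)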

Thanks for reading and happy testing!


Producing coverage report for Haskell binaries

Recently I've started testing a Haskell application and a question I find unanswered (or at least very poorly documented) is how to produce coverage reports for binaries?

Understanding HPC & cabal

hpc is the Haskell code coverage tool. It produces the following files:

  • .mix - module index file, contains information about tick boxes - their type and location in the source code;
  • .tix - tick index file aka coverage report;
  • .pix - program index file, used only by hpc trans.

The invocation of hpc report needs to know where to find the .mix files in order to be able to translate the coverage information back to source, and it needs to know the location (full path or relative to pwd) of the .tix file we want to report on.

cabal is the package management tool for Haskell. Among other things it can be used to build your code, execute the test suite and produce the coverage report for you. cabal build will produce module information in dist/hpc/vanilla/mix and cabal test will store coverage information in dist/hpc/vanilla/tix!

A particular thing about Haskell is that you can only test code which can be imported, i.e. a library module. You can't test (via Hspec or HUnit) code which lives inside a file that produces a binary (e.g. Main.hs). However you can still execute these binaries (e.g. invoke them from the shell) and they will produce a coverage report in the current directory (e.g. main.tix).

Putting everything together

  1. Using cabal build and cabal test build the project and execute your unit tests. This will create the necessary .mix files (including ones for binaries) and .tix files coming from unit testing;
  2. Invoke your binaries passing appropriate data and examining the results (e.g. compare the output to a known value). A simple shell or Python script could do the job;
  3. Copy the binary.tix file to dist/hpc/vanilla/tix/binary/binary.tix!

Produce coverage report with hpc:

hpc markup --hpcdir=dist/hpc/vanilla/mix/lib --hpcdir=dist/hpc/vanilla/mix/binary  dist/hpc/vanilla/tix/binary/binary.tix

Convert the coverage report to JSON and send it to Coveralls.io:

cabal install hpc-coveralls
~/.cabal/bin/hpc-coveralls --display-report tests binary

Example

Check out the haskell-rpm repository for an example. See job #45 where there is now coverage for the inspect.hs, unrpm.hs and rpm2json.hs files, which produce binary executables. Also notice that in RPM/Parse.hs the function parseRPMC is now covered, while it was not covered in the previous job #42!

.travis.yml snippet
script:
  - ~/.cabal/bin/hlint .
  - cabal install --dependencies-only --enable-tests
  - cabal configure --enable-tests --enable-coverage --ghc-option=-DTEST
  - cabal build
  - cabal test --show-details=always

  # tests to produce coverage for binaries
  - wget https://s3.amazonaws.com/atodorov/rpms/macbook/el7/x86_64/efivar-0.14-1.el7.x86_64.rpm
  - ./tests/test_binaries.sh ./efivar-0.14-1.el7.x86_64.rpm

  # move .tix files in appropriate directories
  - mkdir ./dist/hpc/vanilla/tix/inspect/ ./dist/hpc/vanilla/tix/unrpm/ ./dist/hpc/vanilla/tix/rpm2json/
  - mv inspect.tix ./dist/hpc/vanilla/tix/inspect/
  - mv rpm2json.tix ./dist/hpc/vanilla/tix/rpm2json/
  - mv unrpm.tix ./dist/hpc/vanilla/tix/unrpm/

after_success:
  - cabal install hpc-coveralls
  - ~/.cabal/bin/hpc-coveralls --display-report tests inspect rpm2json unrpm

Thanks for reading and happy testing!


What's the bug in this pseudo-code

Rails Girls Vratsa sticker

This is one of the stickers for the second edition of Rails Girls Vratsa which was held yesterday. Let's explore some of the bug proposals submitted by the Bulgarian QA group:

  1. sad() == true is ugly
  2. sad() is not very nice, better make it if(isSad())
  3. use sadStop(), and even better - stopSad()
  4. there is an extra space character in beAwesome( )
  5. the last curly bracket needs to be on a new line

Lyudmil Latinov

My friend Lu describes what I would call style issues. The style he refers to is mostly Java-oriented, especially when it comes to naming things. In Ruby we would probably go with sad? instead of isSad. Style is important and there are many tools to help us with it, but this will not cause a functional problem! While I'm at it, let me say the curly brackets are not the problem either. They are not valid in Ruby, but this is pseudo-code and they also fall into the style category.

The next interesting proposal comes from Tsveta Krasteva. She examines the possibility of sad() returning an object or nil instead of a boolean value. Her first question was whether the if statement would still work, and the answer is yes. In Ruby everything is an object and every object can be compared to true and false. See Alan Skorkin's blog post on the subject.

Then Tsveta says the answer is to use sad().stop() with the warning that it may return nil. In this context the sad() method returns an object indicating that the person is feeling sad. If the method returns nil then the person is feeling OK.

example by Tsveta
class Csad
  def stop()
    print("stop\n");
  end
end

def sad()
  print("sad\n");
  Csad.new();
end

def beAwesome()
  print("beAwesome\n");
end

# notice == true was removed
if(sad())
  print("Yes, I am sad\n");
  sad.stop();
  beAwesome( );
end

While this is coming closer to a functioning solution, something about it is bugging me. In the if statement the developer has typed more characters than required (== true). This sounds unlikely to me but is possible with less experienced developers. The other issue is that we are using an object (of class Csad) to represent an internal state in the system under test. There is one method to return the state (sad()) and another one to alter the state (Csad.stop()). The two methods don't operate on the same object! Not a very strong OOP design. On top of that we have to call the method twice, the first time in the if statement and the second time in the body of the if statement, which may have unwanted side effects. It is best to assign the return value to a variable instead.

IMO if we are to use this OOP approach the code should look something like:

class Person
  def sad?()
  end

  def stopBeingSad()
  end

  def beAwesome()
  end
end

p = Person.new
if p.sad?
    p.stopBeingSad
    p.beAwesome
end

Let me return to assuming we don't use classes here. The first obvious mistake is the space in sad stop();, first spotted by Peter Sabev. His proposal, backed by others, is to use sad.stop(). However they didn't use my hint asking what the return value of sad() is!

If sad() returns boolean then we'll get undefined method 'stop' for true:TrueClass (NoMethodError)! Same thing if sad() returns nil, although we skip the if block in this case.

In Ruby we are allowed to skip parentheses when calling a method, like I've shown above. If we ignore this fact for a second, then sad?.stop() will mean execute the method named stop() which is a member of the sad? variable, which is of type method! Again, methods don't have an attribute named stop!

The last two paragraphs are the semantic/functional mistake I see in this code. The only way for it to work is to use an OOP variant which is further away from what the existing clues give us.

Note: The variant sad? stop() is syntactically correct. It means call the function sad? with a parameter that is the result of calling the method stop(), which depending on the outer scope of this program may or may not be correct (e.g. stop is defined, sad? accepts optional parameters, sad? maintains global state).

Thanks for reading and happy testing!


VMware's favorite login form

How do you test a login form? is one of my favorite questions when screening candidates for QA positions and also a good brain exercise even for experienced testers. I wrote about it last year. In this blog post I'd like to share a different perspective on this same question, this time courtesy of my friend Rayna Stankova.

Login form

What bugs do you see above

The series of images above is from a Women Who Code Sofia workshop where the participants were given printed copies and asked to find as many defects as possible. Here they are (counting clockwise from the top-left corner):

  1. Typo in "Registr" link at the bottom;
  2. UI components are not aligned;
  3. Missing "Forgot your password?" link
  4. Backend credentials validation with empty password; plain text password field; Too specific information about incorrect credentials;
  5. Too specific information about incorrect credentials with visual hint as to what exactly is not correct. In this case it looks like the password is OK, maybe it was one of the 4 most commonly used passwords, but the username is wrong which we can easily figure out;
  6. In this case the error handling appears to be correct, not disclosing what exactly is wrong. The placement is somewhat off though - it looks like an error message for one of the fields instead of for the entire form. I'd move it to the top and even slightly update the wording to be more like Login failed, bad credentials, try again.

How do you test this

Here is a list of possible test scenarios, proposed by Rayna. Notes are mine.

UI Layer

  • Test 1: Verify Email (User ID) field has focus on page load
  • Test 2: Verify Empty Email (User ID) field and Password field
  • Test 3: Verify Empty Email (User ID) field
  • Test 4: Verify Empty Password field
  • Test 5: Verify Correct sign in
  • Test 6: Verify Incorrect sign in
  • Test 7: Verify Password Reset - working link
  • Test 8: Verify Password Reset - invalid emails
  • Test 9: Verify Password Reset - valid email
  • Test 10: Verify Password Reset - using new password
  • Test 11: Verify Password Reset - using old password
  • Test 12: Verify whether password text is hidden
  • Test 13: Verify text field limits - whether the browser accepts more than the allowed database limits
  • Test 14: Verify that validation message is displayed in case user exceeds the character limit of the username and password fields
  • Test 15: Verify if there is checkbox with label "remember password" in the login page
  • Test 16: Verify whether the username is allowed to contain non-printable characters. If not, this is invalid in the 'create user' section.
  • Test 17: Verify if the user must be logged in to access any other area of the site.

Tests 10 and 11 are particularly relevant for Fedora Account System where you need a really strong password and (at least in the past) had to change it more often and couldn't reuse any of your old passwords. As a user I really hate this b/c I can't remember my own password but it makes for a good test scenario.

13 and 14 are also something I rarely see and could make a nice case for property based testing.
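
For example, a property-based sketch of test 14 using Hypothesis could look like the code below. The element IDs and the 254 character limit are assumptions made for illustration - use the real selectors and column limits of the application under test:

# property based sketch of test 14 - overlong usernames should produce a
# validation message; all IDs, URLs and limits below are assumptions
from hypothesis import given, settings, strategies as st
from selenium import webdriver

MAX_USERNAME_LENGTH = 254  # assumed database column limit


@settings(max_examples=20, deadline=None)  # a browser round-trip is slow
@given(st.text(alphabet=st.characters(min_codepoint=32, max_codepoint=126),
               min_size=MAX_USERNAME_LENGTH + 1,
               max_size=MAX_USERNAME_LENGTH * 10))
def test_overlong_username_shows_validation_message(username):
    driver = webdriver.Firefox()
    try:
        driver.get('https://example.com/login')
        driver.find_element_by_id('username').send_keys(username)
        driver.find_element_by_id('password').send_keys('irrelevant')
        driver.find_element_by_id('login-button').click()

        # expect a validation message, not a server error or silent truncation
        assert driver.find_elements_by_class_name('validation-error')
    finally:
        driver.quit()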

16 would have been the bread and butter of testing Emoj.li (the first emoji-only social network).

Keyboard Specific

  • Test 18: Verify Navigate to all fields
  • Test 19: Verify Enter submits on password focus
  • Test 20: Verify Space submits on login focus
  • Test 21: Verify Enter submits

These are all so relevant with beautifully styled websites nowadays. The one I hate the most is when the space key doesn't trigger select/unselect for checkboxes which are actually images!

Security:

  • Test 22: Verify SQL Injections testing - password field
  • Test 23: Verify SQL Injections testing - username field
  • Test 24: Verify SQL Injections testing - reset password
  • Test 25: Verify Password/username not visible from URL login
  • Test 26: Verify that, from a security point of view, in case of incorrect credentials the user is shown a message like "incorrect username or password" instead of an exact message pointing at the field that is incorrect. A message like "incorrect username" will aid a hacker in brute-forcing the fields one by one
  • Test 27: Verify the timeout of the login session
  • Test 28: Verify if the password can be copy-pasted or not
  • Test 29: Verify that once logged in, clicking back button doesn't logout user

22, 23 and 24 are a bit generic and I guess can be collapsed into one. Better yet make them more specific instead.

Test 28 may sound like nonsense but it is not. I remember that back in the day it was possible to copy and paste the password out of the Windows dial-up credentials screen. With heavily styled form fields it is possible to have this problem again so it is a valid check IMO.

Others:

  • Test 30: Verify that the password is in encrypted form when entered
  • Test 31: Verify the user must be logged in to call any web services.
  • Test 32: Verify that if the username is allowed to contain non-printable characters, the code handling login can deal with them and no error is thrown.

I think Test 30 means to validate that the backend doesn't store passwords in plain text but rather stores their hashes.

32 is a duplicate of 16. I'd also ask: why only the username? The password field is also a good candidate for this.

If you check how I would test a login form you will find some similarities but there are also scenarios which are different. I'm interested to see what other scenarios we've both missed, especially ones which have manifested themselves as bugs in actual applications.

Thanks for reading and happy testing!


Monitoring behavior via automated tests

In my last several presentations I briefly talked about using your tests as a monitoring tool. I've not been eating my own dog food and stuff failed in production!

What is monitoring via testing

This is a technique I coined 6 months ago while working with Tradeo's team. I'm not the first one to figure this out so if you know the proper name for it please let me know in the comments. So why not take a subset of your automated tests and run them regularly against production? Let's say every hour?

In my particular case we started with integration tests which interact with the product (a web app) in a way that a living person would do. E.g. login, update their settings, follow another user, chat with another user, try to deposit money, etc. The results from these tests are logged into a database and then charted (using Grafana). This way we can bring lots of data points together and easily analyze them.
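
A single monitoring test can be as simple as the sketch below. The table name, connection string, URL and credentials are placeholders, and in reality the browser-level scenarios use Selenium rather than plain HTTP requests:

# sketch of one monitoring test - its status and duration are written to a
# database table which Grafana then charts
import time

import psycopg2
import requests


def record_result(name, status, duration):
    conn = psycopg2.connect('dbname=monitoring user=monitor')
    with conn, conn.cursor() as cursor:
        cursor.execute(
            'INSERT INTO test_results (name, status, duration, recorded_at) '
            'VALUES (%s, %s, %s, NOW())',
            (name, status, duration))
    conn.close()


def test_login_works_in_production():
    start = time.time()
    response = requests.post('https://example.com/login',
                             data={'username': 'monitor', 'password': 'secret'})
    status = 'PASS' if response.status_code == 200 else 'FAIL'
    record_result('login', status, time.time() - start)
    assert status == 'PASS'

Run something like this every hour from cron or a CI scheduler and point Grafana at the test_results table.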

This technique has the added bonus that we can cover the most critical test paths in a couple of minutes and do so regularly without human intervention. Reusing the existing monitoring infrastructure of the devops team we can configure alerts if need be. This makes it a sort of early detection/warning system, plus it gives a degree of possibility to spot correlations between data points or patterns.

As simple as it sounds, I've heard of a handful of companies doing this sort of continuous testing against production. Maybe you can implement something similar in your organization and we can talk more about the results?

Why does it matter

Anyway, everyone knows how to write Selenium tests so I'm not going to bother you with the details. Why does this kind of testing matter?

Do you remember the recent announcement by GitHub about Travis CI leaking some authentication tokens into their public log files? I did receive an email about this but didn't pay attention to it because I don't use GitHub tokens for anything I do in Travis. However, as a safety measure GitHub went ahead and wiped out my security tokens.

The result from this is that my automated upstream testing infrastructure had stopped working! In particular my requests to the GitHub API stopped working. And I didn't even know about it!

This means that since May 24th there have been at least 4 new versions of libraries and frameworks on which some of my software depends and I failed to test them! One of them was Django 1.11.2.

I have supplied a new GitHub token for my infra but if I had monitoring I would have known about this problem well in advance. Next I'm off to write some monitoring tests and also implement better failure detection in Strazar itself!

Thanks for reading and happy testing (in production)!


Semantically Invalid Input

Unsolvable Square example
. . 9 | . 2 8 | 7 . .
8 . 6 | . . 4 | . . 5
. . 3 | . . . | . . 4
------+-------+------
6 . . | . . . | . . .
? 2 . | 7 1 3 | 4 5 .
. . . | . . . | . . 2
------+-------+------
3 . . | . . . | 5 . .
9 . . | 4 . . | 8 . 7
. . 1 | 2 5 . | 3 . .

In a comment to a previous post Flavio Poletti proposed a very interesting test case for a function which solves the Sudoku game - semantically invalid input, i.e. an input that passes intermediate validation checks (no duplicates in any row/col/9-square) but that cannot possibly have a solution.

Until then I thought that Sudoku was a completely deterministic game and that if the input passed all validation checks then we would always have a solution. Apparently I was wrong! Reading more on the topic I discovered these Sudoku test cases from Sudopedia. Their Invalid Test Cases section lists several examples of semantically invalid input in Sudoku:

  • Unsolvable Square;
  • Unsolvable Box;
  • Unsolvable Column;
  • Unsolvable Row;
  • Not Unique with examples having 2, 3, 4, 10 and 125 solutions

The example above cannot be solved because the left-most square of the middle row (r5c1) has no possible candidates.

Following the rule of non-repeating numbers from 1 to 9 in each row, for row 5 we're left with the numbers 6, 8 and 9. For r5c1, 6 is a no-go because it is already present in the same 3x3 square. Then 9 is a no-go because it is present in column 1. Which leaves us with 8, which is also present in column 1! Pretty awesome, isn't it?
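
This check is easy to automate. Here is a minimal sketch of a candidate computation which exposes such unsolvable squares, assuming the grid is a 9x9 list of lists with 0 marking empty cells:

def candidates(grid, row, col):
    """Possible values for an empty cell; grid is 9x9, 0 means empty."""
    used = set(grid[row])                                # same row
    used |= {grid[r][col] for r in range(9)}             # same column
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c]                                  # same 3x3 square
             for r in range(box_row, box_row + 3)
             for c in range(box_col, box_col + 3)}
    return set(range(1, 10)) - used


def is_solvable_so_far(grid):
    """False if any empty cell has no candidates left."""
    return all(grid[r][c] != 0 or candidates(grid, r, c)
               for r in range(9) for c in range(9))

For the grid above candidates(grid, 4, 0) returns an empty set, so a solver can reject the input early instead of searching for a solution that doesn't exist.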

Also check the Valid Test Cases section which includes other interesting examples and definitely not ones which I have considered previously when testing Sudoku.

On a more practical note, I've been trying to remember a case from my QA practice where we had input data that matched all validation conditions but was semantically invalid. I can't remember such a case. If you do have examples of semantically invalid data in real software please let me know in the comments below!

Thanks for reading and happy testing!


Learn Python & Selenium Automation in 8 weeks

A couple of months ago I conducted a practical, instructor-led training in Python and Selenium automation for manual testers. You can find the materials on GitHub.

The training consists of several basic modules and practical homework assignments. The modules explain

  1. The basic structure of a Python program and functions
  2. Commonly used data types
  3. If statements and (for) loops
  4. Classes and objects
  5. The Python unit testing framework and its assertions
  6. High-level introduction to Selenium with Python
  7. High-level introduction to the Page Objects design pattern
  8. Writing automated tests for real world scenarios without any help from the instructor.

Every module is intended to be taken in the course of 1 week and begins with links to preparatory materials and lots of reading. Then I help the students understand the basics and explain with more examples, often writing code as we go along. At the end there is the homework assignment for which I expect a solution presented by the end of the week so I can comment and code-review it.

All assignments which require the student to implement functionality, not tests, are paired with a test suite, which the student should use to validate their solution.

What worked well

Despite everything I've written below I had 2 students (from a group of 8) who showed very good progress. One of them was the absolute star, taking active participation in every class and doing almost all homework assignments on time, pretty much without errors. I think she'd had some previous training or experience though. She was in the USA, training was done remotely via Google Hangouts.

The other student was in Sofia, training was done in person. He is not on the same level as the US student but is the best from the Bulgarian team. IMO he lacks a little bit of motivation. He "cheated" a bit on some tasks, providing non-standard, easier solutions, and completed most of his assignments. After the first Selenium session he started creating small scripts to extract results from football sites or as helpers to be applied in the daily job. The interesting fact for me was that he created his programs as unittest.TestCase classes. I guess because this was the way he knew how to run them!?!

There were a few other students who had some prior experience with programming but weren't very active in class, so I can't tell how their careers will progress. If they put some more effort into it I'm sure they can develop decent programming skills.

What didn't work well

Right from the beginning most students failed to read the preparatory materials. Some of them read a little bit, others didn't read at all. When they came prepared I had the feeling the sessions progressed more smoothly. I also had students joining late in the process, who for the most part didn't participate in the training at all. I'd like to avoid that in the future if possible.

Sometimes students complained about the lack of example code, although Dive into Python includes tons of examples. I resorted to sending them the example.py files which I produced during class.

The practical part of the training was mostly me programming on a big TV screen in front of everyone else. Several times one of the students took my place. There wasn't much active participation on their part and unfortunately they didn't want to bring personal laptops to the training (or maybe weren't allowed to)! We did have a company-provided laptop though.

When practicing functions and arithmetic operations the students struggled with basic maths like breaking down a number into its digits or vice versa, working with Fibonacci sequences and the like. In some cases they cheated by converting to/from strings and then iterating over them. Also some hard-coded the first few numbers of the Fibonacci sequence and returned them directly. Maybe an in-place explanation of the underlying maths would have been helpful but honestly I was surprised by this. Somebody please explain or give me some advice here!
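For reference, this is the kind of arithmetic I had in mind - a small sketch (not the actual assignment) which breaks a number into its digits with % and // instead of going through strings:

def digits_of(number):
    """Digits of a non-negative integer, using division and modulo."""
    if number == 0:
        return [0]

    result = []
    while number > 0:
        result.insert(0, number % 10)  # take the last digit
        number //= 10                  # drop the last digit
    return result

def number_from_digits(digits):
    """The inverse operation - build the number back from its digits."""
    number = 0
    for digit in digits:
        number = number * 10 + digit
    return number

print(digits_of(2017))                   # [2, 0, 1, 7]
print(number_from_digits([2, 0, 1, 7]))  # 2017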

I was completely missing examples of the datetime and timedelta classes, which turned out to be very handy in the practical Selenium tasks, and we had to go over them on the fly.
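For the record, the handful of datetime/timedelta idioms we ended up needing look roughly like this (the formats and values are made up):

from datetime import datetime, timedelta

now = datetime.now()
tomorrow = now + timedelta(days=1)
in_two_hours = now + timedelta(hours=2)

# formatting and parsing, e.g. when filling in or reading date fields
print(tomorrow.strftime("%Y-%m-%d"))
parsed = datetime.strptime("2017-06-03 14:30", "%Y-%m-%d %H:%M")

# the difference between two moments is a timedelta
age = now - parsed
print(age.days, age.total_seconds())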

The OOP assignments went mostly undone, not to mention one of them had bonus tasks which are easily solved using recursion. I think we could skip some of the OOP practice (not sure how safe that is) because I really need classes only for constructing the tests and we don't do anything fancy there.

The Page Object design pattern is also OOP based and I think that went somewhat well, given that we are only passing values around and performing some actions. I didn't put constraints on, nor provide guidance about, what the classes should look like and which methods go where. Maybe I should have made it easier.

Anyway, given that Page Objects is being replaced by the Screenplay pattern, I think we can safely stick to the all-in-one, function-based Selenium tests. Maybe utilize helper functions for repeated tasks (like login). Indeed this is what I was using last year with Rspec & Capybara!
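To illustrate, a minimal sketch of such an all-in-one functional test with a login() helper - the URL and locators below are made up for the example:

import unittest
from selenium import webdriver

def login(browser, username, password):
    """Helper for a repeated task - the locators are hypothetical."""
    browser.get("https://example.com/login")
    browser.find_element_by_id("username").send_keys(username)
    browser.find_element_by_id("password").send_keys(password)
    browser.find_element_by_id("submit").click()

class TestDashboard(unittest.TestCase):
    def setUp(self):
        self.browser = webdriver.Firefox()

    def tearDown(self):
        self.browser.quit()

    def test_greeting_after_login(self):
        login(self.browser, "tester", "secret")
        greeting = self.browser.find_element_by_class_name("greeting")
        self.assertIn("tester", greeting.text)

if __name__ == '__main__':
    unittest.main()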

What students didn't understand

Right until the end I had people who had trouble understanding function signatures, function instances and calling/executing a function. Also returning a value from a function vs. printing the (same) value on screen or assigning it to a global variable (e.g. FIB_NUMBERS).

In the same category falls using method parameters vs. using global variables (which happened to have the same value), using the parameters as arguments to another function inside the body of the current function, using class attributes (e.g. self.name) to store and pass values around vs. local variables in methods vs. method parameters which have the same names.
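A condensed illustration of the return vs. print and parameter vs. global confusion (FIB_NUMBERS is the global mentioned above, the rest is made up):

FIB_NUMBERS = [0, 1, 1, 2, 3]

def print_sum(numbers):
    # prints the value, the caller gets None back
    print(sum(numbers))

def return_sum(numbers):
    # returns the value so the caller can use it further
    return sum(numbers)

result = print_sum(FIB_NUMBERS)   # prints 7, result is None
result = return_sum(FIB_NUMBERS)  # prints nothing, result is 7

# `numbers` is a parameter - it only happens to hold the same value as the
# global FIB_NUMBERS because we passed it in as an argument
print(return_sum([1, 2, 3]))      # works with any list, not just the global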

I think there was some confusion about lists, dictionaries and tuples but we did practice mostly with list structures so I don't have enough information.

I have the impression that object oriented programming (classes and instances, we didn't go into inheritance) is generally confusing to beginners with zero programming experience. The classical way to explain it is by using some abstractions like animal -> dog -> a particular dog breed -> a particular pet. OOP was explained to me in a similar way back in school so these kinds of abstractions are very natural for me. I have no idea if my explanation sucks or students are having a hard time wrapping their heads around the abstraction. I'd love to hear some feedback from other instructors on this one.

I think there is some misunderstanding between a class (a definition of behavior) and an instance/object of this class (something which exists in memory). This may also explain the difficulty remembering or figuring out what self points to and why we need to use it inside method bodies.
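The way I would show it in code, rather than with animal analogies (an illustration, not taken from the course materials):

class BankCard(object):          # the class - a definition of behavior
    def __init__(self, owner):
        self.owner = owner       # self points to one particular instance

    def greet(self):
        # without self we would not know WHICH card's owner to use
        return "Hello, %s" % self.owner

card_1 = BankCard("Alex")        # two instances - two objects in memory
card_2 = BankCard("Maria")

print(card_1.greet())            # Hello, Alex
print(card_2.greet())            # Hello, Maria - same class, different state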

For unittest.TestCase we didn't do lots of practice, which is my fault. The homework assignments ask the students to go back to solutions of previous modules and implement more tests for them. Next time I should provide a module (possibly with non-obvious bugs) and ask them to write a comprehensive test suite for it.

Because of the missing practice there was some confusion/misunderstanding about the setUpClass/tearDownClass and the setUp/tearDown methods. Add to the mix that the former are @classmethod while the latter are not. "To be safe" students always defined both as class methods!

I have since corrected the training materials but at the time we didn't have good examples (nor practice) explaining the difference between setUpClass (executed once, aka before suite) and setUp (executed before every test method, aka before test).
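A minimal example which makes the difference visible (and shows that only setUpClass is a @classmethod):

import unittest

class ExampleTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # executed ONCE before any of the test methods - aka before suite
        print("setUpClass")

    def setUp(self):
        # executed before EVERY test method - aka before test
        print("setUp")

    def test_one(self):
        self.assertTrue(True)

    def test_two(self):
        self.assertTrue(True)

if __name__ == '__main__':
    unittest.main()

# output order: setUpClass, setUp, test_one, setUp, test_two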

On the Selenium side I think it is mostly practice which students lack, not understanding. The entire Selenium framework (any web test framework for that matter) boils down to

  • Load a page
  • Find element(s)
  • Click or hover (that one was tricky) element
  • Get element's attribute value or text
  • Wait for the proper page to load (or worst case AJAX calls)

IMO finding the correct element on the page is on-par with waiting (which also relies on locating elements) and took 80% of the time we spent working with Selenium.
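The entire workflow above fits in a dozen lines of Python - the URL and locators are again made up:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get("https://example.com")                    # load a page

browser.find_element_by_link_text("Sign in").click()  # find an element & click it

# wait for the proper page to load instead of sleep()
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID, "username"))
)

field = browser.find_element_by_id("username")
print(field.get_attribute("placeholder"), field.text)  # attribute value & text

browser.quit()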

Thanks for reading and don't forget to comment and give me your feedback!

Image source: https://www.udemy.com/selenium-webdriver-with-python/


Quality Assurance According 2 Einstein

logo

Quality Assurance According 2 Einstein is a talk which introduces several different ideas about how we need to think and approach software testing. It touches on subjects like mutation testing, pairwise testing, automatic test execution, smart test non-execution, using tests as monitoring tools and team/process organization.

Because testing is more thinking than writing I have chosen a different format for this presentation. It contains only slides with famous quotes from one of the greatest thinkers of our time - Albert Einstein!

This blog post includes the accompanying links and references only! It is my first iteration on the topic so expect it to be unclear and incomplete, use your imagination! I will continue working and presenting on the same topic in the next few months so you can expect updates from time to time. In the mean time I am happy to discuss with you down in the comments.

IMAGINATION IS MORE IMPORTANT THAN KNOWLEDGE.

THE FASTER YOU GO, THE SHORTER YOU ARE.

IF THE FACTS DON'T FIT THE THEORY, CHANGE THE FACTS.

THE WHOLE OF SCIENCE IS NOTHING MORE THAN A REFINEMENT OF EVERYDAY THINKING.

INSANITY - DOING THE SAME THING OVER AND OVER AND EXPECTING DIFFERENT RESULTS.

This principle can be applied to any team/process within the organization. The above link is a reference to a nice book which was recommended to me but the gist of it is that we always need to analyze, ask questions and change if we want to achieve great results. A practical example of what is possible if you follow this principle is the talk Accelerate Automation Tests From 3 Hours to 3 Minutes.

THE ONLY REASON FOR TIME IS SO THAT EVERYTHING DOESN'T HAPPEN AT ONCE.

The topic here is "using tests as a monitoring tool". This is something I started a while back, helping a prominent startup with their production testing, but my involvement ended very soon after the framework was deployed live so I don't have lots of insight.

In the first few days this technique identified some unexpected behaviors, for example a 3rd party service was updating very often. Once they were even broken for a few hours - something nobody had information about.

Since then I've heard about 2 more companies using similar techniques to continuously validate that production software continues to work without having a physical person verify it. In the event of failures there are alerts which are dealt with accordingly.

NO PROBLEM CAN BE SOLVED FROM THE SAME LEVEL OF CONSCIOUSNESS THAT CREATED IT.

That much must be obvious to us quality engineers. What about the future however?

I don't have anything more concrete here. Just looking towards what is coming next!

DO NOT WORRY ABOUT YOUR DIFFICULTIES IN MATHEMATICS. I CAN ASSURE YOU MINE ARE STILL GREATER.

Thanks for reading and happy testing!


Automatic cargo update & pull requests for Rust projects

If you follow my blog you are aware that I use automated tools to do some boring tasks instead of me. For example they can detect when new versions of dependencies I'm using are available and then schedule testing against them on the fly.

One of these tools is Strazar which I use heavily for my Django based packages. Example: django-s3-cache build job.

Recently I've made a slightly different proof-of-concept for a Rust project. Because rustc and various dependencies (called crates) are updated very often we didn't want to expand the test matrix like Strazar does. Instead we wanted to always build & test against the latest crates versions and if that passes create a pull request for the update (in Cargo.lock). All of this unattended of course!

To start create a cron job in Travis CI which will execute once per day and call your test script. The script looks like this:

cargo-update-and-pr
#!/bin/bash

if [ -z "$GITHUB_TOKEN" ]; then
    echo "GITHUB_TOKEN is not defined"
    exit 1
fi

BRANCH_NAME="automated_cargo_update"

git checkout -b $BRANCH_NAME
cargo update && cargo test

DIFF=`git diff`
# NOTE: we don't really check the result from testing here. Only that
# something has been changed, e.g. Cargo.lock
if [ -n "$DIFF" ]; then
    # configure git authorship
    git config --global user.email "atodorov@MrSenko.com"
    git config --global user.name "Alexander Todorov"

    # add a remote with read/write permissions!
    # use token authentication instead of password
    git remote add authenticated https://atodorov:$GITHUB_TOKEN@github.com/atodorov/bdcs-api-rs.git

    # commit the changes to Cargo.lock
    git commit -a -m "Auto-update cargo crates"

    # push the changes so that PR API has something to compare against
    git push authenticated $BRANCH_NAME

    # finally create the PR
    curl -X POST -H "Content-Type: application/json" -H "Authorization: token $GITHUB_TOKEN" \
         --data '{"title":"Auto-update cargo crates","head":"automated_cargo_update","base":"master", "body":"@atodorov review"}' \
         https://api.github.com/repos/atodorov/bdcs-api-rs/pulls
fi

A few notes here:

  • You need to define a secret GITHUB_TOKEN variable for authentication;
  • The script doesn't force push, but in practice that may be useful (e.g. updating the PR);
  • The script doesn't have any error handling;
  • If PR is still open GitHub will tell us about it but we ignore the result here;
  • DON'T paste this into your Makefile because the GITHUB_TOKEN variable will be expanded into the logs and your secrets go away! Always call the script from your Makefile to avoid revealing secrets.
  • I am using topic branches because this is a POC. Switch to master and maybe move all URLs as variables at the top of the script!
  • I run this cron build against a fork of the project because the team doesn't feel comfortable having automated commits/pushes. I also create the pull requests against my own fork. You will have to adjust the targets if you want your PR to go to the original repository.

Here is the PR which was created by this script: https://github.com/atodorov/bdcs-api-rs/pull/5

Notice that it includes previous commits b/c they have not been merged to the master branch!

Here's the test job (#77) which generated this PR: https://travis-ci.org/atodorov/bdcs-api-rs/builds/219274916

Here's a test job (#87) which bails out miserably because the PR already exists: https://travis-ci.org/atodorov/bdcs-api-rs/builds/220954269

This post is part of my Quality Assurance According to Einstein series - a detailed description of useful techniques I will be presenting very soon.

Thanks for reading and happy testing!


Testing Red Hat Enterprise Linux the Microsoft way

Microsoft and Red Hat logos

Pairwise (a.k.a. all-pairs) testing is an effective test case generation technique that is based on the observation that most faults are caused by interactions of at most two factors! Pairwise-generated test suites cover all combinations of two and are therefore much smaller than exhaustive ones, yet still very effective in finding defects. This technique has been pioneered by Microsoft in testing their products. For an example please see their GitHub repo!

I heard about pairwise testing from Niels Sander Christensen last year at QA Challenge Accepted 2.0 and I immediately knew where it would fit into my test matrix.

This article describes an experiment made during the Red Hat Enterprise Linux 6.9 installation testing campaign. The experiment covers generating a test plan (referred to as the Pairwise Test Plan) based on the pairwise test strategy and some heuristics. The goal was to reduce the number of test cases which needed to be executed and still maintain good test coverage (in terms of breadth of testing) and also maintain low risk for the product.

Product background

For RHEL 6.9 there are 9 different product variants, each comprising a particular package set and CPU architecture:

  • Server i386
  • Server x86_64
  • Server ppc64 (IBM Power)
  • Server s390x (IBM mainframe)
  • Workstation i386
  • Workstation x86_64
  • Client i386
  • Client x86_64
  • ComputeNode x86_64

Traditional testing activities are classified as Tier #1, Tier #2 and Tier #3

  • Tier #1 - the basic form of installation testing. Executed for all arch/variants on all builds, including nightly builds. This group includes the most common installation methods and configurations. If Tier #1 fails the product is considered unfit for customers and further testing, blocking the release!
  • Tier #2 and #3 - include additional installation configurations and/or functionality which is deemed important. These are still fairly common scenarios but not the most frequently used ones. If some of the Tier #2 and #3 test cases fail they will not block the release.

This experiment focuses only on Tier #2 and #3 test cases because they generate the largest test matrix! It is related only to installation testing of RHEL, which broadly means "Can the customer install RHEL via the Anaconda installer and boot into the installed system". I do not test functionality of the system after reboot!

I have theorized that from the point of view of installation testing RHEL is mostly a platform independent product!

Individual product variants rarely exhibit differences in their functional behavior because they are compiled from the same code base! If a feature is present it should work the same on all variants. The main differences between variants are:

  • What software has been packaged as part of the variant (e.g. base package set and add-on repos);
  • Whether or not a particular feature is officially supported, e.g. iBFT on Client variants. Support is usually provided via including the respective packages in the variant package set and declaring SLA for it.

These differences may lead to problems with resolving dependencies and missing packages but historically haven't shown significant tendency to cause functional failures e.g. using NFS as installation source working on Server but not on Client.

The main component being tested, Anaconda - the installer, is also mostly platform independent. In a previous experiment I had collected code coverage data from Anaconda while performing installation with the same kickstart (or same manual options) on various architectures. The coverage report supports the claim that Anaconda is platform independent! See Anaconda & coverage.py - Pt.3 - coverage-diff, section Kickstart vs. Kickstart!

Testing approach

The traditional pairwise approach focuses on features whose functionality is controlled via parameters. For example: RAID level, encryption cipher, etc. I have taken this definition one level up and applied it to the entire product! Now functionality is also controlled by variant and CPU architecture! This allows me to reduce the number of total test cases in the test matrix but still execute all of them at least once!

The initial implementation used a simple script, built with the Ruby pairwise gem, that:

  • Copies verbatim all test cases which are applicable for a single product variant, for example s390x Server or ppc64 Server! There's nothing we can do to reduce these from combinatorial point of view!

  • Then we have the group of test cases with input parameters. For example:

    storage / iBFT / No authentication / Network init script
    storage / iBFT / CHAP authentication / Network Manager
    storage / iBFT / Reverse CHAP authentication / Network Manager
    

    In this example the test is storage / iBFT and the parameters are

    • Authentication type
      • None
      • CHAP
      • Reverse CHAP
    • Network management type
      • SysV init
      • NetworkManager

    For test cases in this group I also consider the CPU architecture and OS variant as part of the input parameters and combine them using pairwise. Usually this results in around 50% reduction of test efforts compared to testing against all product variants!

  • Last we have the group of test cases which don't depend on any input parameters, for example partitioning / swap on LVM. They are grouped together (wrt their applicable variants) and each test case is executed only once against a randomly chosen product variant! This is my own heuristic based on the fact that the product is platform independent!

    NOTE: You may think that for these test cases the product variant is their input parameter. If we consider this to be the case then we'll not get any reduction because of how pairwise generation works (the 2 parameters with the largest number of possible values determine the maximum size of the test matrix). In this case the 9 product variants is the largest set of values!

For this experiment pairwise_spec.rb only produced the list of test scenarios (test cases) to be executed! It doesn't schedule test execution and it doesn't update the test case management system with actual results. It just tells you what to do! Obviously this script will need to integrate with other systems and processes as defined by the organization!

Example results:

RHEL 6.9 Tier #2 and #3 testing
  Test case w/o parameters can't be reduced via pairwise
    x86_64 Server - partitioning / swap on LVM
    x86_64 Workstation - partitioning / swap on LVM
    x86_64 Client - partitioning / swap on LVM
    x86_64 ComputeNode - partitioning / swap on LVM
    i386 Server - partitioning / swap on LVM
    i386 Workstation - partitioning / swap on LVM
    i386 Client - partitioning / swap on LVM
    ppc64 Server - partitioning / swap on LVM
    s390x Server - partitioning / swap on LVM
  Test case(s) with parameters can be reduced by pairwise
    x86_64 Server - rescue mode / LVM / plain
    x86_64 ComputeNode - rescue mode / RAID / encrypted
    x86_64 Client - rescue mode / RAID / plain
    x86_64 Workstation - rescue mode / LVM / encrypted
    x86_64 Server - rescue mode / RAID / encrypted
    x86_64 Workstation - rescue mode / RAID / plain
    x86_64 Client - rescue mode / LVM / encrypted
    x86_64 ComputeNode - rescue mode / LVM / plain
    i386 Server - rescue mode / LVM / plain
    i386 Client - rescue mode / RAID / encrypted
    i386 Workstation - rescue mode / RAID / plain
    i386 Workstation - rescue mode / LVM / encrypted
    i386 Server - rescue mode / RAID / encrypted
    i386 Workstation - rescue mode / RAID / encrypted
    i386 Client - rescue mode / LVM / plain
    ppc64 Server - rescue mode / LVM / plain
    s390x Server - rescue mode / RAID / encrypted
    s390x Server - rescue mode / RAID / plain
    s390x Server - rescue mode / LVM / encrypted
    ppc64 Server - rescue mode / RAID / encrypted

Finished in 0.00602 seconds (files took 0.10734 seconds to load)
29 examples, 0 failures

In this example there are 9 (variants) * 2 (partitioning type) * 2 (encryption type) == 36 total combinations! As you can see pairwise reduced them to 20! Also notice that if you don't take CPU arch and variant into account you are left with 2 (partitioning type) * 2 (encryption type) == 4 combinations for each product variant and they can't be reduced on their own!
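For illustration, the same generation step can be sketched in Python with the allpairspy package (the actual pairwise_spec.rb uses the Ruby pairwise gem; the exact number of generated cases may differ slightly between tools):

from allpairspy import AllPairs

parameters = [
    # the 9 product variants
    ["Server x86_64", "Server i386", "Server ppc64", "Server s390x",
     "Workstation x86_64", "Workstation i386",
     "Client x86_64", "Client i386", "ComputeNode x86_64"],
    ["LVM", "RAID"],          # partitioning type
    ["plain", "encrypted"],   # encryption type
]

for i, (variant, partitioning, encryption) in enumerate(AllPairs(parameters), 1):
    print("%2d: %s - rescue mode / %s / %s" % (i, variant, partitioning, encryption))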

Acceptance criteria

I evaluated all bugs which were found by executing the test cases from the pairwise test plan and compared them to the list of all bugs found by the team. This tells me how good my pairwise test plan was compared to the regular one, "good" meaning:

  • How many bugs would I find if I don't execute the full test matrix
  • How many critical bugs would I miss if I don't execute the full test matrix

Results:

  • Pairwise found 14 new bugs;
  • 23 bugs were first found by regular test plan
    • some by test cases not included in this experiment;
    • pairwise totally missed 4 bugs!

Pairwise test plan missed 3 critical regressions due to:

  • Poor planning of pairwise test activity. There was a regression in one of the latest builds and that particular test was simply not executed!
  • Human factor aka me not being careful enough and not following the process diligently. I waived a test due to infrastructure issues while there was a bug which stayed undiscovered! I should have tried harder to retest this scenario after fixing my infrastructure!
  • An architecture and networking specific regression which wasn't tested on multiple levels and is a very narrow corner case. This can be mitigated with more testing upstream, more automation and a better understanding of the hidden test requirements (e.g. IPv4 vs IPv6), for all of which pairwise can help (analysis and more available resources).

All of the missed regressions could have been missed by regular test plan as well, however the risk of missing them in pairwise is higher b/c of the reduced test matrix and the fact that you may not execute exactly the same test scenario for quite a long time. OTOH the risk can be mitigated with more automation b/c we now have more free resources.

IMO pairwise test plan did a good job and didn't introduce "dramatic" changes in risk level for the product!

Summary

  • 65 % reduction in test matrix;
  • Only 1/3rd of team engineers needed;
    • keep arch experts around though;
  • 2/3rd of team engineers could be free for automation and to create even more test cases;
  • Test run execution completion rate is comparable to regular test plan
    • average execution completion for pairwise test plan was 76%!
    • average execution completion for regular test plan was 85%!
  • New bugs found:
    • 30% by Pairwise Test Plan
    • 30% by Tier #1 test cases (good job here)
    • 30% by exploratory testing
  • Risk of missing regressions or critical bugs exists (I did miss 3) but can be mitigated;
  • Clearly exposes the need of constant review, analysis and improvement of existing test cases;
  • Exposes hidden parameters in test scenarios and some hidden relationships;
  • Patterns and other optimization techniques observed

Patterns observed:

  • Many new test case combinations were found, which I had to describe in Nitrate; the longer you use pairwise the fewer new combinations (aka undocumented scenarios) are discovered. The first 3 test runs discovered most of the missing combinations!
  • Found quite a few test cases with hidden parameters, for example swap / recommended which calculates the recommended size of swap partition based on 4 different ranges in which the actual RAM size fits! These ranges became parameters to the test case;
  • Can combine (2, 3, etc) independent test cases together and consider them as parameters so we can apply pairwise against the combination. This will create new scenarios, broaden the test matrix but not result in significant increase in execution time. I didn't try this because it was not the focus of the experiment;
  • Found some redundant/duplicate test cases - test plans need to be constantly analyzed and maintained you know;
  • Automated scheduling and tools integration is critical. This needs to be working perfectly in order to capitalize on the newly freed resources;
  • Testing on s390x was sub-optimal (mostly my own inexperience with the platform) so for specialized environments we still want to keep the experts around;
  • 1 engineer (me) was able to largely keep up with schedule with the rest of the team!
    • experiment was conducted during the course of several months
    • I have tried to adhere to all milestones and deadlines and mostly succeeded

I have also discovered ideas for new test execution optimization techniques which need to be evaluated and measured further:

  • Use a common set-up step for multiple test cases across variants, e.g.
    • install a RAID system then;
    • perform 3 rescue mode tests (same test case, different variants)
  • Pipeline test cases so that the result of one case is the setup for the next, e.g.
    • install a RAID system and test for correctness of the installation;
    • perform rescue mode test;
    • damage one of the RAID partitions while still in rescue mode;
    • test installation with damaged disks - it should not crash!

These techniques can be used stand-alone or in combination with other optimization techniques and tooling available to the team. They are specific to my particular kind of testing so beware of your surroundings before you try them out!

Thanks for reading and happy testing!

Cover image copyright: cio-today.com


Mutation Testing vs. Coverage, Pt. 2

In a previous post I have shown an example of real world bugs which we were not able to detect despite having 100% mutation and test coverage. I am going to show you another example here.

This example comes from one of my training courses. The task is to write a class which represents a bank account with methods to deposit, withdraw and transfer money. The solution looks like this

bank.py
class BankAccount(object):
    def __init__(self, name, balance):
        self.name = name
        self._balance = balance
        self._history = []

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError('Deposit amount must be positive!')

        self._balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError('Withdraw amount must be positive!')

        if amount <= self._balance:
            self._balance -= amount
            return True
        else:
            self._history.append("Withdraw for %d failed" % amount)

        return False

    def transfer_to(self, other_account, how_much):
        self.withdraw(how_much)
        other_account.deposit(how_much)

Notice that if withdrawal is not possible then the function returns False. The tests look like this

test.py
import unittest
from bank import BankAccount


class TestBankAccount(unittest.TestCase):
    def setUp(self):
        self.account = BankAccount("Rado", 0)

    def test_deposit_positive_amount(self):
        self.account.deposit(1)
        self.assertEqual(self.account._balance, 1)

    def test_deposit_negative_amount(self):
        with self.assertRaises(ValueError):
            self.account.deposit(-100)

    def test_deposit_zero_amount(self):
        with self.assertRaises(ValueError):
            self.account.deposit(0)

    def test_withdraw_positive_amount(self):
        self.account.deposit(100)

        result = self.account.withdraw(1)
        self.assertTrue(result)
        self.assertEqual(self.account._balance, 99)

    def test_withdraw_maximum_amount(self):
        self.account.deposit(100)

        result = self.account.withdraw(100)
        self.assertTrue(result)
        self.assertEqual(self.account._balance, 0)

    def test_withdraw_from_empty_account(self):
        result = self.account.withdraw(50)

        self.assertIsNotNone(result)
        self.assertFalse(result)
        assert "Withdraw for 50 failed" in self.account._history

    def test_withdraw_non_positive_amount(self):
        with self.assertRaises(ValueError):
            self.account.withdraw(0)

        with self.assertRaises(ValueError):
            self.account.withdraw(-1)

    def test_transfer_negative_amount(self):
        account_1 = BankAccount('For testing', 100)
        account_2 = BankAccount('In dollars', 10)

        with self.assertRaises(ValueError):
            account_1.transfer_to(account_2, -50)

        self.assertEqual(account_1._balance, 100)
        self.assertEqual(account_2._balance, 10)


    def test_transfer_positive_amount(self):
        account_1 = BankAccount('For testing', 100)
        account_2 = BankAccount('In dollars', 10)

        account_1.transfer_to(account_2, 50)

        self.assertEqual(account_1._balance, 50)
        self.assertEqual(account_2._balance, 60)


if __name__ == '__main__':
    unittest.main()

Try the following commands to verify that you have 100% line coverage and a 100% mutation score

coverage run test.py
coverage report

cosmic-ray run --test-runner nose --baseline 10 example.json bank.py -- test.py
cosmic-ray report example.json

Can you tell where the bug is? How about trying to transfer more money than is available from one account to the other:

def test_transfer_more_than_available_balance(self):
    account_1 = BankAccount('For testing', 100)
    account_2 = BankAccount('In dollars', 10)

    # transfer more than available
    account_1.transfer_to(account_2, 150)

    self.assertEqual(account_1._balance, 100)
    self.assertEqual(account_2._balance, 10)

If you execute the above test it will fail

FAIL: test_transfer_more_than_available_balance (__main__.TestBankAccount)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test.py", line 79, in test_transfer_more_than_available_balance
    self.assertEqual(account_2._balance, 10)
AssertionError: 160 != 10

----------------------------------------------------------------------

The problem is that when self.withdraw(how_much) fails transfer_to() ignores the result and tries to deposit the money into the other account! A better implementation would be

def transfer_to(self, other_account, how_much):
    if self.withdraw(how_much):
        other_account.deposit(how_much)
    else:
        raise Exception('Transfer failed!')

In my earlier article the bugs were caused by the external environment, and tools/metrics like code coverage and mutation score are not affected by it. In fact the jinja-ab example falls into the category of data coverage testing.

The current example on the other hand is ignoring the return value of the withdraw() function and that's why it fails when we add the appropriate test.

NOTE: some mutation test tools support the removing/modifying return value mutation. Cosmic Ray doesn't support this at the moment (I should add it). Even if it did that would not help us find the bug because we would kill the mutation using the test_withdraw...() test methods, which already assert on the return value!

Thanks for reading and happy testing!


Circo loco 2017

HackConf 2016

Due to popular demand I'm sharing my plans for the upcoming conference season. Here is a list of events I plan to visit and speak at (hopefully). The list will be updated throughout the year so please subscribe to the comments section to receive a notification when that happens! I'm open to meeting new people so ping me for a beer if you are attending some of these events!

02 February - Hack Belgium Pre-Event Workshops, Brussels

Note: added on Jan 12th

Last year I had an amazing time visiting an Elixir & Erlang workshop so I'm about to repeat the experience. I will be visiting a workshop organized by HackBelgium and will keep you posted with the results.

03 February - Git Merge, Brussels

Note: added on Jan 12th

Git Merge is organized by GitHub and will be held in Brussels this year. I will be visiting only the conference track and hopefully giving a lightning talk titled Automatic upstream dependency testing with GitHub API! That and the afterparty of course!

04-05 February - FOSDEM, Brussels

FOSDEM is the largest free and open source gathering in Europe which I have been visiting since 2009 (IIRC). You can checkout some of my reports about FOSDEM 2014, Day 1, FOSDEM 2014, Day 2 and FOSDEM 2016.

I will present my Mutants, tests and zombies talk at the Testing & Automation devroom on Sunday.

UPDATE: Video recording is available here

I will be in Brussels between February 1st and 5th to explore the local start-up scene and get to meet with the Python community so ping me if you are around.

18 March - QA Challenge Accepted 3.0, Sofia

QA: Challenge Accepted is a specialized QA conference in Sofia and most of the sessions are in Bulgarian. I've visited last year and it was great. I even proposed a challenge of my own.

The CFP is still open but I am fairly confident that my talk Testing Red Hat Enterprise Linux the Microsoft way will be approved. It will describe a large scale experiment with pairwise testing, which btw I learned about at QA: Challenge Accepted 2.0 :).

UPDATE: I will be also on the jury for QA of the year award.

UPDATE: I did a lightning talk about my test case management system (in Bulgarian).

07-08 April - Bulgaria Web Summit, Sofia

I have been a moderator at Bulgaria Web Summit for several years and this year is no exception. This is one of the strongest events held in Sofia and is in English. Last year over 60% of the attendees were from abroad so you are welcome!

I'm not going to speak at this event but will record as much of it as possible. UPDATE: Checkout the recordings on my YouTube channel!

10-12 May - Romanian Testing Conference, Cluj-Napoca

RTC'17 is a new event I found in neighboring Romania. The topic this year is Thriving and remaining relevant in Quality Assurance. My proposed talk is titled Quality Assistance in the Brave New World where, if it gets accepted, I'll share some experiences and visions for the QA profession.

As it turned out I know a few people living in Cluj so I'll be arriving one day earlier on May 9th to meet the locals.

13-14 May - OSCAL, Tirana

Open Source Conference Albania is the largest OSS event in the country. I'm on a roll here, exploring the IT scene in the Balkans. Due to traveling constraints my availability will be limited to the conference venue only but I've booked a hotel across the street :).

I will be meeting a few friends in Tirana and hearing about the progress of a psychological experiment we devised with Jona Azizaj and Suela Palushi.

Talk-wise I'm hoping to get the chance to introduce mutation testing and even host a workshop on the topic.

UPDATE: Here is the video recording from OSCAL. The quality is very poor though.

20-21 May - DEVit, Thessaloniki

This is the 3rd edition of DEVit, the 360° web development conference of Northern Greece. I've been a regular visitor since the beginning and this year I've proposed a session on mutation testing. Because the Thessaloniki community seems more interested in Ruby and Rails my goal is to share more examples from my Ruby work and compare how that is different from the Python world. There is once again an opportunity for a workshop.

So far I've been the only Bulgarian to visit DEVit and also locally known as "The guy who Kosta & Kosta met in Sofia"! Checkout my impressions from DEVit'15 and DEVit'16 if you are still wondering whether to attend or not! I strongly recommend it!

UPDATE: I am still the only Bulgarian visiting DEVit!

22 May - Dev.bg, Sofia

I've hosted a session titled Quality Assurance According to Einstein for the local QA community in Sofia. Video (in Bulgarian) and links are available!

UPDATE: added post-mortem.

01-02 June - Shift Developer Conference, Split

Shift appears to be a very big event in Croatia. My attendance is still unconfirmed due to lots of traveling before that and the general trouble of efficiently traveling on the Balkans. However I have a CFP submitted and waiting for approval. This time it is my Mutation Testing in Patterns, which is a journal of different code patterns found during mutation testing. I have not yet presented it to the public but will blog about it sometime soon so stay tuned.

UPDATE: this one is a no-go!

03-04 June - TuxCon, Plovdiv

TuxCon is held in Plovdiv around the beginning of July. I'm usually presenting some lightning talks there and use the opportunity to meet with friends and peers outside Sofia and catch up with news from the local community. The conference is in Bulgarian with the exception of the occasional foreign speaker. If you understand Bulgarian I recommend the story of Puldin - a Bulgarian computer from the 80s.

UPDATE: I will be opening the conference with QA According to Einstein

17-18 June - How Camp, Varna

How Camp is the little brother of Bulgaria Web Summit and is always held outside of Sofia. This year it will be in Varna, Bulgaria. I will be there of course and depending on the crowd may talk about some software testing patterns.

UPDATE: it looks like this is also a no-go but stay tuned for the upcoming Macedonia Web Summit and Albania Web Summit where yours truly will probably be a moderator!

25-29 September - SuSE CON, Prague

Yeah, this is the conference organized by SuSE. I'm definitely not afraid to visit the competition. I even hope I could teach them something. More details are still TBA because this event is very close to/overlapping with the next two.

28-29 September - SEETEST, Sofia

South East European Software Testing Conference is, AFAIK, an international event which is hosted in a major city on the Balkans. Last year it was held in Bucharest with previous years held in Sofia.

In my view this is the most formal event, especially related to software testing, I'm about to visit. Nevertheless I like to hear about new ideas and some research in the field of QA so this is a good opportunity.

UPDATE: I have submitted a new talk titled If the facts don't fit the theory, change the facts!

30 September-1 October - HackConf, Sofia

HackConf is one of the largest conferences in Bulgaria, gathering over 1000 people each year. I am strongly affiliated with the people who organize it and even had the opportunity to host the opening session last year. The picture above is from this event.

The audience is still very young and inexperienced but the presenters are above average. The organizers' goal is for HackConf to become the strongest technical conference in the country and also serve as a sort of inspiration for young IT professionals.

UPDATE: I have submitted both a talk proposal and a workshop proposal. For the workshop I intend to teach children Python and Selenium Automation in 8 hours. I've also been helping the organizers with bringing some very cool speakers from abroad!

October - IT Weekend, Bulgaria

IT Weekend is organized by Petar Sabev, the same person who's behind QA: Challenge Accepted. It is a non-formal gathering of engineers with the intent to share some news and then discuss and share problems and experiences. Checkout my review of IT Weekend #1 and IT Weekend #3.

The topics revolve around QA, leadership and management but the format is open and the intention is to broaden the topics covered. The event is held outside Sofia at a SPA hotel and makes for a very nice retreat. I don't have a topic but I'm definitely going if time allows. I will probably make something up if we have QA slots available :).

October - Software Freedom Kosova, Prishtina

Software Freedom Kosova is one of the oldest conferences about free and open source software in the region. This is part of my goal to explore the IT communities on the Balkans. Kosovo sounds a bit strange to visit but I did recognize a few names on the speaker list of previous years.

The CFP is not open yet but I'm planning to make a presentation. Also if weather allows I'm planning a road trip on my motorbike :).

UPDATE: I've met with some of the FLOSSK members in Tirana at OSCAL and they seem to be more busy with running the hacker space in Prishtina so the conference is nearly a no-go.

14-15 November - Google Test Automation Conference, London

GTAC 2017 will be held in London. Both speakers and attendees are pre-approved and my goal is to be the first Red Hatter and second Bulgarian to speak at GTAC.

The previous two years saw talks about mutation testing vs. coverage and their respective use to determine the quality of a test suite, with both parties arguing against each other. Since I'm working in both of these fields and have at least two practical examples I'm trying to gather more information and present my findings to the world.

UPDATE: I've also submitted my Testing Red Hat Enterprise Linux the Microsoft way

16-17 November - ISTA Con, Sofia

Innovations in Software Technologies and Automation started as a QA conference several years ago but it has broadened the range of acceptable topics to include development, devops and agile. I was at the first two editions and then didn't attend for a while until last year when I really liked it. The event is entirely in English with lots of foreign speakers.

Recently I've been working on something I call "Regression Test Monitoring" and my intention is to present this at ISTA 2017 so stay tuned.

UPDATE: I didn't manage to collect enough information on the Regression Test Monitoring topic but have made two other proposals.

Thanks for reading and see you around!


QA & Automation 101 Retrospective

QA 101 graduation

At the beginning of this year I've hosted the first QA related course at HackBulgaria. This is a long overdue post about how the course went, what worked well and what didn't. Brace yourself because it is going to be a long one.

The idea behind a QA course had been lurking in both RadoRado's (from HackBulgaria) and my heads for quite a while. We'd been discussing it for at least a year before we actually started. One day Rado told me he'd found a sponsor and we had the go-ahead for the course and that's how it all started!

The first issue was that we weren't prepared to start at a moment's notice. I literally had two weeks to prepare the curriculum and initial interview questions. Next we opened the application form and left it open until the last possible moment. I was reviewing candidate answers hours before the course started, which was another mistake we made!

On the positive side, I hosted a Q&A session on YouTube answering general questions about the profession and the course itself. This live stream helped popularize the course.

At the start we had 30 people and around 13 of them managed to "graduate" by the final lesson. The biggest portion of students dropped out after the first 5 lessons of the Java crash course! Each lesson was around 4 hours with a 20-30 minute break in the middle.

With respect to the criteria find a first job or find a new/better job I consider the training successful. To my knowledge all students have found better jobs, many of them as software testers!

On the practical side of things students managed to find and report 11 interesting bugs against Fedora. Mind you that these were all found in the wild: fedora-infrastructure #5323, RHBZ#1339701, RHBZ#1339709, RHBZ#1339713, RHBZ#1339719, RHBZ#1339731, RHBZ#1339739, RHBZ#1339742, RHBZ#1339746, RHBZ#1340541, RHBZ#1340891.

Then students also made a few pull requests on GitHub (3 which I know of): commons-math #38, commons-csv #12, commons-email #1.

Lesson format

For reference, most lessons were a mix of a short presentation about theory and best practices, followed by discussions and, where appropriate, practical sessions with technology or projects. The exercises were intentionally designed for individual work, work in pairs or small groups (4-5).

By request from the sponsors I tried to keep a detailed record of each student's performance and personality traits, as much as I was able to observe them. I really enjoyed keeping such a journal but didn't share this info with my students, which I consider a mistake. I think knowing where your strong and weak areas are would help you become a better expert in your field!

Feedback from students

  • There was little time (to work on the practical examples I guess);
  • Not having particular practical tasks was a problem. For example we didn't have tasks of the sort do "X" then "Y";
  • They needed more time and more attention to be given to them;
  • Installing different pieces of software and tools took a lot of time and frustration. It was also quite problematic sometimes depending on whether they used Linux, Windows or Mac OS X;
  • Working with the Eclipse IDE was horrible. Nobody knew what to do and the interface wasn't newbie friendly. Also it took quite a lot of time to install dependencies and/or import projects to run in Eclipse;
  • There were a few problems with Selenium and its different versions being used.

I have to point out that while these are valid concerns and major issues students were at least partially guilty for the last 3 of them. It was my impression that most of them didn't prepare at home, didn't read the next lesson and didn't install prerequisite tools and software!

5x Java Crash Course

We started with a Java crash course, as requested by our sponsors, which was extended to 5 sessions instead of the original 3. RadoRado was teaching Java fundamentals while I was assisting him with comments.

On the good side, Rado explains very well and in much detail. He also writes code to demonstrate what he teaches and while doing so uses only the knowledge he's presented so far. For example if there's repeating logic/functionality he would just write it twice instead of refactoring it into a separate function with parameters (assuming the students have not learned about functions yet). I think this made it easier to understand the concepts being taught.

Another positive thing we did was me getting behind Rado's computer and modifying some of the code while he was explaining something on screen. If you take the above example and have two methods which print out salutations, e.g. "Good morning, Alex", I would go and modify one of them to include "Mr." while the other would not. This introduced a change in behavior which ultimately resulted in a bug! It was a nice practical way to demonstrate how some classes of bugs get introduced in reality. We did only a few of these behind-the-computer changes and I definitely liked them! They were all ad-hoc, not planned for.

On the negative side Java seems hard to learn and after these 5 lessons half of the students dropped out. Maybe part of the reason is they didn't expect to start a QA course with lessons about programming. But that also means they didn't pay enough attention to the curriculum, which was announced in advance!

Lesson 01 - QA Fundamentals

I had made a point to assign time constraints to each exercise in the lessons. While that mostly worked in the first few lessons, where there is more theory, we didn't keep the schedule and ran overtime.

Explaining testing theory (based on ISTQB fundamentals) took longer than I expected. It was also evident that we needed more written examples of what different test analysis techniques are (e.g. boundary value analysis). Here Petar Sabev helped me deliver a few very nice examples.

One of the exercises was "when to stop testing" with an example of a Sudoku solving function and the different environments in which this code operates, e.g. browser, mobile, etc. Students appeared to have a hard time understanding what a "runtime environment" is and defining relevant tests based on that! I believe most of the students, due to lack of knowledge and experience, were also having a hard time grasping the concept of non-functional testing.

A positive thing was that students started explaining to one another and giving examples for bugs they've seen outside the course.

Lesson 02 - Software Development Lifecycle

This lesson was designed as a role playing game to demonstrate the most common software development methodologies - waterfall and agile - and to discuss the QA role in both of them. The format by itself is very hard to conduct successfully and this was my first time ever doing it. I had also never taken part in such games until then, only heard about them.

During the waterfall exercise it was harder for the students to follow the game constraints and not exchange information with one another because they were sitting at the same table.

On the positive side all groups came up with unique ideas about software features and how they want to develop them. Timewise we managed to do very well. On the negative side, I was the client for all groups and didn't manage to pay enough attention to everyone - which, btw, is what clients usually do in real life.

Lesson 03 - Bug Tracking

This lesson was a practical exercise in writing bug reports and figuring out what information needs to be present in a good bug report. Btw this is something I always ask junior members at job interviews.

First we started with working in pairs to define what a good bug report is without actually knowing what that means. Students found it hard to brainstorm together and most of them worked alone during this exercise.

Next students had to write bug reports for some example bugs, which I explained only briefly on purpose, and perform peer reviews of each other's reports. Reviews took a long time to complete but overall students had a good idea of what information to include in a bug report.

Then, after learning from their mistakes and hearing what others had done, they've learned about some good practices and were tasked to rewrite their bug reports using the new knowledge. I really like the approach of letting students make some mistakes and then showing them the easier/better way of doing things. This is also on-par with Ivan Nemytchenko's methodology of letting his interns learn by their mistakes.

All bug reports can be found in students repositories, which are forked from the curriculum. Check out https://github.com/HackBulgaria/QA-and-Automation-101/network.

I should have really asked everyone to file bugs under the curriculum repository so it is easier for me to track them. On the other hand I wanted each student to start building their own public profile to show potential employers.

Lesson 04 - Test Case Management

This lesson started with an exercise asking students to create accounts for Red Hat's OpenShift cloud platform in the form of a test scenario. The scenario intentionally left out some details. The idea was that missing information introduces inconsistencies during testing and to demonstrate how the same steps get performed slightly differently.

We had some trouble explaining exactly "how did you test", because most inexperienced people would not remember all the details about where they clicked, whether they used the mouse or the keyboard, whether the tab order was correct, etc. Regardless, students managed to get different results and discover that some email providers were not supported.

The homework assignment was to create test plans and test cases in Nitrate at https://nitrate-hackbg.rhcloud.com/. Unfortunately the system appears to be down ATM and I don't have time to investigate why. This piece of infrastructure was put together in 2 hours and I'm surprised it lasted without issues during the entire course.

2x Introduction to Linux

This was a crash course in Linux fundamentals and an exercise with the most common commands and text editors in the console. Most of the students were not prepared with virtual machines. We also used a cloud provider to give students a remote shell but the provider API was failing and we had to deploy docker containers manually. Overall infrastructure was a big problem but we somehow managed. Another problem was with ssh clients on Windows which generated keys in a format that our cloud provider couldn't understand.

Wrt commands and exercises students did well and managed to execute most of them on their own. That's very good for people who've never seen a terminal in their lives (more or less).

Lesson 05 - Testing Fedora 24(25) Changes

Once again nobody was prepared with a Fedora virtual machine and students were installing software as we went. Because of that we didn't manage to conduct the lesson properly and had to repeat it in the next session.

Rawhide being the bleeding edge of Fedora means it is full of bugs. Well I couldn't keep up with everyone and explain workarounds or how to install/upgrade Fedora. That was a major setback. It also became evident that you can't move quickly if you have no idea what to do and no instructions about it either.

Lesson 05 - Again

Once prepared with the latest and greatest from Rawhide the task was to analyze the proposed feature changes (on the Fedora wiki) and create test plans and design test cases for said changes. Then execute the tests in search for bugs. This is where some of the bugs above came from. The rest were found during upgrades.

This lesson was team work (4-5 students) but the results were mixed. IMO Fedora changes are quite hard to grasp, especially if you lack domain knowledge and broader knowledge about the structure and operation of a Linux distribution. I don't think most teams were able to clearly understand their chosen features and successfully create good plans/scenarios for them. On the other hand in real life software you don't necessarily understand the domain better and know what to do. I've been in situations where whole features have been defined by a single sentence and requested to be tested by QA.

One of the teams didn't manage to install Fedora (IIRC they didn't have laptops capable of running a VM) and were not able to conduct the exercise.

Being able to find real life bugs, some of them serious, and getting traction in Bugzilla is the most positive effect of this lesson. I personally wanted to have more output (e.g. more bugs, more cases defined, etc) but taking into account the blocking factors and setbacks I think this is a good initial result.

Lesson 06 - Unit Testing and Continuous Integration

Here we had a few examples of bad stubs and mocks which were not received very well. The topic is hard in itself and wasn't very well explained with practical examples.

Another negative thing is that students took a lot of time fiddling around with Eclipse, mistyping commands in the terminal and generally not paying enough attention to instructions. This caused the exercises to go slowly.

We had an exercise which asks the student to write a new test for a non-existing method, then implement the method and make sure all the tests pass. You guessed it - this is Test Driven Development. IIRC one of the students was having a hard time with that exercise so I popped up my editor on the large screen and started typing what she told me, then re-ran the tests and asked her to show me the errors I'd made and tell me how to correct them. The exercise was received very well and was fun to do.
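In Python terms (the course used Java/JUnit and the actual method was a different one) the exercise goes roughly like this:

import unittest

# Step 1: write the test for a method which doesn't exist yet - it fails
class TestFizzBuzz(unittest.TestCase):
    def test_multiples_of_three(self):
        self.assertEqual(fizzbuzz(3), "Fizz")

    def test_regular_numbers(self):
        self.assertEqual(fizzbuzz(2), "2")

# Step 2: implement just enough code to make the tests pass, then re-run them
def fizzbuzz(number):
    if number % 3 == 0:
        return "Fizz"
    return str(number)

if __name__ == '__main__':
    unittest.main()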

Due to lack of time we had to go over TravisCI very quickly. The other bad thing about TravisCI is that it requires git/GitHub, and the students were generally inexperienced with those. Both GitHub for Windows and GitHub for Mac suck big time IMO; what you need is the console. However, none of the students had any practical experience with git or knew how to commit code and push branches to GitHub. git fundamentals, however, is a separate lesson or two by itself, which we didn't do.

Lesson 07 - Writing JUnit tests for Apache Commons

Setting aside the problems with Eclipse, the GitHub desktop client and the missing instructions for Windows, the hardest part of this lesson was actually selecting a component to work on, understanding what the code does and writing meaningful tests. On top of that, most students were not very proficient programmers and Java was completely new to them.

Despite having 3 pull requests on GitHub I consider this lesson to be a failure.

Lesson 08 - Integration Testing with Selenium

This lesson starts with an example of what a flaky test is. At the moment I don't think this lesson is the best place for that example. To make things even more difficult the example is in Python (because that was the easiest way for me to write it) instead of Java. Students had problems installing Python on Windows just to make the example work, and they also didn't know how to execute a script in the terminal.

One of the students proposed a better flaky example utilizing dates and times, to be executed at various hours of the day. I have yet to code this and prepare an environment in which it would be executed. Btw, recently I've seen similar behavior caused by inconsistent timezone usage in Ruby, which resulted in an unexpected time offset a little after midnight :).
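Here is a rough sketch of the idea, not the final course material: the code under test mixes UTC and local time, so the assertion holds for most of the day but fails during the hours when the local calendar date and the UTC calendar date disagree (e.g. shortly after midnight local time in a UTC+2 timezone).

```python
# A rough sketch of the proposed date/time flakiness example.
# The test passes for most of the day but fails whenever the local
# date and the UTC date disagree -- similar to the Ruby timezone
# issue mentioned above.
import unittest
from datetime import datetime


def todays_report_name():
    """Naive implementation which uses UTC while callers think in local time."""
    return "report-%s.csv" % datetime.utcnow().date().isoformat()


class FlakyReportNameTest(unittest.TestCase):
    def test_report_is_named_after_today(self):
        # The test uses local time while the code under test uses UTC,
        # so the two dates differ for part of the day in most timezones.
        expected = "report-%s.csv" % datetime.now().date().isoformat()
        self.assertEqual(todays_report_name(), expected)


if __name__ == "__main__":
    unittest.main()
```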

Once again I have to point out that students came generally unprepared for the lesson and hadn't installed the prerequisite software and programming languages. This is becoming a trend and needs to be split out into a preparation session, possibly paired with a check list.

On the Selenium side, starting with Selenium IDE, it was a bit unclear how to use it and what needed to be done. This is another negative trend: students were missing clear instructions about what they were expected to do. In the end we resorted to a live demo of Selenium IDE so they could at least get some idea of it.
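In retrospect a tiny WebDriver script could have complemented the IDE demo by showing the same steps as code. Here is a minimal sketch, assuming Firefox and geckodriver are installed and using python.org purely as an example target:

```python
# A minimal Selenium WebDriver sketch: open a page, check something
# trivial about it, close the browser. Assumes Firefox + geckodriver.
from selenium import webdriver


def test_python_org_title():
    driver = webdriver.Firefox()
    try:
        driver.get("https://www.python.org")
        # A trivial check: the page title mentions Python.
        assert "Python" in driver.title
    finally:
        driver.quit()


if __name__ == "__main__":
    test_python_org_title()
    print("PASS")
```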

Lesson 09 and 10 - Writing Selenium tests for Mozilla Add-ons website

IMO these two lessons were the biggest disaster of the entire course. Python & virtualenv on Windows was a total no-go, and on Linux things weren't much easier because students had no idea what a virtualenv is.

Practice-wise, they didn't manage to read through all the bugs on the Mozilla bug tracker and had a very hard time selecting bugs to write tests for. Not to mention that many of the reported bugs were administrative tasks to create or remove add-on categories; there weren't many functionality-related bugs to write tests for.

The product under test was also hard to understand and most students were seeing it for the first time, let alone getting to know the development and testing environments that Mozilla provides. Mozilla's test suite being in Python was just another obstacle to contributing, because we never actually studied Python.

Between the two lessons there were students who had missed the Selenium introduction lesson and had an even harder time figuring things out. I didn't have the time to go back and explain the previous lesson to them. Maybe an attendance policy is needed for dependent lessons.

Before the course started I talked to some people on Mozilla's IRC channel and they agreed to help, but in the end we didn't even engage with them. At this point I'm skeptical that mentoring over IRC would have worked anyway.

Lesson 11 - Introduction to Performance Testing

This was a more theoretical lesson with fewer practical examples and exercises. I provided some blog posts of mine related to performance testing, but they deal with software I've used (Celery, Twisted) which isn't generally known to a less experienced audience. IMO these blog posts were hard to put into perspective and didn't serve well as examples.

The practical part of the lesson was a discussion with the goal of creating a performance testing strategy for GitHub's infrastructure. It was me who did most of the talking, because the students had no experience working on an infrastructure as large as GitHub's and didn't know what components might be there, how they might be organized (load balancers, fail-overs, etc.) and what needs to be tested.

There was also a more practical exercise: create a performance test in Java for one of the classes found in commons-codec/src/main/java/org/apache/commons/codec/digest. Again, the main difficulties were working fluently with Eclipse, getting the project to build and run, and knowing how the software under test was supposed to work and be executed.
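For illustration, here is an analogous micro-benchmark sketch in Python, using hashlib and timeit instead of the actual commons-codec digest classes; the 1 MiB payload and the 50 ms budget are arbitrary numbers chosen only to show the general shape of such a test:

```python
# An analogous micro-benchmark sketch in Python (the actual exercise
# targeted the Java digest classes in commons-codec). It measures the
# average time to hash a 1 MiB payload and fails if a rough, arbitrary
# time budget is exceeded.
import hashlib
import timeit

PAYLOAD = b"x" * (1024 * 1024)  # 1 MiB of data
RUNS = 100
BUDGET_SECONDS = 0.05  # arbitrary budget for illustration


def digest_once():
    hashlib.sha256(PAYLOAD).hexdigest()


if __name__ == "__main__":
    average = timeit.timeit(digest_once, number=RUNS) / RUNS
    print("average sha256 over 1 MiB: %.6f seconds" % average)
    assert average < BUDGET_SECONDS, "digest is slower than the budget"
```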

Lesson 12 - How to find 1000 bugs in 30 minutes

This was a more relaxed lesson with examples of simple types of bugs found at a large scale. Most of the examples came from my blog and from experiments I've run against Fedora.

While amusing and fun, I don't think all of the students understood me or kept their attention. Part of that is because Fedora tends to focus on low-level stuff and my examples were not necessarily easy to understand.

Feedback from the sponsor

Experian Bulgaria was the exclusive sponsor of this course. At the end of the summer Rado and I met with them to discuss the results of the training. Here's what they said:

  • They consider the overall technical knowledge to be weak. What they need are people with basic knowledge of object-oriented programming, databases, operating systems and networking, because these are the skills candidates need in order to work at Experian. It must be noted that most of these topics were not taught in this course!
  • English language proficiency of the students is low on average;
  • User-level experience with Linux was fine but students were missing deeper knowledge about the operating system. Once again, something we didn't teach;
  • The programming languages favored at Experian are Java and Ruby and students had poor knowledge of them. They also had a weak understanding of OOP principles;
  • According to Experian, students were more ingrained with the development mindset and were trying their luck in the QA profession instead of being genuinely interested in the field. While this is generally the case, I have to point out that as a sponsor Experian did a poor job of promoting their company and the QA profession as a whole. What isn't known to the general public is their big in-house QA community, which could have served as a source of inspiration for the students!
  • Low motivation and a lack of fundamental knowledge from university are other traits interviewers at Experian identified. They argued for a stricter acceptance process and a requirement of at least 2 years of university education in computer science.

On the topic of testing knowledge candidates did mostly OK; however, we don't have enough information about this. Also, the hiring process at Experian is more focused on the broader knowledge areas listed above, so a substantial improvement in candidates' testing knowledge doesn't give them much of a head start.

To my knowledge they didn't hire anyone; a few people received an offer but declined for various personal reasons. I view this as poor performance on our side, but Experian thinks otherwise and is willing to sponsor another round of training.

Summary

Here is a list of all the things that could be improved:

  • Take time to develop the curriculum and have it pass QA review by other experienced testers (in particular ones who also teach students);
  • Make the application process more selective so that we include people with broader IT knowledge;
  • Allow time to review all applicants;
  • Find a co-trainer and additional mentors for some of the exercises;
  • Limit the number of students accepted into the course so we can give everyone more individual attention;
  • Give homework assignments and examine them before each lesson;
  • Collect and provide a performance review for each student at the end of the course;
  • Minimize the technology stack and tools used;
  • Host a technology preparation session at the beginning, paired with a check list of all the things that need to be done. Very likely make the chosen technology stack a hard requirement to avoid setbacks later in the schedule;
  • We need more fundamental lessons like a practical git tutorial, a practical Linux tutorial, databases, networking, etc.
  • The same goes for Java or any other programming language. Each of these technologies easily makes a course by itself. Possibly include a technology skill assessment in the application form and reject candidates who don't meet the minimum level, to ensure the group is able to move at the same pace.
  • We need to improve the infrastructure used during the course, especially the bug tracking and test case management systems used for the exercises;
  • Students need clear instructions for every exercise: what is required from them, what the expected result is, what they need to be doing, etc.;
  • Provide more examples for just about everything. Also make the examples simpler and easier to understand, with the harder ones left for further reading;
  • Provide easier projects to work on. This means applications whose domain is easier to understand and closer to the experiences students may already have. Projects also need clear instructions on how to join and how to contribute tests.
  • The same application/software needs to be used for manual bug finding, unit test writing and integration testing with Selenium. This will minimize both context and technology switches and allow students to view the various testing activities in the context of a single software under test;
  • While using open source projects for the purposes listed above sounds great, the reality is that they are hard to join and will not move according to our schedule. Maybe less focus on open source per se and more focus on a particular application under test. Instructors can then proxy all the tests forward to the upstream community;
  • Focus more on practice and less on exotic topics like performance testing and large-scale bug finding;
  • Seek more active participation from sponsors!

If you have suggestions please comment below, especially if you can tell me how to implement them in practice.

Thanks for reading and happy testing!
