During the past 3 weeks I've been debugging a weird error which started happening after I migrated KiwiTestPad to Django 1.10.7. Here is the reason why this happened.
After migrating to Django 1.10 all tests appeared to be working locally on SQLite, however they failed on MySQL with

    TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

The exact same test cases failed on PostgreSQL with:

    InterfaceError: connection already closed
Since version 1.10 Django executes all tests inside transactions, so my first thoughts were related to the auto-commit mode. However, upon closer inspection we can see that the line which triggers the failure is

    self.assertTrue(users.exists())

which is essentially a SELECT query, aka

    User.objects.filter(username=username).exists()

My tests were failing on a SELECT query!
Reading the numerous posts about TransactionManagementError I discovered it may be caused by a run-away cursor. The application did use raw SQL statements, which I promptly converted to ORM queries; that took me some time. Then I also fixed a couple of places where it used transaction.atomic() as well. No luck!
Then, after numerous experiments and tons of logging inside Django's own code, I was able to figure out when the failure occurred and which events led up to it. The test code looked like this:
response = self.client.get('/confirm/')
user = User.objects.get(username=self.new_user.username)
self.assertTrue(user.is_active)
The failure was happening after the view had been rendered, the first time I did a SELECT against the database! The problem was that the connection to the database had been closed midway through the transaction!
In particular (after more debugging of course) the sequence of events was:

1. django/test/client.py::Client::get()
2. django/test/client.py::ClientHandler::__call__(), which takes care to disconnect/connect signals.request_started and signals.request_finished, which are responsible for tearing down the DB connection, so the problem is not here
3. django/core/handlers/base.py::BaseHandler::get_response()
4. django/core/handlers/base.py::BaseHandler::_get_response(), which goes through the middleware (needless to say I did inspect all of it as well since there have been some changes in Django 1.10)
5. response = wrapped_callback() while still inside BaseHandler._get_response()
6. execute django/http/response.py::HttpResponseBase::close(), which looks like

        # These methods partially implement the file-like object interface.
        # See https://docs.python.org/3/library/io.html#io.IOBase

        # The WSGI server must call this method upon completion of the request.
        # See http://blog.dscpl.com.au/2012/10/obligations-for-calling-close-on.html
        def close(self):
            for closable in self._closable_objects:
                try:
                    closable.close()
                except Exception:
                    pass
            self.closed = True
            signals.request_finished.send(sender=self._handler_class)

7. signals.request_finished is fired
8. django/db/__init__.py::close_old_connections() closes the connection!

IMPORTANT: On MySQL setting AUTO_COMMIT=False and CONN_MAX_AGE=None helps work around this problem, but is not the solution for me because it didn't help on PostgreSQL.
Going back to HttpResponseBase::close() I started wondering who calls this method. The answer: it was getting called by the @content.setter method at django/http/response.py::HttpResponse::content(), which is even more weird because we assign to self.content inside HttpResponse::__init__().

The root cause of my problem was precisely this HttpResponse::__init__() method, or rather the way we arrive at it inside the application.
The offending view's last line was

    return HttpResponse(Prompt.render(
        request=request,
        info_type=Prompt.Info,
        info=msg,
        next=request.GET.get('next', reverse('core-views-index'))
    ))
and the Prompt class looks like this

    from django.shortcuts import render

    class Prompt(object):
        @classmethod
        def render(cls, request, info_type=None, info=None, next=None):
            return render(request, 'prompt.html', {
                'type': info_type,
                'info': info,
                'next': next
            })
Looking back at the internals of HttpResponse we see that the content setter calls self.make_bytes() and that HttpResponse itself is an iterator (it inherits from six.Iterator). So when we initialize an HttpResponse with another HttpResponse object (aka the content) we execute content.close(), which unfortunately happens to close the database connection as well.
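The mechanism can be modeled without Django at all. The class below is a deliberately simplified stand-in for HttpResponse, not Django's actual code: its content setter closes the previous content object if it has a close() method, which is exactly what bites us with nested responses:

```python
class FakeResponse:
    """Simplified model of django.http.HttpResponse (NOT the real code)."""

    def __init__(self, content=''):
        self.closed = False
        self.content = content          # goes through the setter below

    @property
    def content(self):
        return self._content

    @content.setter
    def content(self, value):
        # like Django's content setter: if the new content is a file-like
        # or iterator object with a close() method, it gets closed
        if hasattr(value, 'close'):
            value.close()
        self._content = str(value)

    def close(self):
        # in real Django this fires request_finished, which in turn
        # runs close_old_connections() and kills the DB connection
        self.closed = True


inner = FakeResponse('Hello')
outer = FakeResponse(inner)   # nesting closes the inner response!
print(inner.closed)           # -> True
```

The outer response itself stays open; only the object passed in as content gets closed, which is why the symptom only shows up with nested responses.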
IMPORTANT: note that from the point of view of a person using the application the
HTML content is exactly the same regardless of whether we have nested HttpResponse
objects
or not.
Also during normal execution the code doesn't run inside a transaction so we never notice
the problem in production.
The fix of course is very simple: just return Prompt.render()!
Thanks for reading and happy testing!
There are comments.
Recently I've started testing a Haskell application and a question I find unanswered (or at least very poorly documented) is how to produce coverage reports for binaries?
hpc is the Haskell code coverage tool. It works with .mix files, which contain module information produced at build time, and .tix files, which contain the coverage ticks recorded when the instrumented code runs. The invocation of hpc report needs to know where to find the .mix files in order to be able to translate the coverage information back to source, and it needs to know the location (full path or relative from pwd) of the .tix file we want to report on.
cabal is the package management tool for Haskell. Among other things it can be used to build your code, execute the test suite and produce the coverage report for you. cabal build will produce module information in dist/hpc/vanilla/mix and cabal test will store coverage information in dist/hpc/vanilla/tix!
A particular thing about Haskell is that you can only test code which can be imported, i.e. it is a library module. You can't test (via Hspec or HUnit) code which lives inside a file that produces a binary (e.g. Main.hs). However you can still execute these binaries (e.g. invoke them from the shell) and they will produce a coverage report in the current directory (e.g. main.tix).
The workflow is:

1. cabal build and cabal test build the project and execute the unit tests. This will create the necessary .mix files (including ones for binaries) and the .tix files coming from unit testing.
2. Execute the binaries themselves; each run leaves a .tix file in the current directory. Move it into place, e.g. the binary.tix file goes under dist/hpc/vanilla/tix/binary/binary.tix.
3. Produce the coverage report with hpc:

    hpc markup --hpcdir=dist/hpc/vanilla/mix/lib --hpcdir=dist/hpc/vanilla/mix/binary dist/hpc/vanilla/tix/binary/binary.tix
Convert the coverage report to JSON and send it to Coveralls.io:
cabal install hpc-coveralls
~/.cabal/bin/hpc-coveralls --display-report tests binary
Check out the haskell-rpm repository for an example. See job #45 where there is now coverage for the inspect.hs, unrpm.hs and rpm2json.hs files, which produce binary executables. Also notice that in RPM/Parse.hs the function parseRPMC is now covered, while it was not covered in the previous job #42!
script:
- ~/.cabal/bin/hlint .
- cabal install --dependencies-only --enable-tests
- cabal configure --enable-tests --enable-coverage --ghc-option=-DTEST
- cabal build
- cabal test --show-details=always
# tests to produce coverage for binaries
- wget https://s3.amazonaws.com/atodorov/rpms/macbook/el7/x86_64/efivar-0.14-1.el7.x86_64.rpm
- ./tests/test_binaries.sh ./efivar-0.14-1.el7.x86_64.rpm
# move .tix files in appropriate directories
- mkdir ./dist/hpc/vanilla/tix/inspect/ ./dist/hpc/vanilla/tix/unrpm/ ./dist/hpc/vanilla/tix/rpm2json/
- mv inspect.tix ./dist/hpc/vanilla/tix/inspect/
- mv rpm2json.tix ./dist/hpc/vanilla/tix/rpm2json/
- mv unrpm.tix ./dist/hpc/vanilla/tix/unrpm/
after_success:
- cabal install hpc-coveralls
- ~/.cabal/bin/hpc-coveralls --display-report tests inspect rpm2json unrpm
Thanks for reading and happy testing!
This is one of the stickers for the second edition of Rails Girls Vratsa which was held yesterday. Let's explore some of the bug proposals submitted by the Bulgarian QA group:
- sad() == true is ugly
- sad() is not very nice, better make it if(isSad())
- use sadStop(), and even better - stopSad()
- there is an extra space character in beAwesome( )
- the last curly bracket needs to be on a new line
Lyudmil Latinov

My friend Lu describes what I would call style issues. The style he refers to is mostly Java oriented, especially with naming things. In Ruby we would probably go with sad? instead of isSad. Style is important and there are many tools to help us with that, but this will not cause a functional problem! While I'm at it, let me say the curly brackets are not the problem either. They are not valid in Ruby, this is pseudo-code, and they also fall in the style category.
The next interesting proposal comes from Tsveta Krasteva. She examines the possibility of sad() returning an object or nil instead of a boolean value. Her first question was whether the if statement will still work, and the answer is yes. In Ruby everything is an object and every object can be compared to true and false. See Alan Skorkin's blog post on the subject.

Then Tsveta says the answer is to use sad().stop(), with the warning that it may return nil. In this context the sad() method returns an object indicating that the person is feeling sad. If the method returns nil then the person is feeling OK.
class Csad
  def stop()
    print("stop\n");
  end
end

def sad()
  print("sad\n");
  Csad.new();
end

def beAwesome()
  print("beAwesome\n");
end

# notice == true was removed
if(sad())
  print("Yes, I am sad\n");
  sad.stop();
  beAwesome( );
end
While this is coming closer to a functioning solution, something about it is bugging me. In the if statement the developer has typed more characters than required (== true). This sounds unlikely to me but is possible with less experienced developers. The other issue is that we are using an object (of class Csad) to represent an internal state in the system under test. There is one method to return the state (sad()) and another one to alter the state (Csad.stop()). The two methods don't operate on the same object! Not a very strong OOP design. On top of that we have to call the method twice, the first time in the if statement and the second time in the body of the if statement, which may have unwanted side effects. It is best to assign the return value to a variable instead.
IMO if we are to use this OOP approach the code should look something like:
class Person
  def sad?()
  end

  def stopBeingSad()
  end

  def beAwesome()
  end
end

p = Person.new
if p.sad?
  p.stopBeingSad
  p.beAwesome
end
Let me return to assuming we don't use classes here. The first obvious mistake is the space in sad stop();, first spotted by Peter Sabev. His proposal, backed by others, is to use sad.stop(). However they didn't use my hint asking what the return value of sad() is!

If sad() returns a boolean then we'll get undefined method 'stop' for true:TrueClass (NoMethodError)! The same thing happens if sad() returns nil, although in this case we skip the if block.
In Ruby we are allowed to skip parentheses when calling a method, like I've shown above. If we ignore this fact for a second, then sad?.stop() would mean execute the method named stop() which is a member of the sad? variable, which is of type method! Again, methods don't have an attribute named stop!
The last two paragraphs are the semantic/functional mistake I see in this code. The only way for it to work is to use an OOP variant which is further away from what the existing clues give us.
Note: The variant sad? stop() is syntactically correct. It means call the method sad? with parameter the result of calling the method stop(), which, depending on the outer scope of this program, may or may not be correct (e.g. stop is defined, sad? accepts optional parameters, sad? maintains global state).
Thanks for reading and happy testing!
Design is a method! Design can be taught! Developers can do good design! If this sounds outrageous then I present you Zaharenia Atzitzikaki who is a developer by education, not a graphics designer and she thinks otherwise. This blog post will summarize her workshop held at the DEVit conference last month.
We are going to build a site called DevMatch, which is like Tinder for developers. The initial version doesn't look bad but we can do better:
Layout is grids and the most popular designs use grids with 12, 16 or 24 columns. The idea is to make everything align to the grid which allows the eyes to follow a straight line and makes the content easier to perceive. You don't want to break the story line. Don't fear the white space but don't leave it random.
Make everything align to the grid ... but not too much (check out this TEDx talk about predictability and variability in music).
Make sure not to use centered alignment, nor justified alignment because they don't provide a single line for the eyes to follow. Align to the left, buttons align at the bottom.
To make an element more prominent (like the recommended plan), make it double width!
Finally we remove the stock images because they are distracting!
Here's how everything looks now:
The web is 95% typography. Serif fonts are good for reading long passages of text because they allow the eyes to follow. Sans-serif fonts look great on screens, especially for smaller sizes (< 12px). Monospaced fonts are only for code! Script fonts are fun but use them with caution.
The fonts we select need to improve readability, not hinder it. Minimum font size should be 16px or even 18px.
Use a typographic scale which tells you how big certain text should be, e.g. h1 vs h2 vs h3 vs paragraph!
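A typographic scale is simply a geometric progression. As a sketch, assuming a 16px base and a 1.25 ratio (both values picked for illustration):

```python
base, ratio = 16, 1.25

# each heading level is one ratio step above the previous one:
# paragraph, h3, h2, h1
sizes = [base * ratio ** step for step in range(4)]
print(sizes)  # -> [16.0, 20.0, 25.0, 31.25]
```

In practice you would round these to whole pixels and map them to your CSS font-size declarations.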
Find a font pair which works (e.g. Oxygen + Source Sans). Also compile a list of fallback fonts, e.g. Futura, Trebuchet MS, Arial, Sans-serif. This makes sure that your fonts work well together and that visitors on your site will use fonts which are as close as possible to what you intended.
Increase line height to improve readability of paragraphs. The minimum is 1.4em. Keep line length short, between 45 and 75 characters.
Layout and Typography are the two most important design steps and you will achieve very good results if you apply only the two of them. Here's how everything looks now:
Find a color palette generator and use it. For new projects start with competitor analysis, a logo or a picture you like or something that conveys a known meaning to the customer. Zaharenia's tips include:
Here's how everything looks now:
Here we talk about icons and images which are to be used only as visual aid, not alone (especially for navigation). The best thing you can do is find a good icon set (with lots of sizes) or even better an SVG set. Then combine several icons together if need be, instead of using stock photos.
It is best to use SVG for all icons because we can use CSS to modify the colors inside the SVG. For example the features icons below are all gray and some SVG paths have been styled with the accent color. Here's how it looks now:
Other tips include
This is about what text we provide on the screen. The rules of thumb are:
And this is the final version of our website (note: the header logo mishap is probably from my side, not intentional):
These are the 5 basic design steps. You don't need to be a trained designer to be able to apply them. Now that you know what the steps are simply search for fonts, scales, color palettes and icon sets and apply them. This is what Zaharenia does (in her own words). You can find all HTML, CSS and images for this workshop at the design4devs-devit repository.
Thanks for reading and happy designing!
How do you test a login form? is one of my favorite questions when screening candidates for QA positions and also a good brain exercise even for experienced testers. I wrote about it last year. In this blog post I'd like to share a different perspective on this same question, this time courtesy of my friend Rayna Stankova.

The series of images above is from a Women Who Code Sofia workshop where the participants were given printed copies and asked to find as many defects as possible. Here they are (counting clockwise from the top-left corner):
Here is a list of possible test scenarios, proposed by Rayna. Notes are mine.
UI Layer
Tests 10 and 11 are particularly relevant for Fedora Account System where you need a really strong password and (at least in the past) had to change it more often and couldn't reuse any of your old passwords. As a user I really hate this b/c I can't remember my own password but it makes for a good test scenario.
13 and 14 are also something I rarely see and could make a nice case for property based testing.
16 would have been the bread and butter of testing Emoj.li (the first emoji-only social network).
Keyboard Specific
These are all so relevant with beautifully styled websites nowadays. The one I hate the most is when space key doesn't trigger select/unselect for checkboxes which are actually images!
Security:
22, 23 and 24 are a bit generic and I guess can be collapsed into one. Better yet make them more specific instead.
Test 28 may sound like nonsense but it is not. I remember back in the day it was possible to copy and paste the password out of the Windows dial-up credentials screen. With heavily styled form fields it is possible to have this problem again, so it is a valid check IMO.
Others:
I think Test 30 means to validate that the backend doesn't store passwords in plain text but rather stores their hashes.
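Test 30 could be checked along these lines. This is a sketch with a made-up user store; sha256 keeps it dependency-free, while a real backend should use a salted KDF such as bcrypt or PBKDF2:

```python
import hashlib

def store_password(db, username, password):
    # assumption: a real backend would use a salted KDF (bcrypt, PBKDF2);
    # plain sha256 here just keeps the sketch self-contained
    db[username] = hashlib.sha256(password.encode()).hexdigest()

db = {}
store_password(db, 'alice', 's3cret')

# the actual check: the stored value must NOT contain the plain-text password
assert 's3cret' not in db['alice']
print(len(db['alice']))  # -> 64 (a sha256 hex digest, not the password)
```

The same assertion can be run against a real database dump: grep the raw storage for a known test password and fail the test if it appears verbatim.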
32 is a duplicate of 16. I also say why only the username? Password field is also a good candidate for this.
If you check how I would test a login form you will find some similarities but there are also scenarios which are different. I'm interested to see what other scenarios we've both missed, especially ones which have manifested themselves as bugs in actual applications.
Thanks for reading and happy testing!
In my last several presentations I briefly talked about using your tests as a monitoring tool. I've not been eating my own dog food and stuff failed in production!
This is a technique I coined 6 months ago while working with Tradeo's team. I'm not the first one to figure this out so if you know the proper name for it please let me know in the comments. So why not take a subset of your automated tests and run them regularly against production? Let's say every hour?
In my particular case we started with integration tests which interact with the product (a web app) in a way that a living person would do. E.g. login, update their settings, follow another user, chat with another user, try to deposit money, etc. The results from these tests are logged into a database and then charted (using Grafana). This way we can bring lots of data points together and easily analyze them.
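The recording side can be sketched in a few lines; this is an illustration only, with made-up table and test names, using SQLite in place of whatever database Grafana reads from:

```python
import sqlite3
import time

def record_result(conn, test_name, passed, duration):
    """Store one test outcome so it can later be charted over time."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS results '
        '(ts REAL, test_name TEXT, passed INTEGER, duration REAL)'
    )
    conn.execute(
        'INSERT INTO results VALUES (?, ?, ?, ?)',
        (time.time(), test_name, int(passed), duration),
    )
    conn.commit()

conn = sqlite3.connect(':memory:')
record_result(conn, 'test_login', True, 1.7)
record_result(conn, 'test_deposit', False, 4.2)

# a chart/alert query: which checks failed on the last run?
failed = conn.execute(
    'SELECT test_name FROM results WHERE passed = 0').fetchall()
print(failed)  # -> [('test_deposit',)]
```

A cron job running such a suite every hour and writing into a real time-series store gives you the Grafana charts almost for free.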
This technique has the added bonus that we can cover the most critical test paths in a couple of minutes and do so regularly without human intervention. Reusing the existing monitoring infrastructure of the devops team we can configure alerts if need be. This makes it a sort of early detection/warning system, plus it gives some ability to spot correlations between data points or patterns.
As simple as it sounds, I've heard of only a handful of companies doing this sort of continuous testing against production. Maybe you can implement something similar in your organization and we can talk more about the results?
Anyway, everyone knows how to write Selenium tests so I'm not going to bother you with the details. Why does this kind of testing matter?
Do you remember a recent announcement by GitHub about Travis CI leaking some authentication tokens into their public log files? I did receive an email about this but didn't pay attention to it because I don't use GitHub tokens for anything I do in Travis. However, as a safety measure, GitHub went ahead and wiped out my security tokens.
The result was that my automated upstream testing infrastructure stopped working! In particular my requests to the GitHub API stopped working. And I didn't even know about it!
This means that since May 24th there have been at least 4 new versions of libraries and frameworks on which some of my software depends and I failed to test them! One of them was Django 1.11.2.
I have supplied a new GitHub token for my infra but if I had monitoring I would have known about this problem well in advance. Next I'm off to write some monitoring tests and also implement better failure detection in Strazar itself!
Thanks for reading and happy testing (in production)!
While working on a new feature for Pelican I've put myself in a situation where I have two functions, one nested inside the other, and I want the nested function to assign to a variable from the parent function. It turns out this isn't so easy in Python!
def hello(who):
    greeting = 'Hello'
    i = 0

    def do_print():
        if i >= 5:
            return
        print i, greeting, who
        i += 1
        do_print()

    do_print()

if __name__ == "__main__":
    hello('World')
The example above is a recursive Hello World. Notice the i += 1 line! This line causes i to be considered local to do_print() and the result is that we get the following failure on Python 2.7:
Traceback (most recent call last):
  File "./test.py", line 16, in <module>
    hello('World')
  File "./test.py", line 13, in hello
    do_print()
  File "./test.py", line 6, in do_print
    if i >= 5:
UnboundLocalError: local variable 'i' referenced before assignment
We can work around this by using a global variable like so:

i = 0

def hello(who):
    greeting = 'Hello'

    def do_print():
        global i
        if i >= 5:
            return
        print i, greeting, who
        i += 1
        do_print()

    do_print()
However I prefer not to expose internal state outside the hello() function. If only there was a keyword similar to global. In Python 3 there is: nonlocal!
def hello(who):
    greeting = 'Hello'
    i = 0

    def do_print():
        nonlocal i
        if i >= 5:
            return
        print(i, greeting, who)
        i += 1
        do_print()

    do_print()
nonlocal is nice but it doesn't exist in Python 2! The workaround is to not assign state to the variable itself, but instead use a mutable container. That is instead of a scalar use a list or a dictionary like so:
def hello(who):
    greeting = 'Hello'
    i = [0]

    def do_print():
        if i[0] >= 5:
            return
        print i[0], greeting, who
        i[0] += 1
        do_print()

    do_print()
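For completeness, here is a runnable sketch of the same mutable-container workaround; I've renamed the function to hello3 to avoid clashing with the examples above, and collected the output in a list instead of printing so it is easy to assert on:

```python
def hello3(who):
    greeting = 'Hello'
    i = [0]       # a list, so the nested function can mutate i[0]
    lines = []    # collect output instead of printing, to assert on it

    def do_print():
        if i[0] >= 5:
            return
        lines.append('%d %s %s' % (i[0], greeting, who))
        i[0] += 1
        do_print()   # same recursive structure as the original example

    do_print()
    return lines

print(hello3('World')[0])  # -> 0 Hello World
```

Because we never assign to the name i itself, only to its first element, Python keeps treating i as a closed-over variable and the code works on both Python 2 and Python 3.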
Thanks for reading and happy coding!
. . 9 | . 2 8 | 7 . .
8 . 6 | . . 4 | . . 5
. . 3 | . . . | . . 4
------+-------+------
6 . . | . . . | . . .
? 2 . | 7 1 3 | 4 5 .
. . . | . . . | . . 2
------+-------+------
3 . . | . . . | 5 . .
9 . . | 4 . . | 8 . 7
. . 1 | 2 5 . | 3 . .
In a comment to a previous post Flavio Poletti proposed a very interesting test case for a function which solves the Sudoku game - semantically invalid input, i.e. an input that passes intermediate validation checks (no duplicates in any row/col/9-square) but that cannot possibly have a solution.
Until then I thought that Sudoku was a completely deterministic game and that if the input passed all validation checks we would always have a solution. Apparently I was wrong! Reading more on the topic I discovered these Sudoku test cases from Sudopedia. Their Invalid Test Cases section lists several examples of semantically invalid input in Sudoku:
The example above cannot be solved because the left-most square of the middle row (r5c1) has no possible candidates.
Following the rule non-repeating numbers from 1 to 9 in each row for row 5 we're left with numbers: 6, 8 and 9. For (r5c1) 6 is a no-go because it is already present in the same square. Then 9 is a no-go because it is present in column 1. Which leaves us with 8, which is also present in column 1! Pretty awesome, isn't it?
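The deduction above can be verified mechanically. Here is a small sketch which computes the candidates for a cell, with the puzzle transcribed as strings ('.' marks an empty cell):

```python
grid = [
    "..9.287..",
    "8.6..4..5",
    "..3.....4",
    "6........",
    ".2.71345.",   # row 5; r5c1 is the unknown cell
    "........2",
    "3.....5..",
    "9..4..8.7",
    "..125.3..",
]

def candidates(grid, row, col):
    """Digits 1-9 not already used in the cell's row, column or 3x3 box."""
    used = set(grid[row]) | {grid[r][col] for r in range(9)}
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return {str(d) for d in range(1, 10)} - used

print(candidates(grid, 4, 0))  # -> set(): no possible candidate for r5c1!
```

An empty candidate set for an empty cell is exactly the machine-checkable definition of "semantically invalid input" here, which makes this an easy property to assert in a test suite for a Sudoku solver.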
Also check the Valid Test Cases section which includes other interesting examples and definitely not ones which I have considered previously when testing Sudoku.
On a more practical note I have been trying to remember a case from my QA practice where we had input data that matched all validation conditions but was semantically invalid. I can't remember such a case. If you do have examples of semantically invalid data in real software please let me know in the comments below!
Thanks for reading and happy testing!
A couple of months ago I conducted a practical, instructor-led training in Python and Selenium automation for manual testers. You can find the materials at GitHub.
The training consists of several basic modules and practical homework assignments. The modules explain
Every module is intended to be taken in the course of 1 week and begins with links to preparatory materials and lots of reading. Then I help the students understand the basics and explain with more examples, often writing code as we go along. At the end there is the homework assignment for which I expect a solution presented by the end of the week so I can comment and code-review it.
All assignments which require the student to implement functionality, not tests, are paired with a test suite, which the student should use to validate their solution.
Despite everything I've written below I had 2 students (from a group of 8) who showed very good progress. One of them was the absolute star, taking active part in every class and doing almost all homework assignments on time, pretty much without errors. I think she'd had some previous training or experience though. She was in the USA; training was done remotely via Google Hangouts.
The other student was in Sofia; training was done in person. He is not on the same level as the US student but is the best from the Bulgarian team. IMO he lacks a little bit of motivation. He "cheated" a bit on some tasks, providing non-standard, easier solutions, but made most of his assignments. After the first Selenium session he started creating small scripts to extract results from football sites or to serve as helpers in his daily job. The interesting fact for me was that he created his programs as unittest.TestCase classes. I guess because this was the way he knew how to run them!?!
There were another few students who had some prior experience with programming but weren't very active in class, so I can't tell how their careers will progress. If they put some more effort into it I'm sure they can develop decent programming skills.
Starting from the beginning, most students failed to read the preparatory materials. Some of the students did read a little bit, others didn't read at all. At the times when they came prepared I had the feeling the sessions progressed more smoothly. I also had students joining late in the process, who for the most part didn't participate in the training at all. I'd like to avoid that in the future if possible.
Sometimes students complained about the lack of example code, although Dive into Python includes tons of examples. I resorted to sending them the example.py files which I produced during class.
The practical part of the training was mostly me programming on a big TV screen in front of everyone else. Several times one of the students took my place. There wasn't much active participation on their part and unfortunately they didn't want to bring personal laptops to the training (or maybe weren't allowed)! We did have a company-provided laptop though.
When practicing functions and arithmetic operations the students struggled with basic maths like breaking down a number into its digits or vice versa, working with Fibonacci sequences and the like. In some cases they cheated by converting to/from strings and then iterating over them. Some also hard-coded the first few numbers of the Fibonacci sequence and returned them directly. Maybe an in-place explanation of the underlying maths would have been helpful, but honestly I was surprised by this. Somebody please explain or give me advice here!
I am completely missing examples of the datetime and timedelta classes, which turned out to be very handy in the practical Selenium tasks, and we had to go over them on the fly.
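For the record, a couple of minimal examples of the kind we ended up improvising (the dates are made up so the output is stable):

```python
from datetime import datetime, timedelta

# e.g. "pick a check-in date one week from now" when filling a form;
# a fixed start date keeps the output reproducible
start = datetime(2017, 7, 1, 12, 0, 0)
deadline = start + timedelta(days=7)
print(deadline.isoformat())               # -> 2017-07-08T12:00:00

# and how far apart are two timestamps?
delta = deadline - start
print(delta.days, delta.total_seconds())  # -> 7 604800.0
```

In Selenium tests this is exactly the shape of code used to generate future dates for date pickers or to assert that a displayed expiry date is N days away.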
The OOP assignments went mostly undone, not to mention one of them had bonus tasks which are easily solved using recursion. I think we could skip some of the OOP practice (not sure how safe that is) because I really need classes only for constructing the tests and we don't do anything fancy there.
The Page Object design pattern is also OOP based and I think that went somewhat well, given that we are only passing values around and performing some actions. I didn't put constraints nor provide guidance on what the classes should look like and which methods go where. Maybe I should have made it easier.
Anyway, given that Page Objects is being replaced by the Screenplay pattern, I think we can safely stick to all-in-one function-based Selenium tests. Maybe utilize helper functions for repeated tasks (like login). Indeed this is what I was using last year with RSpec & Capybara!
Right until the end I had people who had trouble understanding function signatures, function instances and calling/executing a function. Also returning a value from a function vs. printing the (same) value on screen or assigning it to a global variable (e.g. FIB_NUMBERS).
In the same category falls using method parameters vs. using global variables (which happened to have the same value), using the parameters as arguments to another function inside the body of the current function, and using class attributes (e.g. self.name) to store and pass values around vs. local variables in methods vs. method parameters which have the same names.
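A concrete illustration of the return-vs-print confusion, using the Fibonacci exercise and a FIB_NUMBERS global of the kind students kept reaching for (the function names are mine):

```python
FIB_NUMBERS = []  # the global the students kept assigning to

def fib_print(n):
    # prints the numbers and returns None: the caller gets nothing
    # to assert on, which makes this very hard to test
    a, b = 0, 1
    for _ in range(n):
        print(a)
        a, b = b, a + b

def fib_return(n):
    # returns a value: trivial to test and to reuse elsewhere
    result = []
    a, b = 0, 1
    for _ in range(n):
        result.append(a)
        a, b = b, a + b
    return result

print(fib_return(5))  # -> [0, 1, 1, 2, 3]
```

The two functions look almost identical on screen when run interactively, which I suspect is exactly why the difference is so hard to grasp at first.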
I think there was some confusion about lists, dictionaries and tuples, but we practiced mostly with list structures so I don't have enough information.
I have the impression that object oriented programming (classes and instances; we didn't go into inheritance) is generally confusing to beginners with zero programming experience. The classical way to explain it is by using some abstraction like animal -> dog -> a particular dog breed -> a particular pet. OOP was explained to me in a similar way back in school, so these kinds of abstractions are very natural for me. I have no idea if my explanation sucks or students are having a hard time wrapping their heads around the abstraction. I'd love to hear some feedback from other instructors on this one.
I think there is some misunderstanding between a class (a definition of behavior) and an instance/object of that class (something which exists in memory). This may also explain the difficulty of remembering or figuring out what self points to and why we need to use it inside method bodies.
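For what it's worth, here is the kind of minimal illustration I have in mind; the Person class and the names are made up:

```python
class Person:
    # the class: a definition of behaviour; nothing exists in memory yet

    def __init__(self, name):
        self.name = name      # attribute stored on THIS instance, via self

    def greet(self):
        # self points to whichever instance .greet() was called on
        return 'Hello, %s' % self.name


alice = Person('Alice')       # two distinct objects in memory,
bob = Person('Bob')           # built from the same class definition
print(alice.greet())  # -> Hello, Alice
print(bob.greet())    # -> Hello, Bob
```

The same method body produces different results depending on the instance, which is the whole point of self.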
For unittest.TestCase we didn't do lots of practice, which is my fault. The homework assignments ask the students to go back to solutions of previous modules and implement more tests for them. Next time I should provide a module (possibly with non-obvious bugs) and request a comprehensive test suite for it.

Because of the missing practice there was some confusion/misunderstanding about the setUpClass/tearDownClass and the setUp/tearDown methods. Add to the mix that the former are @classmethod while the latter are not. "To be safe" students always defined both as class methods! I have since corrected the training materials, but we didn't have good examples (nor practice) explaining the difference between setUpClass (executed once, aka before suite) and setUp (possibly executed multiple times, aka before each test method).
On the Selenium side I think it is mostly practice which students lack, not understanding. The entire Selenium framework (any web test framework for that matter) boils down to locating elements on the page, interacting with them and waiting for the page to reach an expected state. IMO finding the correct element on the page is on par with waiting (which also relies on locating elements) and took 80% of the time we spent working with Selenium.
Thanks for reading and don't forget to comment and give me your feedback!
Image source: https://www.udemy.com/selenium-webdriver-with-python/
Quality Assurance According 2 Einstein is a talk which introduces several different ideas about how we need to think and approach software testing. It touches on subjects like mutation testing, pairwise testing, automatic test execution, smart test non-execution, using tests as monitoring tools and team/process organization.
Because testing is more thinking than writing I have chosen a different format for this presentation. It contains only slides with famous quotes from one of the greatest thinkers of our time - Albert Einstein!
This blog post includes the accompanying links and references only! It is my first iteration on the topic so expect it to be unclear and incomplete; use your imagination! I will continue working and presenting on the same topic in the next few months so you can expect updates from time to time. In the meantime I am happy to discuss with you down in the comments.
IMAGINATION IS MORE IMPORTANT THAN KNOWLEDGE.
THE FASTER YOU GO, THE SHORTER YOU ARE.
IF THE FACTS DON'T FIT THE THEORY, CHANGE THE FACTS.
THE WHOLE OF SCIENCE IS NOTHING MORE THAN A REFINEMENT OF EVERYDAY THINKING.
INSANITY - DOING THE SAME THING OVER AND OVER AND EXPECTING DIFFERENT RESULTS.
This principle can be applied to any team/process within the organization. The above link is a reference to a nice book which was recommended to me but the gist of it is that we always need to analyze, ask questions and change if we want to achieve great results. A practical example of what is possible if you follow this principle is the talk Accelerate Automation Tests From 3 Hours to 3 Minutes.
THE ONLY REASON FOR TIME IS SO THAT EVERYTHING DOESN'T HAPPEN AT ONCE.
The topic here is "using tests as a monitoring tool". This is something I started a while back, helping a prominent startup with their production testing, but my involvement ended very soon after the framework was deployed live so I don't have lots of insight.
In the first few days this technique identified some unexpected behaviors, for example a 3rd party service which was updating very often. Once it was even broken for a few hours - something nobody had any information about.
Since then I've heard about 2 more companies using similar techniques to continuously validate that production software continues to work without having a physical person verify it. In the event of failures there are alerts which are dealt with accordingly.
NO PROBLEM CAN BE SOLVED FROM THE SAME LEVEL OF CONSCIOUSNESS THAT CREATED IT.
That much must be obvious to us quality engineers. What about the future however?
I don't have anything more concrete here. Just looking towards what is coming next!
DO NOT WORRY ABOUT YOUR DIFFICULTIES IN MATHEMATICS. I CAN ASSURE YOU MINE ARE STILL GREATER.
Thanks for reading and happy testing!
If you follow my blog you are aware that I use automated tools to do some boring tasks instead of me. For example they can detect when new versions of dependencies I'm using are available and then schedule testing against them on the fly.
One of these tools is Strazar which I use heavily for my Django based packages. Example: django-s3-cache build job.
Recently I've made a slightly different proof-of-concept for a Rust project.
Because rustc and various dependencies (called crates) are updated very often
we didn't want to expand the test matrix like Strazar does. Instead we wanted to
always build & test against the latest crates versions and if that passes
create a pull request for the update (in Cargo.lock
). All of this unattended
of course!
To start, create a cron job in Travis CI which will execute once per day and call your test script. The script looks like this:
#!/bin/bash
if [ -z "$GITHUB_TOKEN" ]; then
echo "GITHUB_TOKEN is not defined"
exit 1
fi
BRANCH_NAME="automated_cargo_update"
git checkout -b $BRANCH_NAME
cargo update && cargo test
DIFF=`git diff`
# NOTE: we don't really check the result from testing here. Only that
# something has been changed, e.g. Cargo.lock
if [ -n "$DIFF" ]; then
# configure git authorship
git config --global user.email "atodorov@MrSenko.com"
git config --global user.name "Alexander Todorov"
# add a remote with read/write permissions!
# use token authentication instead of password
git remote add authenticated https://atodorov:$GITHUB_TOKEN@github.com/atodorov/bdcs-api-rs.git
# commit the changes to Cargo.lock
git commit -a -m "Auto-update cargo crates"
# push the changes so that PR API has something to compare against
git push authenticated $BRANCH_NAME
# finally create the PR
curl -X POST -H "Content-Type: application/json" -H "Authorization: token $GITHUB_TOKEN" \
--data '{"title":"Auto-update cargo crates","head":"automated_cargo_update","base":"master", "body":"@atodorov review"}' \
https://api.github.com/repos/atodorov/bdcs-api-rs/pulls
fi
A few notes here:

- the script requires the GITHUB_TOKEN
variable for authentication;
- don't inline these commands in your .travis.yml because the GITHUB_TOKEN
variable will be
expanded into the logs and your secrets go away! Always call the script from your
Makefile
to avoid revealing secrets.

Here is the PR which was created by this script: https://github.com/atodorov/bdcs-api-rs/pull/5
Notice that it includes previous commits b/c they have not been merged to the master branch!
Here's the test job (#77) which generated this PR: https://travis-ci.org/atodorov/bdcs-api-rs/builds/219274916
Here's a test job (#87) which bails out miserably because the PR already exists: https://travis-ci.org/atodorov/bdcs-api-rs/builds/220954269
This post is part of my Quality Assurance According to Einstein series - a detailed description of useful techniques I will be presenting very soon.
Thanks for reading and happy testing!
Pairwise (a.k.a. all-pairs) testing is an effective test case generation technique that is based on the observation that most faults are caused by interactions of at most two factors! Pairwise-generated test suites cover all combinations of two therefore are much smaller than exhaustive ones yet still very effective in finding defects. This technique has been pioneered by Microsoft in testing their products. For an example please see their GitHub repo!
I heard about pairwise testing by Niels Sander Christensen last year at QA Challenge Accepted 2.0 and I immediately knew where it would fit into my test matrix.
This article describes an experiment made during the Red Hat Enterprise Linux 6.9 installation testing campaign. The experiment covers generating a test plan (referred to as the Pairwise Test Plan) based on the pairwise test strategy and some heuristics. The goal was to reduce the number of test cases which needed to be executed and still maintain good test coverage (in terms of breadth of testing) and also maintain low risk for the product.
For RHEL 6.9 there are 9 different product variants, each comprising a particular package set and CPU architecture:
Traditional testing activities are classified as Tier #1, Tier #2 and Tier #3
This experiment focuses only on Tier #2 and #3 test cases because they generate the largest test matrix! It is related only to installation testing of RHEL, which broadly means "Can the customer install RHEL via the Anaconda installer and boot into the installed system". I do not test functionality of the system after reboot!
I have theorized that from the point of view of installation testing RHEL is mostly a platform independent product!
Individual product variants rarely exhibit differences in their functional behavior because they are compiled from the same code base! If a feature is present it should work the same on all variants. The main differences between variants are:
These differences may lead to problems with resolving dependencies and missing packages but historically haven't shown significant tendency to cause functional failures e.g. using NFS as installation source working on Server but not on Client.
The main component being tested, Anaconda - the installer, is also mostly platform independent. In a previous experiment I had collected code coverage data from Anaconda while performing installation with the same kickstart (or same manual options) on various architectures. The coverage report supports the claim that Anaconda is platform independent! See Anaconda & coverage.py - Pt.3 - coverage-diff, section Kickstart vs. Kickstart!
The traditional pairwise approach focuses on features whose functionality is controlled via parameters. For example: RAID level, encryption cipher, etc. I have taken this definition one level up and applied it to the entire product! Now functionality is also controlled by variant and CPU architecture! This allows me to reduce the number of total test cases in the test matrix but still execute all of them at least once!
The initial implementation used a simple script, built with the Ruby pairwise gem, that:
Copies verbatim all test cases which are applicable for a single product variant, for example s390x Server or ppc64 Server! There's nothing we can do to reduce these from combinatorial point of view!
Then we have the group of test cases with input parameters. For example:
storage / iBFT / No authentication / Network init script
storage / iBFT / CHAP authentication / Network Manager
storage / iBFT / Reverse CHAP authentication / Network Manager
In this example the test is storage / iBFT
and the parameters are the authentication type (none, CHAP, Reverse CHAP) and the network tooling (network init script or Network Manager).
For test cases in this group I also consider the CPU architecture and OS variant as part of the input parameters and combine them using pairwise. Usually this results in around 50% reduction of test efforts compared to testing against all product variants!
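The actual script was built with the Ruby pairwise gem, but the combination step itself is easy to sketch in any language. Here is a hypothetical greedy all-pairs generator in Python (illustrative only; this is not the gem's algorithm and the parameter values below are just examples):

```python
from itertools import combinations, product

def pairs_of(case):
    """All (factor-index, value) pairs covered by a single test case."""
    return {((i, case[i]), (j, case[j]))
            for i, j in combinations(range(len(case)), 2)}

def all_pairs(parameters):
    """Greedy all-pairs reduction of the full cartesian product.

    parameters is a list of lists - one list of possible values per factor.
    Every pair of values from two different factors appears in at least
    one generated test case.
    """
    candidates = list(product(*parameters))
    uncovered = set()
    for case in candidates:
        uncovered |= pairs_of(case)

    suite = []
    while uncovered:
        # pick the candidate covering the most still-uncovered pairs
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite

variants = ['Server', 'Workstation', 'Client', 'ComputeNode']
partitioning = ['LVM', 'RAID']
encryption = ['plain', 'encrypted']

suite = all_pairs([variants, partitioning, encryption])
# noticeably fewer than the 4 * 2 * 2 == 16 cases in the full matrix
print(len(suite))
```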
Last we have the group of test cases which don't depend on any input parameters,
for example partitioning / swap on LVM
. They are grouped together (wrt their applicable variants)
and each test case is executed only once against a randomly chosen product variant!
This is my own heuristic based on the fact that the product is platform
independent!
NOTE: You may think that for these test cases the product variant is their input parameter. If we consider this to be the case then we'll not get any reduction because of how pairwise generation works (the 2 parameters with the largest number of possible values determine the minimum size of the test matrix). In this case the 9 product variants is the largest set of values!
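The arithmetic behind this note can be spelled out: any pairwise suite must pair every value of the largest factor with every value of the second-largest one, so their product is a hard floor on the suite size. A quick sketch (the factor labels are mine; the sizes come from this experiment):

```python
# factor sizes from the experiment
factors = {'product variant': 9, 'partitioning': 2, 'encryption': 2}

sizes = sorted(factors.values(), reverse=True)
# every value of the largest factor must meet every value of the
# second-largest factor in at least one test case
lower_bound = sizes[0] * sizes[1]

full_matrix = 1
for n in factors.values():
    full_matrix *= n

print(lower_bound)   # 18
print(full_matrix)   # 36
```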
For this experiment pairwise_spec.rb only produced the list of test scenarios (test cases) to be executed! It doesn't schedule test execution and it doesn't update the test case management system with actual results. It just tells you what to do! Obviously this script will need to integrate with other systems and processes as defined by the organization!
Example results:
RHEL 6.9 Tier #2 and #3 testing
Test case w/o parameters can't be reduced via pairwise
x86_64 Server - partitioning / swap on LVM
x86_64 Workstation - partitioning / swap on LVM
x86_64 Client - partitioning / swap on LVM
x86_64 ComputeNode - partitioning / swap on LVM
i386 Server - partitioning / swap on LVM
i386 Workstation - partitioning / swap on LVM
i386 Client - partitioning / swap on LVM
ppc64 Server - partitioning / swap on LVM
s390x Server - partitioning / swap on LVM
Test case(s) with parameters can be reduced by pairwise
x86_64 Server - rescue mode / LVM / plain
x86_64 ComputeNode - rescue mode / RAID / encrypted
x86_64 Client - rescue mode / RAID / plain
x86_64 Workstation - rescue mode / LVM / encrypted
x86_64 Server - rescue mode / RAID / encrypted
x86_64 Workstation - rescue mode / RAID / plain
x86_64 Client - rescue mode / LVM / encrypted
x86_64 ComputeNode - rescue mode / LVM / plain
i386 Server - rescue mode / LVM / plain
i386 Client - rescue mode / RAID / encrypted
i386 Workstation - rescue mode / RAID / plain
i386 Workstation - rescue mode / LVM / encrypted
i386 Server - rescue mode / RAID / encrypted
i386 Workstation - rescue mode / RAID / encrypted
i386 Client - rescue mode / LVM / plain
ppc64 Server - rescue mode / LVM / plain
s390x Server - rescue mode / RAID / encrypted
s390x Server - rescue mode / RAID / plain
s390x Server - rescue mode / LVM / encrypted
ppc64 Server - rescue mode / RAID / encrypted
Finished in 0.00602 seconds (files took 0.10734 seconds to load)
29 examples, 0 failures
In this example there are 9 (variants) * 2 (partitioning type) * 2 (encryption type) == 36 total combinations! As you can see pairwise reduced them to 20! Also notice that if you don't take CPU arch and variant into account you are left with 2 (partitioning type) * 2 (encryption type) == 4 combinations for each product variant and they can't be reduced on their own!
I evaluated all bugs which were found by executing the test cases from the pairwise test plan and compared them to the list of all bugs found by the team. This tells me how good my pairwise test plan was compared to the regular one, where "good" means:
Results:
Pairwise test plan missed 3 critical regressions due to:
All of the missed regressions could have been missed by regular test plan as well, however the risk of missing them in pairwise is higher b/c of the reduced test matrix and the fact that you may not execute exactly the same test scenario for quite a long time. OTOH the risk can be mitigated with more automation b/c we now have more free resources.
IMO pairwise test plan did a good job and didn't introduce "dramatic" changes in risk level for the product!
Patterns observed:

- the test case swap / recommended
calculates the recommended size of swap partition based on 4 different
ranges in which the actual RAM size fits! These ranges became parameters
to the test case;

I have also discovered ideas for new test execution optimization techniques which need to be evaluated and measured further:
These techniques can be used stand-alone or in combination with other optimization techniques and tooling available to the team. They are specific to my particular kind of testing so beware of your surroundings before you try them out!
Thanks for reading and happy testing!
Cover image copyright: cio-today.com
In a previous post I have shown an example of real world bugs which we were not able to detect despite having 100% mutation and test coverage. I am going to show you another example here.
This example comes from one of my training courses. The task is to write a class which represents a bank account with methods to deposit, withdraw and transfer money. The solution looks like this
class BankAccount(object):
    def __init__(self, name, balance):
        self.name = name
        self._balance = balance
        self._history = []

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError('Deposit amount must be positive!')
        self._balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError('Withdraw amount must be positive!')

        if amount <= self._balance:
            self._balance -= amount
            return True
        else:
            self._history.append("Withdraw for %d failed" % amount)
            return False

    def transfer_to(self, other_account, how_much):
        self.withdraw(how_much)
        other_account.deposit(how_much)
Notice that if withdrawal is not possible then the function returns False
. The tests
look like this
import unittest

from solution import BankAccount

class TestBankAccount(unittest.TestCase):
    def setUp(self):
        self.account = BankAccount("Rado", 0)

    def test_deposit_positive_amount(self):
        self.account.deposit(1)
        self.assertEqual(self.account._balance, 1)

    def test_deposit_negative_amount(self):
        with self.assertRaises(ValueError):
            self.account.deposit(-100)

    def test_deposit_zero_amount(self):
        with self.assertRaises(ValueError):
            self.account.deposit(0)

    def test_withdraw_positive_amount(self):
        self.account.deposit(100)
        result = self.account.withdraw(1)
        self.assertTrue(result)
        self.assertEqual(self.account._balance, 99)

    def test_withdraw_maximum_amount(self):
        self.account.deposit(100)
        result = self.account.withdraw(100)
        self.assertTrue(result)
        self.assertEqual(self.account._balance, 0)

    def test_withdraw_from_empty_account(self):
        result = self.account.withdraw(50)
        self.assertIsNotNone(result)
        self.assertFalse(result)
        assert "Withdraw for 50 failed" in self.account._history

    def test_withdraw_non_positive_amount(self):
        with self.assertRaises(ValueError):
            self.account.withdraw(0)

        with self.assertRaises(ValueError):
            self.account.withdraw(-1)

    def test_transfer_negative_amount(self):
        account_1 = BankAccount('For testing', 100)
        account_2 = BankAccount('In dollars', 10)

        with self.assertRaises(ValueError):
            account_1.transfer_to(account_2, -50)

        self.assertEqual(account_1._balance, 100)
        self.assertEqual(account_2._balance, 10)

    def test_transfer_positive_amount(self):
        account_1 = BankAccount('For testing', 100)
        account_2 = BankAccount('In dollars', 10)

        account_1.transfer_to(account_2, 50)
        self.assertEqual(account_1._balance, 50)
        self.assertEqual(account_2._balance, 60)

if __name__ == '__main__':
    unittest.main()
Try the following commands to verify that you have 100% coverage and mutation score
coverage run test.py
coverage report
cosmic-ray run --test-runner nose --baseline 10 example.json bank.py -- test.py
cosmic-ray report example.json
Can you tell where the bug is? How about if I try to transfer more money than is available from one account to the other:
    def test_transfer_more_than_available_balance(self):
        account_1 = BankAccount('For testing', 100)
        account_2 = BankAccount('In dollars', 10)

        # transfer more than available
        account_1.transfer_to(account_2, 150)
        self.assertEqual(account_1._balance, 100)
        self.assertEqual(account_2._balance, 10)
If you execute the above test it will fail
FAIL: test_transfer_more_than_available_balance (__main__.TestBankAccount)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./test.py", line 79, in test_transfer_more_than_available_balance
self.assertEqual(account_2._balance, 10)
AssertionError: 160 != 10
----------------------------------------------------------------------
The problem is that when self.withdraw(how_much)
fails transfer_to()
ignores
the result and tries to deposit the money into the other account! A better
implementation would be
    def transfer_to(self, other_account, how_much):
        if self.withdraw(how_much):
            other_account.deposit(how_much)
        else:
            raise Exception('Transfer failed!')
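To close the loop, the missing test can now assert the failure explicitly. A self-contained sketch (the class body is condensed from the solution above, with the fixed transfer_to()):

```python
import unittest

# condensed version of the class from above, with the fixed transfer_to()
class BankAccount(object):
    def __init__(self, name, balance):
        self.name = name
        self._balance = balance
        self._history = []

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError('Deposit amount must be positive!')
        self._balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError('Withdraw amount must be positive!')
        if amount <= self._balance:
            self._balance -= amount
            return True
        self._history.append("Withdraw for %d failed" % amount)
        return False

    def transfer_to(self, other_account, how_much):
        if self.withdraw(how_much):
            other_account.deposit(how_much)
        else:
            raise Exception('Transfer failed!')

class TestTransfer(unittest.TestCase):
    def test_transfer_more_than_available_balance(self):
        account_1 = BankAccount('For testing', 100)
        account_2 = BankAccount('In dollars', 10)

        # the failed transfer must raise and leave both balances intact
        with self.assertRaises(Exception):
            account_1.transfer_to(account_2, 150)
        self.assertEqual(account_1._balance, 100)
        self.assertEqual(account_2._balance, 10)

if __name__ == '__main__':
    unittest.main(exit=False)
```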
In my earlier article the bugs were caused by the external environment, which tools/metrics like code coverage and mutation score do not take into account. In fact the jinja-ab example falls into the category of data coverage testing.
The current example on the other hand is ignoring the return value of the withdraw()
function and that's why it fails when we add the appropriate test.
NOTE: some mutation test tools support the removing/modifying return value
mutation. Cosmic Ray doesn't support this at the moment (I should add it). Even if it did
that would not help us find the bug because we would kill the mutation using
the test_withdraw...()
test methods, which already assert on the return value!
Thanks for reading and happy testing!
Nitrate is an open source test plan, test run and test case management system I have been working on for a while now. I have been maintaining a custom fork over at Mr. Senko which includes various bug fixes and enhancements which are not yet upstream.
Recently the Methods & Tools QA portal published an article about Nitrate. You can find it here!
Happy reading!
My previous blog post was about the Hello Ruby book, Coder Dojo and making computers out of paper - all cool things for a 5 year old girl. This week I have discovered the Build the Robot book (link to BG edition)!
The book includes colorful pictures and some interesting facts about robots. On the second page it talks about degrees of freedom, which I studied at technical university during my Mechanics course. How's that for a children's book?
The most important part of the book are cardboard models of 3 robots: walking one (orange), dancing one (light blue) and one waving its hands (black). The pieces are put together by friction and all of the 3 robots use spring loaded motors for some basic movements.
We did have to use some glue because one of the legs kept falling apart but overall the print/cut quality of the Bulgarian edition was very good.
Of the 3 robots the walking one is the worst. I think it is too heavy for the motor to move around. The dancing robot works most of the time. The robot which waves its hands up and down works best!
Overall a very good book, fun to build and play with and very informative! I strongly recommend it if you have small children and want them to feel comfortable around technology!
Thanks for reading!
In early December'16 together with my 5 year old daughter we visited an introductory workshop about the Hello Ruby book and another workshop organized by Coder Dojo Bulgaria. Later that month we also visited a Robo League competition in Sofia. The goal was to further Adriana's interest into technical topics and programming in particular and see how she will respond to the topics covered and the workshops and training materials format in general. I have been keeping detailed notes and today I'm publishing some of my observations.
The events that we visited were strictly for small children and there were mentors who worked with the kids. Each mentor, depending on the event, worked with up to 4 or 5 children. Parents were not allowed to interfere and I kept my distance on purpose, trying to observe and document as much as possible.
Hello Ruby is a small book with colorful illustrations about a girl who embarks on adventures in programming. Adriana considers it a fairy tale although the book introduces lots of IT related terms - Ruby and gems, Firefox, Snow Leopard, Django, etc. For a child these don't necessarily mean anything but she was able to recognize my Red Hat fedora which was depicted on one of the pages.
The workshop itself was the introduction of the Bulgarian translation, which I've purchased, and had the kids build a laptop using glue and paper icons. Mentors were explaining to the children what the various icons mean, that viruses are a bad thing for your computer, what a CPU and computer memory are and everything else in between. A month later when Adriana started building a new paper computer on her own (without being provoked by me) she told me that the colored icons were "information" that goes into the computer!
After the story part of the book there are exercises designed to develop analytical thinking. We did only a few in the beginning, where she had to create a sequence of actions for how to make the bed or get dressed in the morning, etc. At the time Adriana didn't receive the game very well and had some trouble figuring out the smaller actions that comprise a larger activity. We didn't follow through with the game.
At the second event she was exposed to studio.code.org! At the time we were required to bring a working laptop and a mouse. I had no idea how these were going to be used. It turned out mentors gave each child a training course from code.org according to their age. Adriana started with the Course #1 because she can't read on her own!
At first it seemed to me that Adi was a bit bored and didn't know what to do, staring cluelessly at the screen. Btw this was her first session working with a computer on her own. After a while the mentor came and, I guess, explained what needed to be done, how the controls work and what the objective of the exercise was. After that I noticed she was working more independently and grew interested in the subject. She had a problem working with the mouse, so after 2 days I nudged her to use the TrackPoint and mouse buttons on a ThinkPad laptop. She uses them with both hands (so do I btw) and is much more adept at controlling the cursor that way. If you are going to teach children how to work effectively with a computer you may as well start by teaching them to work effectively with a track pad!
The courses comprise games and puzzles (which she's very good at), each asking the child to apply a very basic programming concept. For example instruct an angry bird to move left or right by using blocks for each instruction. By the time the workshop was over Adriana had completed 4 levels on her own.
Level 5 introduced a button for step-by-step execution of the program, also colloquially known as debugging :). For the first few exercises she had no idea what to do with this debugging button. Then the 6th exercise introduced a wrong starting sequence and everything snapped into place.
Level 7 introduced additional instructions. There are move left/right instructions as well as a visit a flower and make honey instruction. This level also introduces repeating instructions, for example make honey 2 times. At first that was confusing but then she started to take notice of the numbers shown on screen and figured out how to build the proper sequence of blocks to complete the game. When she made mistakes she used the debugging button to figure out which block was not in place and remove it.
After this level Adi started making more mistakes, but more importantly she also started trying to figure them out on her own. My help was limited to asking questions like "what do you need to do", "where are you at the screen now", "what instructions do you need to execute to get where you want to be".
Level 8 introduces a new type of game, drawing shapes on the screen. The hardest part here is that sometimes you need to jump from one node to another. This is great for improving the child's spatial orientation skills.
Level 11 is a reading game in English. You need to instruct a bee to fly across different letters to complete a word shown on the screen. However Adriana can't read yet, much less in English, although she understands and speaks English well for her age. In this case I believe she relied on pattern recognition to complete all exercises in this level. She would look at the target word and then identify the letters on the playing board. Next she would stack instruction blocks to program the movements of the bee towards her goal as in previous exercises.
Level 13 introduces loops. It took Adriana 7 exercises to figure out what a loop is, identify the various elements of it and how to construct it properly. She also said that was amusing to her. Almost immediately she was able to identify the length of the loop by herself and construct loops with only 1 block inside their body. Loops with 2 or more blocks inside their body were a bit harder.
Level 14 introduced nested loops, usually one or more instruction blocks paired
with a loop block, nested inside another loop block. For example:
repeat 3 times(move left, repeat 2 times(move down))
. Again it took her about 6
exercises to figure them out. This is roughly at the middle of the level.
Level 16 was quite hard. It had blocks with parameters where you have to type in some words and animal characters will "speak these words" as if in a comic book. I'm not sure if there was supposed to be a text to speech engine integrated for this level but sounds like a good idea. Anyhow this level was not on-par with her skills.
The course completed with free range drawing using instruction blocks and cycles. The image she drew was actually her name, where she had to guess how many scribbles the painter needs to do in one direction, then traverse back and go into another direction. She also had to figure out how big each letter needed to be so that it is possible to actually draw it given the game limitations in motion and directions. This final level required a lot of my help.
I have never had any doubts that small children are very clever and capable of understanding enormous amounts of information and new concepts. However I'm amazed by how deep their understanding goes and how fast they are able to apply the new things they learn.
Through games and practical workshops I believe it is very easy to teach children to gain valuable skills for engineering professions. Even if they don't end up in engineering the ability to clearly define goals and instructions, break down complex tasks into small chunks and clearly communicate intentions is a great advantage. So is the ability to analyze the task on your own and use simple building blocks to achieve larger objectives.
I will continue to keep notes on Adi's progress but will very likely write about it less frequently. If you do have small children around you please introduce them to Hello Ruby and studio.code.org and help them learn!
Thanks for reading!
Due to popular demand I'm sharing my plans for the upcoming conference season. Here is a list of events I plan to visit and speak at (hopefully). The list will be updated throughout the year so please subscribe to the comments section to receive a notification when that happens! I'm open to meeting new people so ping me for a beer if you are attending some of these events!
Note: added on Jan 12th
Last year I had an amazing time visiting an Elixir & Erlang workshop so I'm about to repeat the experience. I will be visiting a workshop organized by HackBelgium and keep you posted with the results.
Note: added on Jan 12th
Git Merge is organized by GitHub and will be held in Brussels this year. I will be visiting only the conference track and hopefully giving a lightning talk titled Automatic upstream dependency testing with GitHub API! That and the afterparty of course!
FOSDEM is the largest free and open source gathering in Europe which I have been visiting since 2009 (IIRC). You can checkout some of my reports about FOSDEM 2014, Day 1, FOSDEM 2014, Day 2 and FOSDEM 2016.
I will present my Mutants, tests and zombies talk at the Testing & Automation devroom on Sunday.
UPDATE: Video recording is available here
I will be in Brussels between February 1st and 5th to explore the local start-up scene and get to meet with the Python community so ping me if you are around.
QA: Challenge Accepted is a specialized QA conference in Sofia and most of the sessions are in Bulgarian. I've visited last year and it was great. I even proposed a challenge of my own.
CFP is still open but I have strong confidence that my talk Testing Red Hat Enterprise Linux the MicroSoft way will be approved. It will describe a large scale experiment with pairwise testing, which btw I learned about at QA: Challenge Accepted 2.0 :).
UPDATE: I will be also on the jury for QA of the year award.
UPDATE: I did a lightning talk about my test case management system (in Bulgarian).
I have been a moderator at Bulgaria Web Summit for several years and this year is no exception. This is one of the strongest events held in Sofia and is in English. Last year over 60% of the attendees were from abroad so you are welcome!
I'm not going to speak at this event but will record as much of it as possible. UPDATE: Checkout the recordings on my YouTube channel!
RTC'17 is a new event I found in neighboring Romania. The topic this year is Thriving and remaining relevant in Quality Assurance. My talk is titled Quality Assistance in the Brave New World where I'll share some experiences and visions for the QA profession if that gets accepted.
As it turned out I know a few people living in Cluj so I'll be arriving one day earlier on May 9th to meet the locals.
Open Source Conference Albania is the largest OSS event in the country. I'm on a roll here, exploring the IT scene on the Balkans. Due to traveling constraints my availability will be limited to the conference venue only but I've booked a hotel across the street :).
I will be meeting a few friends in Tirana and hearing about the progress of a psychological experiment we devised with Jona Azizaj and Suela Palushi.
Talk-wise I'm hoping to get the chance to introduce mutation testing and even host a workshop on the topic.
UPDATE: Here is the video recording from OSCAL. The quality is very poor though.
This is the 3rd edition of DEVit, the 360° web development conference of Northern Greece. I've been a regular visitor since the beginning and this year I've proposed a session on mutation testing. Because the Thessaloniki community seems more interested in Ruby and Rails my goal is to share more examples from my Ruby work and compare how that is different from the Python world. There is once again an opportunity for a workshop.
So far I've been the only Bulgarian to visit DEVit and am also locally known as "The guy who Kosta & Kosta met in Sofia"! Check out my impressions from DEVit'15 and DEVit'16 if you are still wondering whether to attend or not! I strongly recommend it!
UPDATE: I am still the only Bulgarian visiting DEVit!
I've hosted a session titled Quality Assurance According to Einstein for the local QA community in Sofia. Video (in Bulgarian) and links are available!
UPDATE: added post-mortem.
Shift appears to be a very big event in Croatia. My attendance is still unconfirmed due to lots of traveling before that and the general trouble of efficiently traveling on the Balkans. However I have a CFP submitted and waiting for approval. This time it is my Mutation Testing in Patterns, which is a journal of different code patterns found during mutation testing. I have not yet presented it to the public but will blog about it sometime soon so stay tuned.
UPDATE: this one is a no-go!
TuxCon is held in Plovdiv around the beginning of July. I'm usually presenting some lightning talks there and use the opportunity to meet with friends and peers outside Sofia and catch up with news from the local community. The conference is in Bulgarian with the exception of the occasional foreign speaker. If you understand Bulgarian I recommend the story of Puldin - a Bulgarian computer from the 80s.
UPDATE: I will be opening the conference with QA According to Einstein
How Camp is the little brother of Bulgaria Web Summit and is always held outside of Sofia. This year it will be in Varna, Bulgaria. I will be there of course and depending on the crowd may talk about some software testing patterns.
UPDATE: it looks like this is also a no-go but stay tuned for the upcoming Macedonia Web Summit and Albania Web Summit where yours truly will probably be a moderator!
Yeah, this is the conference organized by SuSE. I'm definitely not afraid to visit the competition. I even hope I could teach them something. More details are still TBA because this event is very close to/overlapping with the next two.
South East European Software Testing Conference is, AFAIK, an international event which is hosted in a major city on the Balkans. Last year it was held in Bucharest with previous years held in Sofia.
In my view this is the most formal event, especially as related to software testing, that I'm about to visit. Nevertheless I like hearing about new ideas and research in the field of QA so this is a good opportunity.
UPDATE: I have submitted a new talk titled If the facts don't fit the theory, change the facts!
HackConf is one of the largest conferences in Bulgaria, gathering over 1000 people each year. I am strongly affiliated with the people who organize it and even had the opportunity to host the opening session last year. The picture above is from this event.
The audience is still very young and inexperienced but the presenters are above average. The organizers' goal is for HackConf to become the strongest technical conference in the country and also serve as a source of inspiration for young IT professionals.
UPDATE: I have submitted both a talk proposal and a workshop proposal. For the workshop I intend to teach children Python and Selenium Automation in 8 hours. I've also been helping the organizers with bringing some very cool speakers from abroad!
IT Weekend is organized by Petar Sabev, the same person who's behind QA: Challenge Accepted. It is a non-formal gathering of engineers with the intent to share some news and then discuss and share problems and experiences. Check out my reviews of IT Weekend #1 and IT Weekend #3.
The topics revolve around QA, leadership and management but the format is open and the intention is to broaden the topics covered. The event is held outside Sofia at a SPA hotel and makes for a very nice retreat. I don't have a topic yet but I'm definitely going if time allows. I will probably make something up if there are QA slots available :).
Software Freedom Kosova is one of the oldest conferences about free and open source software in the region. This is part of my goal to explore the IT communities on the Balkans. Kosovo sounds a bit strange to visit but I did recognize a few names on the speaker list of previous years.
The CFP is not open yet but I'm planning to make a presentation. Also if weather allows I'm planning a road trip on my motorbike :).
UPDATE: I've met with some of the FLOSSK members in Tirana at OSCAL and they seem to be more busy with running the hacker space in Prishtina so the conference is nearly a no-go.
GTAC 2017 will be held in London. Both speakers and attendees are pre-approved and my goal is to be the first Red Hatter and second Bulgarian to speak at GTAC.
The previous two years saw talks about mutation testing vs. coverage and their respective use to determine the quality of a test suite, with both parties arguing against each other. Since I'm working in both of these fields and have at least two practical examples, I'm trying to gather more information and present my findings to the world.
UPDATE: I've also submitted my Testing Red Hat Enterprise Linux the Microsoft way
Innovations in Software Technologies and Automation started as a QA conference several years ago but it has broadened the range of acceptable topics to include development, DevOps and agile. I was at the first two editions and then didn't attend for a while until last year, when I really liked it. The event is entirely in English with lots of foreign speakers.
Recently I've been working on something I call "Regression Test Monitoring" and my intention is to present this at ISTA 2017 so stay tuned.
UPDATE: I didn't manage to collect enough information on the Regression Test Monitoring topic but have made two other proposals.
Thanks for reading and see you around!
There are comments.
At the beginning of this year I hosted the first QA-related course at HackBulgaria. This is a long overdue post about how the course went, what worked well and what didn't. Brace yourself because it is going to be a long one.
The idea behind a QA course had been lurking in both RadoRado's (from HackBulgaria) and my heads for quite a while. We'd been discussing it at least a year before we actually started. One day Rado told me he'd found a sponsor and we had the go-ahead for the course, and that's how it all started!
The first issue was that we weren't prepared to start at a moment's notice. I literally had two weeks to prepare the curriculum and the initial interview questions. Next we opened the application form and left it open until the last possible moment. I was still reviewing candidate answers hours before the course started, which was another mistake we made!
On the positive side is that I hosted a Q&A session on YouTube answering general questions about the profession and the course itself. This live stream helped popularize the course.
At the start we had 30 people and around 13 of them managed to "graduate" by the final lesson. The biggest portion of students dropped out after the first 5 lessons of the Java crash course! Each lesson was around 4 hours with a 20-30 minute break in the middle.
With respect to the criteria "find a first job" or "find a new/better job" I consider the training successful. To my knowledge all students have found better jobs, many of them as software testers!
On the practical side of things students managed to find and report 11 interesting bugs against Fedora. Mind you that these were all found in the wild: fedora-infrastructure #5323, RHBZ#1339701, RHBZ#1339709, RHBZ#1339713, RHBZ#1339719, RHBZ#1339731, RHBZ#1339739, RHBZ#1339742, RHBZ#1339746, RHBZ#1340541, RHBZ#1340891.
Then students also made a few pull requests on GitHub (3 that I know of): commons-math #38, commons-csv #12, commons-email #1.
For reference most lessons were a mix of short presentation about theory and best practices followed by discussions and where appropriate practical sessions with technology or projects. The exercises were designed for individual work, work in pairs or small groups (4-5) on purpose.
By request from the sponsors I tried to keep a detailed record of each student's performance and personality traits, as much as I was able to observe them. I really enjoyed keeping such a journal but didn't share this info with my students, which I consider a mistake. I think knowing where your strong and weak areas are helps you become a better expert in your field!
I have to point out that while these are valid concerns and major issues, students were at least partially to blame for the last 3 of them. It was my impression that most of them didn't prepare at home, didn't read the next lesson and didn't install prerequisite tools and software!
We've started with a Java crash course as requested by our sponsors which was extended to 5 instead of the original 3 sessions. RadoRado was teaching Java fundamentals while I was assisting him with comments.
On the good side, Rado explains very well and in much detail. He also writes code to demonstrate what he teaches and while doing so uses only the knowledge he's presented so far. For example if there's repeating logic/functionality he would just write it twice instead of refactoring it into a separate function with parameters (assuming the students have not learned about functions yet). I think this made it easier to understand the concepts being taught.
Another positive thing we did was me going behind Rado's computer and modifying some of the code while he was explaining something on screen. If you take the above example and have two methods which print out salutations, e.g. "Good morning, Alex", I would go and modify one of them to include "Mr." while the other would not. This introduced a change in behavior which ultimately resulted in a bug! It was a nice practical way to demonstrate how some classes of bugs get introduced in reality. We did only a few of these behind-the-computer changes and I definitely liked them! They were all ad-hoc, not planned for.
On the negative side Java seems hard to learn and after these 5 lessons half of the students dropped out. Maybe part of the reason is they didn't expect to start a QA course with lessons about programming. But that also means they didn't pay enough attention to the curriculum, which was announced in advance!
I had made a point of assigning time constraints to each exercise in the lessons. While that mostly worked in the first few lessons, where there is more theory, we didn't keep to the schedule and ran overtime.
Explaining testing theory (based on ISTQB fundamentals) took longer than I expected. It was also evident that we needed more written examples of what the different test analysis techniques are (e.g. boundary value analysis). Here Petar Sabev helped me deliver a few very nice examples.
One of the exercises was "when to stop testing" with an example of a Sudoku solving function and different environments in which this code operates, e.g. browser, mobile, etc. Students appeared to have a hard time understanding what a "runtime environment" is and define relevant tests based on that! I believe most of the students, due to lack of knowledge and experience, were also having a hard time grasping the concept of non-functional testing.
A positive thing was that students started explaining to one another and giving examples for bugs they've seen outside the course.
This lesson was designed as a role-playing game to demonstrate the most common software development methodologies - waterfall and agile - and discuss the QA role in both of them. The format by itself is very hard to conduct successfully and this was my first time ever doing it. I had also never taken part in such games until then, only heard about them.
During the waterfall exercise it was harder for the students to follow the game constraints and not exchange information with one another because they were sitting at the same table.
On the positive side all groups came with unique ideas about software features and how they want to develop them. Timewise we managed to do very well. On the negative side is that I was the client for all groups and didn't manage to pay enough attention to everyone, which btw is what clients usually do in real life.
This lesson was a practical exercise in writing bug reports and figuring out what information needs to be present in a good bug report. Btw this is something I always ask junior members at job interviews.
First we started with working in pairs to define what a good bug report is without actually knowing what that means. Students found it hard to brainstorm together and most of them worked alone during this exercise.
Next students had to write bug reports for some example bugs, which I've explained briefly on purpose and perform peer reviews of their bugs. Reviews took a long time to complete but overall students had a good idea of what information to include in a bug report.
Then, after learning from their mistakes and hearing what others had done, they've learned about some good practices and were tasked to rewrite their bug reports using the new knowledge. I really like the approach of letting students make some mistakes and then showing them the easier/better way of doing things. This is also on-par with Ivan Nemytchenko's methodology of letting his interns learn by their mistakes.
All bug reports can be found in students repositories, which are forked from the curriculum. Check out https://github.com/HackBulgaria/QA-and-Automation-101/network.
I should have really asked everyone to file bugs under the curriculum repository so it is easier for me to track them. On the other hand I wanted each student to start building their own public profile to show potential employers.
This lesson started with an exercise asking students to create accounts for Red Hat's OpenShift cloud platform in the form of a test scenario. The scenario intentionally left out some details. The idea being that missing information introduces inconsistencies during testing and to demonstrate how the same steps were performed slightly differently.
We had some troubles explaining exactly "how did you test" because most inexperienced people would not remember all details about where they clicked, did they use the mouse or the keyboard, was the tab order correct, etc. Regardless students managed to receive different results and discover that some email providers were not supported.
The homework assignment was to create test plans and test cases in Nitrate at https://nitrate-hackbg.rhcloud.com/. Unfortunately the system appears to be down ATM and I don't have time to investigate why. This piece of infrastructure was put together in 2 hours and I'm surprised it lasted without issues during the entire course.
This was a crash course in Linux fundamentals and exercise with most common commands and text editors in the console. Most of the students were not prepared with virtual machines. We've also used a cloud provider to give students remote shell but the provider API was failing and we had to deploy docker containers manually. Overall infrastructure was a big problem but we somehow managed. Another problem was with ssh clients from Windows who generated keys in a format that our cloud provider couldn't understand.
Wrt commands and exercises students did well and managed to execute most of them on their own. That's very good for people who've never seen a terminal in their lives (more or less).
Once again nobody was prepared with a virtual machine with Fedora and students were installing software as we go. Because of that we didn't manage to conduct the lesson properly and had to repeat it on the next session.
Rawhide being the bleeding edge of Fedora means it is full of bugs. Well I couldn't keep up with everyone and explain workarounds or how to install/upgrade Fedora. That was a major setback. It also became evident that you can't move quickly if you have no idea what to do and no instructions about it either.
Once prepared with the latest and greatest from Rawhide the task was to analyze the proposed feature changes (on the Fedora wiki) and create test plans and design test cases for said changes. Then execute the tests in search for bugs. This is where some of the bugs above came from. The rest were found during upgrades.
This lesson was team work (4-5 students) but the results were mixed. IMO Fedora changes are quite hard to grasp, especially if you lack domain knowledge and broader knowledge about the structure and operation of a Linux distribution. I don't think most teams were able to clearly understand their chosen features and successfully create good plans/scenarios for them. On the other hand in real life software you don't necessarily understand the domain better and know what to do. I've been in situations where whole features have been defined by a single sentence and requested to be tested by QA.
One of the teams didn't manage to install Fedora (IIRC they didn't have laptops capable of running a VM) and were not able to conduct the exercise.
Being able to find real life bugs, some of them serious, and getting traction in Bugzilla is the most positive effect of this lesson. I personally wanted to have more output (e.g. more bugs, more cases defined, etc) but taking into account the blocking factors and setbacks I think this is a good initial result.
Here we had a few examples of bad stubs and mocks which were not received very well. The topic is hard in itself and wasn't very well explained with practical examples.
Another negative thing is that students took a lot of time to fiddle around with Eclipse, they were mistyping commands in the terminal and generally not paying enough attention to instructions. This caused the exercises to go slowly.
We had an exercise which asked the students to write a new test for a non-existing method, then implement the method and make sure all the tests passed. You guessed it: this is Test Driven Development. IIRC one of the students was having a hard time with that exercise so I put my editor up on the large screen and started typing what she told me, then re-running the tests and asking her to show me the errors I'd made and tell me how to correct them. The exercise was received very well and was fun to do.
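The lesson's exercise was in Java, but the same red/green cycle can be sketched in Python (the function name and behavior here are made up for illustration, not the actual exercise): the test is written first, fails because the function does not exist yet, and then the implementation is added until everything passes.

```python
# Test-first sketch: in the "red" phase, running test_two_words() before
# reverse_words() existed raised a NameError. The implementation below is
# the "green" phase that makes the tests pass.

def reverse_words(sentence):
    """Return the sentence with its words in reverse order."""
    return " ".join(reversed(sentence.split()))

def test_two_words():
    assert reverse_words("hello world") == "world hello"

def test_empty_string():
    assert reverse_words("") == ""

test_two_words()
test_empty_string()
print("all tests passed")
```

The point of the classroom exercise was exactly this feedback loop: see the test fail for the right reason, then write only enough code to make it pass.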
Due to lack of time we had to go over TravisCI very quickly. The other bad thing about TravisCI is that it requires git/GitHub and the students were generally inexperienced with that. Both the GitHub clients for Windows and Mac OS suck big time IMO. What you need is the console. However none of the students had any practical experience with git or knew how to commit code and push branches to GitHub. git fundamentals, however, deserve one or two separate lessons by themselves, which we didn't do.
Excluding the problems with Eclipse and the GitHub desktop client and missing instructions for Windows the hardest part of this lesson was actually selecting a component to work on, understanding what the code does and actually writing meaningful tests. On top of that most students were not very proficient programmers and Java was completely new to them.
Despite having 3 pull requests on GitHub I consider this lesson to be a failure.
This lesson starts with an example of what a flaky test is. At the moment I don't think this lesson is the best place for that example. To make things even more difficult the example is in Python (because that was the easiest way for me to write it) instead of Java. Students had problems installing Python on Windows just to make this example work. They also lacked the knowledge of how to execute a script in the terminal.
One of the students proposed a better flaky example utilizing dates and times and executing it during various hours of the day. I have yet to code this and prepare an environment in which it would be executed. Btw recently I've seen similar behavior caused by inconsistent timezone usage in Ruby which resulted in an unexpected time offset a little after midnight :).
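Since I haven't coded the student's proposal yet, here is only a sketch of the idea, with a made-up function: a test that computes "tomorrow" by calling the wall clock directly will behave differently depending on when it runs (e.g. around midnight or DST changes), while injecting a fixed clock makes it deterministic.

```python
import datetime

# Deterministic variant of the time-based flaky test idea. The flaky
# version would call datetime.datetime.now() inside both the code under
# test and the test itself, so results drift depending on the hour of
# execution. Letting tests inject 'now' removes the flakiness.

def tomorrow(now=None):
    """Return tomorrow's date; 'now' can be injected for testing."""
    if now is None:
        now = datetime.datetime.now()
    return now.date() + datetime.timedelta(days=1)

# Stable at any hour because the clock is pinned:
fixed = datetime.datetime(2017, 3, 26, 23, 59)  # just before midnight
assert tomorrow(fixed) == datetime.date(2017, 3, 27)
print("deterministic at", fixed)
```

To reproduce the flakiness itself one would schedule the non-injected version at various hours of the day, which is exactly the environment I still need to prepare.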
Once again I have to point out that students came generally unprepared for the lesson and haven't installed prerequisite software and programming languages. This is becoming a trend and needs to be split out into a preparation session, possibly with a check list.
On the Selenium side, starting with Selenium IDE, it was a bit unclear how to use it and what needs to be done. This is another negative trend, where students were missing clear instructions what they are expected to do. At the end we did resort to live demo using Selenium IDE so they can at least get some idea about it.
IMO these two lessons are the biggest disaster of the entire course. Python & virtualenv on Windows was a total no go but on Linux things weren't much easier because students had no idea what a virtualenv is.
Practice wise they haven't managed to read all the bugs on the Mozilla bug tracker and had a very hard time selecting bugs to write tests for. Not to mention that many of the reported bugs were administrative tasks to create or remove add-on categories. There weren't many functional related bugs to write tests for.
The product under test was also hard to understand and most students were seeing it for the first time, let alone getting to know the devel and testing environments that Mozilla provides. Mozilla's test suite being in Python is just another issue to make contribution harder because we've never actually studied Python.
Between the two lessons there were students who had missed the Selenium introduction lesson and were having an even harder time figuring things out. I didn't have the time to explain and go back to the previous lesson for them. Maybe an attendance policy is needed for dependent lessons.
Before the course started I've talked to some guys at Mozilla's IRC channel and they agreed to help but at the end we didn't even engage with them. At this point I'm skeptical that mentoring over IRC would have worked anyway.
This was a more theoretical lesson with less practical examples and exercises. I have provided some blog posts of mine related to the topic of performance testing but in general they are related to software that I've used which isn't generally known to a less experienced audience (Celery, Twisted). These blog posts IMO were hard to put into perspective and didn't serve a good purpose as examples.
The practical part of the lesson was a discussion with the goal of creating a performance testing strategy for GitHub's infrastructure. It was me who was doing most of the talking because students have no experience working on such a large infrastructure like GitHub and didn't know what components might be there, how they might be organized (load balancers, fail overs, etc) and what needs to be tested.
There was also a more practical example: create a performance test in Java for one of the classes found in commons-codec/src/main/java/org/apache/commons/codec/digest. Again the main difficulty here was working fluently with Eclipse, getting the projects to build/run and knowing how the software under test was supposed to work and be executed.
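The exercise itself was in Java against commons-codec, but the shape of such a micro-benchmark can be sketched in Python, with `hashlib.sha1` standing in for the digest classes and an arbitrary time budget of my own choosing:

```python
import hashlib
import timeit

# Sketch of the performance-test idea: time a digest function over a
# fixed payload and fail if it exceeds a budget. hashlib.sha1 is a
# stand-in for the commons-codec digest classes; the 1-second budget
# and payload size are assumptions made up for this example.

PAYLOAD = b"x" * 1024  # 1 KiB of data to hash

def digest_once():
    return hashlib.sha1(PAYLOAD).hexdigest()

elapsed = timeit.timeit(digest_once, number=10_000)
print(f"10000 digests took {elapsed:.3f}s")
assert elapsed < 1.0, "digest performance regressed beyond the budget"
```

The hard part for the students wasn't this timing loop but everything around it: building the project and knowing how the code under test is meant to be invoked.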
This was a more relaxed lesson with examples of simple types of bugs found on a large scale. Most examples came from my blog and experiments I've made against Fedora.
While amusing and fun I don't think all of the students understood me and kept their attention. Part of that is because Fedora tends to focus on low level stuff and my examples were not necessarily easy to understand.
Experian Bulgaria was the exclusive sponsor for this course. At the end of the summer Rado and I met with them to discuss the results of the training. Here's what they say
On the topic of testing knowledge candidates did mostly OK, however we don't have enough information about this. Also the hiring process at Experian is more focused on the broader knowledge areas listed above, so substantial improvement in the testing knowledge of candidates doesn't give them much of a head start.
While to my knowledge they didn't hire anyone, a few people received offers but declined due to various personal reasons. I view this as poor performance on our side but Experian thinks otherwise and is willing to sponsor another round of training.
Here is a list of all the things that could be improved
If you have suggestions please comment below, especially if you can tell me how to implement them in practice.
Thanks for reading and happy testing!
There are comments.
Every activity in software development has a cost and a value. Getting cost to trend down while increasing value is the ultimate goal.
This is the introduction of an e-book called 4 Quick Wins to Manage the Cost of Software Testing. It was sent to me by Ivan Fingarov a couple of months ago. Just now I've managed to read it and here's a quick summary. I urge everyone to download the original copy and give it a read.
The paper focuses on several practices which organizations can apply immediately in order to become more efficient and transparent in their software testing. While larger organizations (e.g. enterprises) have most of these practices already in place smaller companies (up to 50-100 engineering staff) may not be familiar with them and will reap the most benefits of implementing said practices. Even though I work for a large enterprise I find this guide useful when considered at the individual team level!
The first chapter focuses on Tactics to minimize cost: Process, Tools, Bug System Mining and Eliminating Handoffs.
In Process the goal is to minimize the burden of documenting the test process (aka testing artifacts), allow for better transparency and visibility outside the QA group and streamline the decision-making process of what to test and when to stop testing, how much has been tested, what the risk is, etc. The authors propose testing core functionality paired with emerging risk areas based on new feature development. They propose making a list of these, sorting that list by perceived risk/priority and testing as much as possible. Indeed this is very similar to the method I've used at Red Hat when designing testing for new features and new major releases of Red Hat Enterprise Linux. I've seen a similar method in place at several start-ups as well, although in small organizations the primary driver for it is lack of sufficient test resources.
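The sort-by-risk idea is trivially small, but making it explicit helps: keep the list as data, order it by perceived risk, and work top-down until time runs out. A minimal sketch (the features and scores are made up for illustration):

```python
# "Sort by perceived risk, test top-down" from the Process chapter.
# Whatever is left untested when time runs out is, by construction,
# the lowest-risk work.
features = [
    ("payment processing", 9),   # core functionality, high risk
    ("new export-to-PDF", 7),    # emerging risk: brand new code
    ("profile page theme", 2),   # cosmetic, low risk
]

# Highest perceived risk first.
test_queue = sorted(features, key=lambda f: f[1], reverse=True)
print([name for name, _ in test_queue])
```

The value isn't in the code but in keeping the list visible outside the QA group, so the "what did we skip and why" conversation has something concrete to point at.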
Tools proposes the use of test case management systems to ease the documentation burden. I've used TestLink and Nitrate. Of the two, Nitrate has more features but is currently unmaintained, with me being the largest contributor on GitHub. Of the paid variants I've used Polarion, which I generally dislike. Polarion is most suitable for large organizations because it gives lots of opportunities for tracking and reporting. For small organizations it is overkill.
Bug System Mining is a technique which involves regularly scanning the bug tracker and searching for patterns. This is useful for finding bug types which appear frequently and generally point to a flaw in the software development process. The fix for these flaws is usually a change in policy/workflow which eliminates the source of the errors. I'm a fan of this technique when joining an existing project and needing to assess what the current state is. I've done this when consulting for a few start-ups, including Jitsi Meet (acquired by Atlassian). However I'm not doing bug mining on a regular basis, which I consider a drawback and something I really should start doing!
For example at one project I found lots of bugs reported against translations, e.g. missing translations, text overflowing the visible screen area or not playing well with existing design, chosen language/style not fitting well with the product domain, etc.
The root cause of the problem was how the software in question had been localized. The translators were given a file of English strings, which they would translate and return in a spreadsheet. Developers would copy & paste the translated strings into localization files and integrate them with the software. Then QA would usually inspect all the pages and report the above issues. The solution was to remove devel and QA from the translation process and implement a translation management system together with live preview (web based), so that translators can keep track of what is left to translate and can visually inspect their work immediately after a string is translated. Thus translators are given more context for their work but also given the responsibility to produce good quality translations.
Another example I've seen are many bugs which seem like follow-up/nice-to-have features of partially implemented functionality. The root cause of this problem turned out to be that devel was jumping straight to implementation without taking the time to brainstorm and consult with QE and product owners, not taking into account corner cases and minor issues which would have easily been raised by skillful testers. This process led to several iterations until said functionality was considered fully implemented.
Eliminating Handoffs proposes the use of cross-functional teams to reduce idle time and reduce the back-and-forth communication which happens when a bug is found, reported, evaluated and considered for a fix, fixed by devel and finally deployed for testing. This method argues that including testers early in the process and pairing them with the devel team will produce faster bug fixes and reduce communication burden.
While I generally agree with that statement it's worth noting that cross-functional teams perform really well when all team members have relatively equal skill level on the horizontal scale and strong experience on the vertical scale (think T-shaped specialist). Cross-functional teams don't work well when you have developers who aren't well versed in the testing domain and/or testers who are not well versed in programming or the broader OS/computer science fundamentals domain. In my opinion you need well experienced engineers for a good cross-functional team.
In the chapter Collaboration the paper focuses on pairing, building the right thing and faster feedback loops for developers. This overlaps with earlier proposals for cross-functional teams and QA bringing value by asking the "what if" questions. The chapter specifically talks about the Three Amigos meeting between PM, devel and QA where they discuss a feature proposal from all angles and finally come to a conclusion what the feature should look like. I'm a strong supporter of this technique and have been working with it under one form or another during my entire career. This also touches on the notion that testers need to move into the Quality Assistance business and be proactive during the software development process, which is something I'm hoping to talk about at the Romanian Testing Conference next year!
Finally the book talks about Skills Development and makes the distinction between Centers of Excellence (CoE) and Communities of Practice (CoP). Both the book and I are supporters of the CoP approach. This is a bottom-up approach which is open for everyone to join and harnesses the team's creative abilities. It also takes into account that different teams use different methods and tools and that "one size doesn't fit all"!
Skilled teams find important bugs faster, discover innovative solutions to hard testing problems and know how to communicate their value. Sometimes, a few super testers can replace an army of average testers.
While I consider myself to be a "super tester" with thousands of bugs reported, there is a very important note to make here. Communities of Practice are successful when their members are self-focused on skill development! In my view, and to some extent in the communities I've worked with, everyone should strive to constantly improve their skills while also exercising peer pressure on their co-workers not to fall behind. This has been confirmed by other folks in the QA industry and I've heard it many times when talking to friends from other companies.
Thanks for reading and happy testing!
At GTAC 2015 Laura Inozemtseva gave a lightning talk titled Coverage is Not Strongly Correlated with Test Suite Effectiveness, which is the single event that got me hooked on mutation testing. This year, at GTAC 2016, Rahul Gopinath made a counter-argument with his lightning talk Code Coverage is a Strong Predictor of Test Suite Effectiveness. So which one is right? I urge you to watch both talks and take notes before reading about my practical experiment and other opinions on the topic!
DISCLAIMER: I'm a heavy contributor to Cosmic-Ray, the mutation testing tool for Python so my view is biased!
Both Laura and Rahul (and you will too) agree that a test suite's effectiveness depends on the strength of its oracles, in other words the assertions you make in your tests. This is what makes a test suite good and determines its ability to detect bugs when they are present. I've decided to use pelican-ab as a practical example. pelican-ab is a plugin for Pelican, the static site generator for Python. It allows you to generate A/B experiments by writing out the content into different directories and adjusting URL paths based on the experiment name.
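To make the notion of oracle strength concrete, here is a toy example of my own (the function and tests below are not from pelican-ab): both tests give 100% line coverage of the function, but only the second one could kill a mutant.

```python
import unittest

def apply_discount(price, percent):
    """Return price reduced by the given percentage."""
    return price * (1 - percent / 100)

class WeakOracle(unittest.TestCase):
    def test_runs(self):
        # covers the line but asserts nothing about the result,
        # so a mutation turning '-' into '+' would still pass
        apply_discount(100, 10)

class StrongOracle(unittest.TestCase):
    def test_value(self):
        # asserts the exact result, which kills that mutant
        self.assertEqual(apply_discount(100, 10), 90.0)

# run with: python -m unittest <this file>
```

A coverage tool scores both test cases identically; only a mutation tool (or a code review) will tell you the first oracle is worthless.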
Can a test suite with 100% coverage detect all the bugs? Absolutely not! In version 0.2.1, commit ef1e211, pelican-ab had the following bug:
Given: Pelican's DELETE_OUTPUT_DIRECTORY is set to True (which it is by default)
When: we generate several experiments using the commands:
AB_EXPERIMENT="control" make regenerate
AB_EXPERIMENT="123" make regenerate
AB_EXPERIMENT="xy" make regenerate
make publish
Actual result: only the "xy" experiment (the last one) would be published online.
And: all of the other content would be deleted.
Expected result: content from all experiments will be available under the output directory.
This is because before each invocation Pelican deletes the output directory and re-creates the entire content structure. The bug was not caught despite having 100% line + branch coverage. See Build #10 for more info.
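The failure mode can be reproduced in a few lines of Python (a hypothetical re-creation of the mechanism, not Pelican's actual code): every build wipes the shared output directory before writing its own experiment, so only the last one survives.

```python
import os
import shutil
import tempfile

output = tempfile.mkdtemp()  # stand-in for Pelican's output directory

def regenerate(experiment):
    # mimics DELETE_OUTPUT_DIRECTORY = True: wipe, then re-create
    shutil.rmtree(output)
    os.makedirs(os.path.join(output, experiment))

for exp in ["control", "123", "xy"]:
    regenerate(exp)

print(sorted(os.listdir(output)))  # only ['xy'] is left
```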
So I've branched off from commit ef1e211 into the mutation_testing_vs_coverage_experiment branch (requires Pelican==3.6.3).
After the initial execution of Cosmic Ray I had 2 surviving mutants:
$ cosmic-ray run --baseline=10 --test-runner=unittest example.json pelican_ab -- tests/
$ cosmic-ray report example.json
job ID 29:Outcome.SURVIVED:pelican_ab
command: cosmic-ray worker pelican_ab mutate_comparison_operator 3 unittest -- tests/
--- mutation diff ---
--- a/home/senko/pelican-ab/pelican_ab/__init__.py
+++ b/home/senko/pelican-ab/pelican_ab/__init__.py
@@ -14,7 +14,7 @@
def __init__(self, output_path, settings=None):
super(self.__class__, self).__init__(output_path, settings)
experiment = os.environ.get(jinja_ab._ENV, jinja_ab._ENV_DEFAULT)
- if (experiment != jinja_ab._ENV_DEFAULT):
+ if (experiment > jinja_ab._ENV_DEFAULT):
self.output_path = os.path.join(self.output_path, experiment)
Content.url = property((lambda s: ((experiment + '/') + _orig_content_url.fget(s))))
URLWrapper.url = property((lambda s: ((experiment + '/') + _orig_urlwrapper_url.fget(s))))
job ID 33:Outcome.SURVIVED:pelican_ab
command: cosmic-ray worker pelican_ab mutate_comparison_operator 7 unittest -- tests/
--- mutation diff ---
--- a/home/senko/pelican-ab/pelican_ab/__init__.py
+++ b/home/senko/pelican-ab/pelican_ab/__init__.py
@@ -14,7 +14,7 @@
def __init__(self, output_path, settings=None):
super(self.__class__, self).__init__(output_path, settings)
experiment = os.environ.get(jinja_ab._ENV, jinja_ab._ENV_DEFAULT)
- if (experiment != jinja_ab._ENV_DEFAULT):
+ if (experiment not in jinja_ab._ENV_DEFAULT):
self.output_path = os.path.join(self.output_path, experiment)
Content.url = property((lambda s: ((experiment + '/') + _orig_content_url.fget(s))))
URLWrapper.url = property((lambda s: ((experiment + '/') + _orig_urlwrapper_url.fget(s))))
total jobs: 33
complete: 33 (100.00%)
survival rate: 6.06%
The last one, job 33, is an equivalent mutation. The first one, job 29, is killed by the test added in commit b8bff85. For all practical purposes we now have 100% code coverage and 100% mutation coverage. The bug described above still exists though.
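Why is job 33 equivalent? For strings, `!=` and `not in` only disagree when one operand is a proper substring of the other, which never happens for the experiment names used here. A quick illustration (the default value below is a hypothetical stand-in, I'm not quoting jinja_ab):

```python
default = "control"  # hypothetical stand-in for jinja_ab._ENV_DEFAULT

# for a completely different experiment name both forms agree:
assert ("xy" != default) == ("xy" not in default)            # both True
assert (default != default) == (default not in default)      # both False

# they only diverge when the name is a substring of the default:
assert ("con" != default) is True
assert ("con" not in default) is False  # "con" occurs inside "control"
```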
The bug isn't detected by any test because we don't have tests designed to perform and validate the exact same steps that a real person would execute when using pelican-ab. Such a test is added in commit ca85bd0 and you can see that it causes Build #22 to fail.
Experiment with setting DELETE_OUTPUT_DIRECTORY=False in tests/pelicanconf.py and the test will PASS!
Of course not! Even with 100% code coverage and 100% mutation coverage, and after manually constructing a test which mimics user behavior, there is at least one more bug present: a pylint bad-super-call error, fixed in commit 193e3db. For more information about the error see this blog post.
During my humble experience with mutation testing so far I've added quite a few new tests and discovered two bugs which went unnoticed for years. The first one is a constructor parameter not being passed to the parent constructor, see PR#96, pykickstart/commands/authconfig.py:
def __init__(self, writePriority=0, *args, **kwargs):
- KickstartCommand.__init__(self, *args, **kwargs)
+ KickstartCommand.__init__(self, writePriority, *args, **kwargs)
self.authconfig = kwargs.get("authconfig", "")
The second bug is a parameter being passed to the parent class constructor when the parent class doesn't care about it. For example PR#96, pykickstart/commands/driverdisk.py:
- def __init__(self, writePriority=0, *args, **kwargs):
- BaseData.__init__(self, writePriority, *args, **kwargs)
+ def __init__(self, *args, **kwargs):
+ BaseData.__init__(self, *args, **kwargs)
Also note that pykickstart has nearly 100% test coverage as a whole and the affected files were 100% covered as well.
The bugs above, considered out of context, look relatively minor. However pykickstart's biggest client is anaconda, the Fedora and Red Hat Enterprise Linux installation program. Anaconda uses pykickstart to parse and generate text files (called kickstart files) which contain information for driving the installation in a fully automated manner. This is used by everyone who installs Linux on a large scale, and it is pretty important functionality!
writePriority controls the order in which individual commands are written to the file at the end of the installation. In rare cases commands may depend on one another's order. Now imagine the bugs above producing a disordered kickstart file which a system administrator thinks should work, but doesn't. That administrator may be trying to provision hundreds of Linux systems to bootstrap a new data center, or be in the middle of disaster recovery. You get the scale of the problem now, don't you?
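To see why this matters, here's a toy sketch (the command names and priority values below are made up for illustration, not taken from pykickstart): commands are sorted by writePriority before being written out, so a priority silently reset to 0 by the bug above moves a command to the top of the generated file.

```python
# (name, writePriority) pairs - the values are illustrative only
commands = [("rootpw", 100), ("clearpart", 120), ("autopart", 110)]

def write_order(cmds):
    # lower writePriority values are written out first
    return [name for name, priority in sorted(cmds, key=lambda c: c[1])]

print(write_order(commands))  # ['rootpw', 'autopart', 'clearpart']

# the constructor bug effectively resets a priority to 0:
buggy = [("rootpw", 100), ("clearpart", 0), ("autopart", 110)]
print(write_order(buggy))     # 'clearpart' now jumps to the front
```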
To be honest I've seen bugs of this nature but not in the last several years.
This is all to say a minor change like this may have an unexpectedly big impact somewhere down the line.
With respect to the above findings and my bias I'll say the following:
As a bonus to this article let me share a transcript from the mutation-testing.slack.com community:
atodorov 2:28 PM
Hello everyone, I'd like to kick-off a discussion / interested in what you think about
Rahul Gopinath's talk at GTAC this year. What he argues is that test coverage is still
the best metric for how good a test suite is and that mutation coverage doesn't add much
additional value. His talk is basically the opposite of what @lminozem presented last year
at GTAC. Obviously the community here and especially tools authors will have an opinion on
these two presentations.
tjchambers 12:37 AM
@atodorov I have had the "pleasure" of working on a couple projects lately that illustrate
why LOC test coverage is a misnomer. I am a **strong** proponent of mutation testing so will
declare my bias.
The projects I have worked on have had a mix of test coverage - one about 50% and
another > 90%.
In both cases however there was a significant difference IMO relative to mutation coverage
(which I have more faith in as representative of true tested code).
Critical factors I see when I look at the difference:
- Line length: in both projects the line lengths FAR exceeded visible line lengths that are
"acceptable". Many LONGER lines had inline conditionals at the end, or had ternary operators
and therefore were in fact only 50% or not at all covered, but were "traversed"
- Code Conviction (my term): Most of the code in these projects (Rails applications) had
significant Hash references all of which were declared in "traditional" format hhh[:symbol].
So it was nearly impossible for the code in execution to confirm the expectation of the
existence of a hash entry as would be the case with stronger code such as "hhh.fetch(:symbol)"
- Instance variables abound: As with most of Rails code the number of instance variables
in a controller are extreme. This pattern of reference leaked into all other code as well,
making it nearly impossible with the complex code flow to ascertain proper reference
patterns that ensured the use of the instance variables, so there were numerous cases
of instance variable typos that went unnoticed for years. (edited)
- .save and .update: yes again a Rails issue, but use of these "weak" operations showed again
that although they were traversed, in many cases those method references could be removed
during mutation and the tests would still pass - a clear indication that save or update was
silently failing.
I could go on and on, but the mere traversal of a line of code in Ruby is far from an indication
of anything more than it may be "typed in correctly".
@atodorov Hope that helps.
LOC test coverage is a place to begin - NOT a place to end.
atodorov 1:01 AM
@tjchambers: thanks for your answer. It's too late for me here to read it carefully but
I'll do it tomorrow and ping you back
dkubb 1:13 AM
As a practice mutation testing is less widely used. The tooling is still maturing. Depending on your
language and environment you might have widely different experiences with mutation testing
I have not watched the video, but it is conceivable that someone could try out mutation testing tools
for their language and conclude it doesn’t add very much
mbj 1:14 AM
Yeah, I recall talking with @lminozem here and we identified that the tools she used likely
show high rates of false positives / false coverage (as the tools likely do not protect against
certain types of integration errors)
dkubb 1:15 AM
IME, having done TDD for about 15+ years or so, and mutation testing for about 6 years, I think
when it is done well it can be far superior to using line coverage as a measurement of test quality
mbj 1:16 AM
Any talk pro/against mutation testing must, as the tool basis is not very homogeneous, show a non consistent result.
dkubb 1:16 AM
Like @tjchambers says though, if you have really poor line coverage you’re not going to
get as much of a benefit from mutation testing, since it’s going to be telling you what
you already know — that your project is poorly tested and lots of code is never exercised
mbj 1:19 AM
Thats a good and likely the core point. I consider that mutation testing only makes sense
when aiming for 100% (and this is to my experience not impractical).
tjchambers 1:20 AM
I don't discount the fact that tool quality in any endeavor can bring pro/con judgements
based on particular outcomes
dkubb 1:20 AM
What is really interesting for people is to get to 100% line coverage, and then try mutation
testing. You think you’ve done a good job, but I guarantee mutation testing will find dozens
if not hundreds of untested cases .. even in something with 100% line coverage
To properly evaluate mutation testing, I think this process is required, because you can’t
truly understand how little line coverage gives you in comparison
tjchambers 1:22 AM
But I don't need a tool to tell me that a 250 character line of conditional code that by
itself would be an oversized method AND counts more because there are fewer lines in the
overall percentage contributes to a very foggy sense of coverage.
dkubb 1:22 AM
It would not be unusual for something with 100% line coverage to undergo mutation testing
and actually find out that the tests only kill 60-70% of possible mutations
tjchambers 1:22 AM
@dkubb or less
dkubb 1:23 AM
usually much less :stuck_out_tongue:
it can be really humbling
mbj 1:23 AM
In this discussion you miss that many test suites (unless you have noop detection):
Will show false coverage.
tjchambers 1:23 AM
When I started with mutant on my own project which I developed I had 95% LOC coverage
mbj 1:23 AM
Test suites need to be fixed to comply to mutation testing invariants.
tjchambers 1:23 AM
I had 34% mutation coverage
And that was ignoring the 5% that wasn't covered at all
mbj 1:24 AM
Also if the tool you compare MT with line coverage on: Is not very strong,
the improvement may not be visible.
dkubb 1:24 AM
another nice benefit is that you will become much better at enumerating all
the things you need to do when writing tests
tjchambers 1:24 AM
@dkubb or better yet - when writing code.
The way I look at it - the fewer the alive mutations the better the test,
the fewer the mutations the better the code.
dkubb 1:29 AM
yeah, you can infer a kind of cyclomatic complexity by looking at how many mutations there are
tjchambers 1:31 AM
Even without tests (not recommended) you can judge a lot from the mutations themselves.
I still am an advocate for mutations/LOC metric
As you can see, members of the community are strong supporters of mutation testing, and all of them have much more experience with it than I do.
Please help me collect more practical examples! My goal is to collect enough information and present the findings at GTAC 2017 which will be held in London.
UPDATE: I have written Mutation testing vs. coverage, Pt.2 with another example.
Thanks for reading and happy testing!