Tag: Django

Automatic Upstream Dependency Testing

Ever since RHEL 7.2 python-libs broke s3cmd I've been pondering an age-old problem: how do I know if my software works with the latest upstream dependencies? How can I proactively monitor for new versions and add them to my test matrix?

Mixing my previous experience with Difio and monitoring upstream sources together with Forbes Lindesay's GitHub Automation talk at DEVit Conf, I came up with a plan:

  • Make an application which will execute when a new upstream version is available;
  • Automatically update .travis.yml for the projects I'm interested in;
  • Let Travis-CI execute my test suite for all available upstream versions;
  • Profit!

How Does It Work

First we need to monitor upstream! RubyGems.org has a nice webhooks interface; you can even trigger on individual packages. PyPI, however, doesn't have anything like this :(. My solution is to run a cron job every hour and parse their RSS stream for newly released packages. This already worked for Difio, so I re-used one function from that code.

After finding anything we're interested in comes the hard part - automatically updating .travis.yml using the GitHub API. I've described this in more detail here. This time I've slightly modified the code to update only when needed and accept more parameters so it can be reused.

Travis-CI has a clean interface for specifying environment variables, and defining several of them creates a test matrix. This is exactly what I'm doing: .travis.yml is updated with a new ENV setting which determines the upstream package version. After the commit a new build is triggered, which includes the expanded test matrix.
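For illustration, here is roughly how a test suite can consume that variable and install the matching upstream version before running. This is a minimal sketch, assuming the env lines look like - VERSION=0.3.2; the dependency name (FOO, as in the example below) and the pytest runner are placeholders for whatever your project actually uses:

import os
import subprocess
import sys

# VERSION comes from the Travis-CI test matrix, e.g. "- VERSION=0.3.2" in .travis.yml
version = os.environ.get("VERSION")

if version:
    # install the upstream dependency at the version currently under test
    subprocess.check_call([sys.executable, "-m", "pip", "install", "FOO==%s" % version])

# run the test suite against whatever got installed
subprocess.check_call([sys.executable, "-m", "pytest"])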

Example

Travis-CI build log

Imagine that our Project 2501 depends on FOO version 0.3.1. The build log illustrates what happened:

  • Build #9 is what we've tested with FOO-0.3.1 and released to production. Test result is PASS!
  • Build #10 - meanwhile upstream releases FOO-0.3.2, which causes our project to break. We're not aware of this and continue developing new features while all test results still PASS! When our customers upgrade their systems Project 2501 will break! Tests didn't catch it because the test matrix wasn't updated. Please ignore the actual commit message in the example! I've used the same repository for the dummy dependency package.
  • Build #11 - the monitoring solution finds FOO-0.3.2 and updates the test matrix automatically. The build immediately breaks! More precisely, the test job with version 0.3.2 fails!
  • Build #12 - we've alerted FOO.org about their problem and they've released FOO-0.3.3. Our monitor has found that and updated the test matrix. However, the 0.3.2 test job still fails!
  • Build #13 - we decide to work around the 0.3.2 failure or simply handle the error gracefully. In this example I've removed version 0.3.2 from the test matrix to simulate that. In reality I wouldn't touch .travis.yml but instead update my application and tests to check for that particular version. All test results are PASS again!

Btw, Build #11 above was triggered manually (./monitor.py) while Build #12 came from OpenShift, my hosting environment.

At present I have this monitoring enabled for my new Markdown extensions and will also add it to django-s3-cache once it migrates to Travis-CI (it uses drone.io now).

Enough Talk, Show me the Code

monitor.py
#!/usr/bin/env python

import os
import sys
import json
import base64
import httplib
from pprint import pprint
from datetime import datetime
from xml.dom.minidom import parseString

def get_url(url, post_data = None):
    # GitHub requires a valid UA string
    headers = {
        'User-Agent' : 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0.5) Gecko/20120601 Firefox/10.0.5',
    }

    # shortcut for GitHub API calls
    if url.find("://") == -1:
        url = "https://api.github.com%s" % url

    if url.find('api.github.com') > -1:
        if not os.environ.has_key("GITHUB_TOKEN"):
            raise Exception("Set the GITHUB_TOKEN variable")
        else:
            headers.update({
                'Authorization': 'token %s' % os.environ['GITHUB_TOKEN']
            })

    (proto, host_path) = url.split('//')
    (host_port, path) = host_path.split('/', 1)
    path = '/' + path

    if url.startswith('https'):
        conn = httplib.HTTPSConnection(host_port)
    else:
        conn = httplib.HTTPConnection(host_port)


    method = 'GET'
    if post_data:
        method = 'POST'
        post_data = json.dumps(post_data)

    conn.request(method, path, body=post_data, headers=headers)
    response = conn.getresponse()

    if (response.status == 404):
        raise Exception("404 - %s not found" % url)

    result = response.read().decode('UTF-8', 'replace')
    try:
        return json.loads(result)
    except ValueError:
        # not a JSON response
        return result

def post_url(url, data):
    return get_url(url, data)


def monitor_rss(config):
    """
        Scan the PyPI RSS feeds to look for new packages.
        If name is found in config then execute the specified callback.

        @config is a dict with keys matching package names and values
        are lists of dicts
            {
                'cb' : a_callback,
                'args' : dict
            }
    """
    rss = get_url("https://pypi.python.org/pypi?:action=rss")
    dom = parseString(rss)
    for item in dom.getElementsByTagName("item"):
        try:
            title = item.getElementsByTagName("title")[0]
            pub_date = item.getElementsByTagName("pubDate")[0]

            (name, version) = title.firstChild.wholeText.split(" ")
            released_on = datetime.strptime(pub_date.firstChild.wholeText, '%d %b %Y %H:%M:%S GMT')

            if name in config.keys():
                print name, version, "found in config"
                for cfg in config[name]:
                    try:
                        args = cfg['args']
                        args.update({
                            'name' : name,
                            'version' : version,
                            'released_on' : released_on
                        })

                        # execute the call back
                        cfg['cb'](**args)
                    except Exception, e:
                        print e
                        continue
        except Exception, e:
            print e
            continue

def update_travis(data, new_version):
    travis = data.rstrip()
    new_ver_line = "  - VERSION=%s" % new_version
    if travis.find(new_ver_line) == -1:
        travis += "\n" + new_ver_line + "\n"
    return travis


def update_github(**kwargs):
    """
        Update GitHub via API
    """
    GITHUB_REPO = kwargs.get('GITHUB_REPO')
    GITHUB_BRANCH = kwargs.get('GITHUB_BRANCH')
    GITHUB_FILE = kwargs.get('GITHUB_FILE')

    # step 1: Get a reference to HEAD
    data = get_url("/repos/%s/git/refs/heads/%s" % (GITHUB_REPO, GITHUB_BRANCH))
    HEAD = {
        'sha' : data['object']['sha'],
        'url' : data['object']['url'],
    }

    # step 2: Grab the commit that HEAD points to
    data = get_url(HEAD['url'])
    # remove what we don't need for clarity
    for key in data.keys():
        if key not in ['sha', 'tree']:
            del data[key]
    HEAD['commit'] = data

    # step 4: Get a hold of the tree that the commit points to
    data = get_url(HEAD['commit']['tree']['url'])
    HEAD['tree'] = { 'sha' : data['sha'] }

    # intermediate step: get the latest content from GitHub and make an updated version
    for obj in data['tree']:
        if obj['path'] == GITHUB_FILE:
            data = get_url(obj['url']) # get the blob from the tree
            data = base64.b64decode(data['content'])
            break

    old_travis = data.rstrip()
    new_travis = update_travis(old_travis, kwargs.get('version'))

    # bail out if nothing changed
    if new_travis == old_travis:
        print "new == old, bailing out", kwargs
        return

    ####
    #### WARNING WRITE OPERATIONS BELOW
    ####

    # step 3: Post your new file to the server
    data = post_url(
                "/repos/%s/git/blobs" % GITHUB_REPO,
                {
                    'content' : new_travis,
                    'encoding' : 'utf-8'
                }
            )
    HEAD['UPDATE'] = { 'sha' : data['sha'] }

    # step 5: Create a tree containing your new file
    data = post_url(
                "/repos/%s/git/trees" % GITHUB_REPO,
                {
                    "base_tree": HEAD['tree']['sha'],
                    "tree": [{
                        "path": GITHUB_FILE,
                        "mode": "100644",
                        "type": "blob",
                        "sha": HEAD['UPDATE']['sha']
                    }]
                }
            )
    HEAD['UPDATE']['tree'] = { 'sha' : data['sha'] }

    # step 6: Create a new commit
    data = post_url(
                "/repos/%s/git/commits" % GITHUB_REPO,
                {
                    "message": "New upstream dependency found! Auto update .travis.yml",
                    "parents": [HEAD['commit']['sha']],
                    "tree": HEAD['UPDATE']['tree']['sha']
                }
            )
    HEAD['UPDATE']['commit'] = { 'sha' : data['sha'] }

    # step 7: Update HEAD, but don't force it!
    data = post_url(
                "/repos/%s/git/refs/heads/%s" % (GITHUB_REPO, GITHUB_BRANCH),
                {
                    "sha": HEAD['UPDATE']['commit']['sha']
                }
            )

    if data.has_key('object'): # PASS
        pass
    else: # FAIL
        print data['message']


if __name__ == "__main__":
    config = {
        "atodorov-test" : [
            {
                'cb' : update_github,
                'args': {
                    'GITHUB_REPO' : 'atodorov/bztest',
                    'GITHUB_BRANCH' : 'master',
                    'GITHUB_FILE' : '.travis.yml'
                }
            }
        ],
        "Markdown" : [
            {
                'cb' : update_github,
                'args': {
                    'GITHUB_REPO' : 'atodorov/Markdown-Bugzilla-Extension',
                    'GITHUB_BRANCH' : 'master',
                    'GITHUB_FILE' : '.travis.yml'
                }
            },
            {
                'cb' : update_github,
                'args': {
                    'GITHUB_REPO' :  'atodorov/Markdown-No-Lazy-Code-Extension',
                    'GITHUB_BRANCH' : 'master',
                    'GITHUB_FILE' : '.travis.yml'
                }
            },
            {
                'cb' : update_github,
                'args': {
                    'GITHUB_REPO' :  'atodorov/Markdown-No-Lazy-BlockQuote-Extension',
                    'GITHUB_BRANCH' : 'master',
                    'GITHUB_FILE' : '.travis.yml'
                }
            },
        ],
    }

    # check the RSS to see if we have something new
    monitor_rss(config)


Commit a file with the GitHub API and Python

How do you commit changes to a file using the GitHub API? I found this post by Levi Botelho which explains the necessary steps, but without any code, so I used it to create a Python example.

I've rearranged the steps so that all write operations come after a certain point in the code, and I've also added an intermediate section which creates the updated content based on what is already in the repository.

I'm just appending versions of Markdown to .travis.yml (I will explain why in my next post) and this is hard-coded for the sake of the example. All content-related operations are also based on the GitHub API because I don't want to depend on the source code being checked out when I push this script to a hosting provider.

I've tested this script against itself. In the commit log you can find the Automatic update to Markdown-X.Y messages. These are from the script. Also notice the Merge remote-tracking branch 'origin/master' messages; these appeared when I pulled to my local copy. I believe the reason for this is that I have some dangling trees and/or commits from the time I was still experimenting with a broken script. I've tested on another clean repository and there are no such merges.

IMPORTANT

For this to work you need to properly authenticate with GitHub. I've created a new token at https://github.com/settings/tokens with the public_repo permission and that works for me.


Blog Migration: from Octopress to Pelican

Finally I have migrated this blog from Octopress to Pelican. I am using the clean-blog theme with modifications.

See the pelican_migration branch for technical details. Here's the summary:

  • I removed pretty much everything that Octopress uses, only left the content files;
  • I've added my own CSS overrides;
  • I had several Octopress pages; these were merged and converted into blog posts;
  • In Octopress all titles had quotes, which were removed using sed;
  • Octopress categories were converted to Pelican tags and the quotes around them removed, again using sed;
  • I manually updated Octopress's {% codeblock %} and {% blockquote %} tags to match the Pelican syntax. This is the biggest content change;
  • I tried to keep as many of the original URLs as possible. ARTICLE_URL, ARTICLE_SAVE_AS, TAG_URL, TAG_SAVE_AS, FEED_ALL_ATOM and TAG_FEED_ATOM are the relevant settings. For 50+ posts I had to manually specify the Slug: variable so that they match the existing Octopress URLs. Verifying the resulting names was as simple as diffing the file listings from Octopress and Pelican. NOTE: The fedora.planet tag changed its URL because there's no way to assign slugs for tags in Pelican. The new URL is missing the dot! Luckily I use this in only one place, which was manually updated!

I've also found a few bugs and missing functionality:

  • There's no rake new_post counterpart in Pelican. See Issue 1410 and commit 6552f6f. Thanks Kevin Decherf;
  • As far as I can tell the preview server doesn't regenerate files automatically. Do make regenerate and make serve in two separate shells. Thanks Kevin Decherf;
  • Pelican will merge code blocks and quotes which follow one after another but are separated with a newline in Markdown. This makes it visually impossible to distinguish code from two files, or quotes from two people, which are published without any other comments in between. See Markdown-No-Lazy-BlockQuote-Extension and Markdown-No-Lazy-Code-Extension;
  • The syntax doesn't allow specifying a filename or a quote title when publishing code blocks and quotes. Octopress did that easily. I will be happy with something like :::python settings.py. See PR #445;
  • There's no way to specify slugs for tag URLs in order to keep compatibility with existing URLs, see Issue 1873.

I will be filing issues and pull requests for both Pelican and the clean-blog theme in the next few days so stay tuned!

UPDATED 2015-11-26: added links to issues, pull requests and custom extensions.


Call for Ideas: Graphical Test Coverage Reports

If you are working with Python and writing unit tests, chances are you are familiar with the coverage reporting tool. However, there are testing scenarios in which we either don't use unit tests or execute different code paths (test cases) independently of each other.

For example, this is the case with installation testing in Fedora. Because anaconda, the installer, is very complex, the easiest way is to test it live, not with unit tests. Even though we can get a coverage report (anaconda is written in Python), it reflects only the test case it was collected from.

coverage combine can be used to combine several data files and produce an aggregate report. This can tell you how much test coverage you have across all your tests.
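For illustration, combining the data files programmatically takes only a few lines with the coverage API. This is a minimal sketch, assuming coverage 4.x and that the .coverage data files from the individual runs have been copied into a hypothetical coverage-data/ directory:

import coverage

cov = coverage.Coverage()
cov.combine(["coverage-data"])          # merge all data files found in that directory
cov.html_report(directory="htmlcov")    # aggregate report across all test runs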

As far as I can tell Python's coverage doesn't tell you how many times a particular line of code has been executed. It also doesn't tell you which test cases executed a particular line (see PR #59). In the Fedora example, I have the feeling many of our tests are touching the same code base and not contributing that much to the overall test coverage. So I started working on these items.

I imagine a script which will read coverage data from several test executions (preferably in JSON format, PR #60) and produce a graphical report similar to what GitHub does for your commit activity.

See an example here!

The example uses darker colors to indicate more line executions, lighter for fewer executions. Check the HTML for the actual numbers b/c there are no hints yet. The input JSON files are here and the script to generate the above HTML is at GitHub.

Now I need your ideas and comments!

What kinds of coverage reports are you using in your job? How do you generate them? What do they look like?


DEVit Conf 2015 Impressions

It's been a busy week after DEVit conf took place in Thessaloniki. Here are my impressions.

Crack, Train, Fix, Release

Sessions

I started the day with the session called "Crack, Train, Fix, Release" by Chris Heilmann. While it was very interesting, for some unknown reason I was expecting a talk more closely related to software testing. Unfortunately, at the same time in the other room there was a talk called "Integration Testing from the Trenches" by Nicolas Frankel, which I missed.

At the end Chris answered the question "What to do about old versions of IE?" And the answer pretty much was: "Don't try to support everything; leave them with basic functionality so that users can achieve what they came for on your website. Don't put in nice buttons b/c IE 6 users are not used to nice things and they get confused."

If you remember I had a similar question to Jeremy Keith at Bulgaria Web Summit last month and the answer was similar:

Q: Which one is Jeremy's favorite device/browser to develop for?

A: Your approach is wrong and instead we should be thinking in terms of what features are essential or non-essential for our websites and develop around features (if supported, if not supported) not around browsers!

Btw I did ask Chris if he knows Jeremy and he does.

After the coffee break there was "JavaScript ♥ Unicode" by Mathias Bynens which I saw last year at How Camp in Veliko Tarnovo so I just stopped by to say hi and went to listen to "The future of responsive web design: web component queries" by Nikos Zinas. As far as I understood Nikos is a local rock-star developer. I'm not much into web development but the opportunity to create your own HTML components (tags) looks very promising. I guess there will be more business coming for Telerik :).

I wanted to listen to "Live Productive Coder" by Heinz Kabutz but that one started in Greek so I switched the room for "iOS real time content modifications using websockets" by Benny Weingarten-Gabbay.

After lunch I went straight for "Introduction to Docker: What is it and why should I care?" by Ian Miell which IMO was the most interesting talk of the day. It wasn't very technical but managed to clear some of the mysticism around Docker and what it actually is. I tried to grab a few minutes of Ian's time and we found topics of common interest to talk about (Project Atomic anyone?) but later failed to find him and continue the talk. I guess I'll have to follow online.

Tim Perry with "Your Web Stack Would Betray You In An Instant" made a great show. The room was packed, I myself was actually standing the whole time. He described a series of failures across the entire web development stack which gave developers hard times patching and upgrading their services. The lesson: everything fails, be prepared!

The last talk I visited was "GitHub Automation" by Forbes Lindesay. It was more of an inspirational talk rather than a technical one. GitHub provides a cool API, so why not use it?

Organization

DEVit team

From what I know this is the first year of DEVit. For a first-timer the team did great! I particularly liked the two coffee breaks, before lunch and in the early afternoon, and the sponsor pitches in between the main talks.

All talks were recorded but I have no idea what's happening with the videos!

I will definitely make a point of visiting Thessaloniki more often and following the local IT and start-up scenes there. And tonight is Silicon Drinkabout, which will be the official after-party of DigitalK in Sofia.


Free Software Testing Books

There's a huge list of free books on the topic of software testing. This will definitely be my summer reading list. I hope you find it helpful.

200 Graduation Theses About Software Testing

The guys from QAHelp have compiled a list of 200 graduation theses from various universities which are freely accessible online. The list can be found here.


Videos from Bulgaria Web Summit 2015

We're full

Bulgaria Web Summit 2015 is over. The event was incredible and I had a lot of fun moderating the main room. We had many people coming from other countries and I've made lots of new friends. Thank you to everyone who attended!

You can find video recordings of all talks in the main room (in order of appearance) below:

Hope to see you next time in Sofia!

Meanwhile I learned about DEVit in Thessaloniki in May and another one in Zagreb in October. See you there :)


Speeding Up Celery Backends, Part 3

In the second part of this article we've seen how slow Celery actually is. Now let's explore what happens inside and see if we can't speed things up.

I've used pycallgraph to create call graph visualizations of my application. It has the nice feature of also showing execution time and using different colors for fast and slow operations.

Full command line is:

pycallgraph -v --stdlib --include ... graphviz -o calls.png -- ./manage.py celery_load_test

where --include is used to limit the graph to particular Python module(s).

General findings

call graph

  • The first four calls are where most of the time is spent, as seen in the picture.
  • Most of the slowdown seems to come from Celery itself, not the underlying messaging transport Kombu (not shown in the picture).
  • celery.app.amqp.TaskProducer.publish_task takes half of the execution time of celery.app.base.Celery.send_task.
  • celery.app.task.Task.delay directly executes .apply_async and can be skipped if one rewrites the code (see the sketch right after this list).
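As a quick illustration, using the debug_task from the test app shown later in this post, skipping the wrapper simply means calling apply_async() yourself:

from djapp.celery import debug_task

# the convenience wrapper ...
debug_task.delay()

# ... simply forwards to apply_async(), so calling it directly skips one call
debug_task.apply_async()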

More findings

In celery.app.base.Celery.send_task there is this block of code:

349         with self.producer_or_acquire(producer) as P:
350             self.backend.on_task_call(P, task_id)
351             task_id = P.publish_task(
352                 name, args, kwargs, countdown=countdown, eta=eta,
353                 task_id=task_id, expires=expires,
354                 callbacks=maybe_list(link), errbacks=maybe_list(link_error),
355                 reply_to=reply_to or self.oid, **options
356             )

producer is always None because delay() doesn't pass it as an argument. I've tried passing it explicitly to apply_async() like so:

from djapp.celery import *

# app = debug_task._get_app() # if not defined in djapp.celery
producer = app.amqp.producer_pool.acquire(block=True)
debug_task.apply_async(producer=producer)

However, this doesn't speed up anything. If we replace the above code block like this:

349         with producer as P:

it blows up on the second iteration because producer and its channel are already None!?

If you are unfamiliar with the with statement in Python please read this article. In short, the with statement is a compact way of writing try/finally. The underlying kombu.messaging.Producer class does a self.release() on exit of the with statement.
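In other words, the two forms below are equivalent. This is a simplified sketch using a toy class, not Kombu's actual code:

class FakeProducer(object):
    """ Toy stand-in for kombu.messaging.Producer, just to show the pattern. """
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()

    def release(self):
        print("releasing the channel")

producer = FakeProducer()

# context manager form
with producer as P:
    pass  # publish the task here

# equivalent try/finally form
P = producer.__enter__()
try:
    pass  # publish the task here
finally:
    producer.__exit__(None, None, None)  # ends up calling release()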

I also tried killing the with statement and using producer directly, but with limited success. While it was not released (was not None) on subsequent iterations, the memory usage grew much more and there wasn't any performance boost.

Conclusion

The with statement is used throughout both Celery and Kombu and I'm not at all sure if there's a mechanism for keep-alive connections. My time is limited and I'll probably not spend any more time on this problem soon.

Since my use case involves task producers and consumers on localhost, I'll try to work around the current limitations by using Kombu directly (see this gist) with a transport that uses either a UNIX domain socket or a named pipe (FIFO) file.


Speeding up Celery Backends, Part 2

In the first part of this post I looked at a few celery backends and discovered they didn't meet my needs. Why is the Celery stack slow? How slow is it actually?

How slow is Celery in practice

  • Queue: 500`000 msg/sec
  • Kombu: 14`000 msg/sec
  • Celery: 2`000 msg/sec

Detailed test description

There are three main components of the Celery stack:

  • Celery itself
  • Kombu which handles the transport layer
  • Python's Queue() underlying everything

Using the Queue and Kombu tests, run for 1 000 000 messages, I got the following results:

  • Raw Python Queue: Msgs per sec: 500`000
  • Raw Kombu without Celery where kombu/utils/__init__.py:uuid() is set to return 0
    • with json serializer: Msgs per sec: 5`988
    • with pickle serializer: Msgs per sec: 12`820
    • with the custom mem_serializer from part 1: Msgs per sec: 14`492

Note: when the test is executed with 100K messages mem_serializer yielded 25`000 msg/sec before the performance saturated. I've observed similar behavior with raw Python Queue()'s. I saw some cache buffers being managed internally to avoid OOM exceptions. This is probably the main reason performance becomes saturated over a longer execution.

  • Using celery_load_test.py modified to loop 1 000 000 times I got 1908.0 tasks created per sec.

Another interesting thing worth outlining: in the kombu test there are these lines:

with producers[connection].acquire(block=True) as producer:
    for j in range(1000000):

If we swap them, the performance drops to 3875 msg/sec, which is comparable with the Celery results. Indeed, inside Celery there's the same with producer.acquire(block=True) construct, which is executed every time a new task is published. Next I will be looking into this to figure out exactly where the slowness comes from.
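To make the difference concrete, here is a rough sketch of the two variants, assuming kombu's producer pool and the in-memory transport (the payload and routing key are made up):

from kombu import Connection
from kombu.pools import producers

connection = Connection("memory://")

# fast: acquire the producer once, publish many times
with producers[connection].acquire(block=True) as producer:
    for j in range(100000):
        producer.publish({"id": j}, routing_key="benchmark")

# slow: re-acquire the producer for every single message, roughly what
# Celery's send_task() ends up doing for each task
for j in range(100000):
    with producers[connection].acquire(block=True) as producer:
        producer.publish({"id": j}, routing_key="benchmark")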


Speeding up Celery Backends, Part 1

I'm working on an application which fires a lot of Celery tasks - the more the better! Unfortunately Celery backends seem to be rather slow :(. Using the celery_load_test.py command for Django I was able to capture some metrics:

  • Amazon SQS backend: 2 or 3 tasks/sec
  • Filesystem backend: 2000 - 2500 tasks/sec
  • Memory backend: around 3000 tasks/sec

Not bad, but I need on the order of 10000 tasks created per sec! The other noticeable thing is that the memory backend isn't much faster compared to the filesystem one! NB: all of these backends actually come from the kombu package.

Why is Celery slow?

Using celery_load_test.py together with cProfile I was able to pinpoint some problematic areas:

  • kombu/transport/virtual/__init__.py: class Channel.basic_publish() - does self.encode_body() into a base64 encoded string. Fixed with a custom transport backend I called fastmemory, which redefines the body_encoding property (see the sketch at the end of this post):

    @cached_property
    def body_encoding(self):
        return None
    
  • Celery uses json or pickle (or other) serializers to serialize the data. While json yields between 2000-3000 tasks/sec, pickle does around 3500 tasks/sec. Replacing with a custom serializer which just returns the objects (since we read/write from/to memory) yields about 4000 tasks/sec tops:

    from kombu.serialization import register
    
    def loads(s):
        return s
    
    def dumps(s):
        return s
    
    register('mem_serializer', dumps, loads,
            content_type='application/x-memory',
            content_encoding='binary')
    
  • kombu/utils/__init__.py: def uuid() - generates random unique identifiers which is a slow operation. Replacing it with return "00000000" boosts performance to 7000 tasks/sec.

It's clear that a constant UUID is not of any practical use, but it serves well to illustrate how much this function affects performance.

Note: Subsequent executions of celery_load_test seem to report degraded performance even with the most optimized transport backend. I'm not sure why this is. One possibility is the random UUID usage in other parts of the Celery/Kombu stack, which drains entropy on the system so that generating more random numbers becomes slower. If you know better please tell me!

I will be looking for a better understanding of these IDs in Celery and hope to be able to produce a faster uuid() function. Then I'll be exploring the transport stack even more in order to reach the goal of 10000 tasks/sec. If you have any suggestions or pointers please share them in the comments.
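For reference, here is a minimal sketch of the fastmemory idea mentioned above (not the exact code of my backend; the module name is hypothetical and it assumes kombu's in-memory transport classes):

# fastmemory.py
from kombu.transport import memory
from kombu.utils import cached_property


class Channel(memory.Channel):
    @cached_property
    def body_encoding(self):
        # skip the base64 round-trip done by the virtual Channel
        return None


class Transport(memory.Transport):
    Channel = Channel

With Kombu directly you can then point a connection at it with something like Connection(transport='fastmemory.Transport'); how to wire it into Celery's broker settings depends on your version.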


Tip: Collecting Emails - Webhooks for UserVoice and WordPress.com

In my practice I like to use webhooks and integrate auxiliary services with my internal processes or businesses. One of these is the collection of emails. In this short article I'll show you an example of how to collect email addresses from the comments of a WordPress.com blog and the UserVoice feedback/ticketing system.

WordPress.com

For your WordPress.com blog, go to the Admin Dashboard, navigate to Settings -> Webhooks and add a new webhook with action comment_post and fields comment_author, comment_author_email. A simple Django view that handles the input is shown below.

@csrf_exempt
def hook_wp_comment_post(request):
    if not request.POST:
        return HttpResponse("Not a POST\n", content_type='text/plain', status=403)

    hook = request.POST.get("hook", "")

    if hook != "comment_post":
        return HttpResponse("Go away\n", content_type='text/plain', status=403)

    name = request.POST.get("comment_author", "")
    first_name = name.split(' ')[0]
    last_name = ' '.join(name.split(' ')[1:])

    details = {
        'first_name' : first_name,
        'last_name' : last_name,
        'email' : request.POST.get("comment_author_email", ""),
    }

    store_user_details(details)

    return HttpResponse("OK\n", content_type='text/plain', status=200)

UserVoice

For UserVoice navigate to Admin Dashboard -> Settings -> Integrations -> Service Hooks and add a custom web hook for the New Ticket notification. Then use sample code like this:

@csrf_exempt
def hook_uservoice_new_ticket(request):
    if not request.POST:
        return HttpResponse("Not a POST\n", content_type='text/plain', status=403)

    data = request.POST.get("data", "")
    event = request.POST.get("event", "")

    if event != "new_ticket":
        return HttpResponse("Go away\n", content_type='text/plain', status=403)

    data = json.loads(data)

    details = {
        'email' : data['ticket']['contact']['email'],
    }

    store_user_details(details)

    return HttpResponse("OK\n", content_type='text/plain', status=200)

store_user_details() is a function which handles the email/name received in the webhook, possibly adding them to a database or anything else.
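For completeness, a minimal sketch of such a function could look like this. The Subscriber model is hypothetical; adapt it to whatever storage you use:

from myapp.models import Subscriber  # hypothetical model with first_name, last_name, email fields

def store_user_details(details):
    """ Save the collected contact details, ignoring duplicates by email. """
    if not details.get('email'):
        return

    Subscriber.objects.get_or_create(
        email=details['email'],
        defaults={
            'first_name': details.get('first_name', ''),
            'last_name': details.get('last_name', ''),
        }
    )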

I find webhooks extremely easy to set up and develop, and I use them whenever they are supported by the service provider. What other services do you use webhooks for? Please share your story in the comments.


OpenSource.com article - 10 steps to migrate your closed software to open source

Difio is a Django based application that keeps track of packages and tells you when they change. Difio was created as closed software, then I decided to migrate it to open source ....

Read more at OpenSource.com

Btw I'm wondering if Telerik will share their experience opening up the core of their Kendo UI framework on the webinar tomorrow.


Howto: Django Forms with Dynamic Fields

Last week at HackFMI 3.0 one team had to display a form which presented a multiple choice selection for filtering, where the filter keys are read from the database. They solved the problem by simply building up the required HTML inside the view. I was wondering if this can be done with Django forms.

>>> from django import forms
>>>
>>> class MyForm(forms.Form):
...     pass
...
>>> print(MyForm())

>>> MyForm.__dict__['base_fields']['name'] = forms.CharField()
>>> MyForm.__dict__['base_fields']['age'] = forms.IntegerField()
>>> print(MyForm())
<tr><th><label for="id_name">Name:</label></th><td><input id="id_name" name="name" type="text" /></td></tr>
<tr><th><label for="id_age">Age:</label></th><td><input id="id_age" name="age" type="number" /></td></tr>
>>>
>>>
>>> POST = {'name' : 'Alex', 'age' : 0}
>>> f = MyForm(POST)
>>> print(f)
<tr><th><label for="id_name">Name:</label></th><td><input id="id_name" name="name" type="text" value="Alex" /></td></tr>
<tr><th><label for="id_age">Age:</label></th><td><input id="id_age" name="age" type="number" value="0" /></td></tr>
>>> f.is_valid()
True
>>> f.is_bound
True
>>> f.errors
{}
>>> f.cleaned_data
{'age': 0, 'name': u'Alex'}

So if we were to query all names from the database then we could build up the class by adding a BooleanField using the object primary key as the name.

>>> MyForm.__dict__['base_fields']['123'] = forms.BooleanField()
>>> print(MyForm())
<tr><th><label for="id_123">123:</label></th><td><input id="id_123" name="123" type="checkbox" /></td></tr>
>>> f = MyForm({'123' : True})
>>> f.is_valid()
True
>>> f.cleaned_data
{'123': True}
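Putting it together, a view could build such a form dynamically from a queryset along these lines. This is a sketch; the Tag model in the usage comment and the name/pk fields are hypothetical:

from django import forms

def make_filter_form(queryset, data=None):
    """ Build a form with one BooleanField per object, keyed by primary key. """
    class FilterForm(forms.Form):
        pass

    for obj in queryset:
        FilterForm.base_fields[str(obj.pk)] = forms.BooleanField(
            label=obj.name, required=False)

    return FilterForm(data)

# usage inside a view:
# form = make_filter_form(Tag.objects.all(), request.POST or None)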


Spoiler: How to Open Source Existing Proprietary Code in 10 Steps

We've heard about companies opening up their proprietary software products; this is hardly news nowadays. But have you wondered what it is like to migrate production software from closed to open source? I would like to share my own experience about going open source as seen from behind the keyboard.

Difio was recently open sourced and the steps to go through were:

  • Simplify - remove everything that can be deleted
  • Create self-contained modules, aka re-organize the file structure
  • Separate internal from external modules
  • Refactor the existing code
  • Select license and update copyright
  • Update 3rd party dependencies to latest versions and add requirements.txt
  • Add README and verbose settings example
  • Split difio/ into its own git repository
  • Test stand-alone deployments in a fresh environment
  • Announce

Do you want to know more? Use the comments and tell me what exactly! I'm writing a longer version of this article so stay tuned!


Beware of Django default Model Field Option When Using datetime.now()

Beware if you are using code like this:

models.DateTimeField(default=datetime.now())

i.e. passing a function return value as the default option for a model field in Django. In some cases the value will be calculated once when the application starts or the module is imported and will not be updated later. The most common scenario is DateTime fields which default to now(). The correct way is to use a callable:

models.DateTimeField(default=datetime.now)

I've hit this issue on a low-volume application which uses cron to collect its own metrics by calling an internal URL. The app was running as a WSGI app and I wondered why I got records with duplicate dates in the DB. A more detailed (correct) example follows:

from datetime import datetime
from django.db import models

def _strftime():
    return datetime.now().strftime('%Y-%m-%d')

class Metrics(models.Model):
    key = models.IntegerField(db_index=True)
    value = models.FloatField()
    added_on = models.DateTimeField(db_index=True, default=datetime.now)
    added_on_as_text = models.CharField(max_length=16, default=_strftime)

Difio also had the same bug but didn't exhibit the problem because all objects with default date-time values were created on the backend nodes which get updated and restarted very often.

For more info read this blog. For general info on Django, please check out Django books on Amazon.


SMS PIN Verification with Twilio and Django

This is a quick example of SMS PIN verification using Twilio cloud services and Django which I did for a small site recently.

SMS PIN form

The page template contains the following HTML snippet

<input type="text" id="mobile_number" name="mobile_number" placeholder="111-111-1111" required>
<button class="btn" type="button" onClick="send_pin()"><i class="icon-share"></i> Get PIN</button>

and a JavaScript function utilizing jQuery:

function send_pin() {
    $.ajax({
                url: "{% url 'ajax_send_pin' %}",
                type: "POST",
                data: { mobile_number:  $("#mobile_number").val() },
            })
            .done(function(data) {
                alert("PIN sent via SMS!");
            })
            .fail(function(jqXHR, textStatus, errorThrown) {
                alert(errorThrown + ' : ' + jqXHR.responseText);
            });
}

Then create the following views in Django:

import random

from django.conf import settings
from django.contrib import messages
from django.core.cache import cache
from django.http import HttpResponse
from django.shortcuts import redirect, render

from twilio.rest import TwilioRestClient

from .forms import OrderForm  # wherever your OrderForm lives


def _get_pin(length=5):
    """ Return a numeric PIN with length digits """
    return random.sample(range(10**(length-1), 10**length), 1)[0]


def _verify_pin(mobile_number, pin):
    """ Verify a PIN is correct """
    return pin == cache.get(mobile_number)


def ajax_send_pin(request):
    """ Sends SMS PIN to the specified number """
    mobile_number = request.POST.get('mobile_number', "")
    if not mobile_number:
        return HttpResponse("No mobile number", mimetype='text/plain', status=403)

    pin = _get_pin()

    # store the PIN in the cache for later verification.
    cache.set(mobile_number, pin, 24*3600) # valid for 24 hrs

    client = TwilioRestClient(settings.TWILIO_ACCOUNT_SID, settings.TWILIO_AUTH_TOKEN)
    message = client.messages.create(
                        body="%s" % pin,
                        to=mobile_number,
                        from_=settings.TWILIO_FROM_NUMBER,
                    )
    return HttpResponse("Message %s sent" % message.sid, mimetype='text/plain', status=200)

def process_order(request):
    """ Process orders made via web form and verified by SMS PIN. """
    form = OrderForm(request.POST or None)

    if form.is_valid():
        pin = int(request.POST.get("pin", "0"))
        mobile_number = request.POST.get("mobile_number", "")

        if _verify_pin(mobile_number, pin):
            form.save()
            return redirect('transaction_complete')
        else:
            messages.error(request, "Invalid PIN!")
    return render(
                request,
                'order.html',
                {
                    'form': form
                }
            )

PINs are only numeric and are stored in the cache instead of the database, which I think is simpler and less expensive in terms of I/O operations.


Mocking Django AUTH_PROFILE_MODULE without a Database

Difio is a Django-based service which uses a profile model to provide site-specific, per-user information. In the process of open sourcing Difio, its core functionality becomes available as a Django app. The trouble is that the UserProfile model contains site-specific and proprietary data which doesn't make sense to the public, nor do I want to release it.

The solution is to have a MockProfile model and work with that by default, while www.dif.io and other implementations override it as needed. How do you do that without creating a useless table and records in the database but still have profiles created automatically for every user?

It turns out the solution is quite simple. See my comments inside the code below.

from django.db import models
from django.contrib.auth.models import User


class AbstractMockProfile(models.Model):
    """
        Any AUTH_PROFILE_MODULE model should inherit this
        and override the default methods.

        This model provides the FK to User!
    """
    user = models.ForeignKey(User, unique=True)

    def is_subscribed(self):
        """ Is this user subscribed for our newsletter? """
        return True

    class Meta:
        # no DB table created b/c model is abstract
        abstract = True

class MockProfileManager(models.Manager):
    """
        This manager creates MockProfile's on the fly without
        touching the database. It is needed by User.get_profile()
        b/c we can't have an abstract base class as AUTH_PROFILE_MODULE.
    """
    def using(self, *args, **kwargs):
        """ It doesn't matter which database we use! """
        return self

    def get(self, *args, **kwargs):
        """
            User.get_profile() calls .using(...).get(user_id__exact=X)
            so we instrument it here to return a MockProfile() with
            user_id=X parameter. Anything else may break!!!
        """
        params = {}
        for p in kwargs.keys():
            params[p.split("__")[0]] = kwargs[p]

        # this creates an object in memory. To save it to DB
        # call obj.save() which we DON'T do anyway!
        return MockProfile(**params)


class MockProfile(AbstractMockProfile):
    """
        In-memory (fake) profile class used by default for
        the AUTH_PROFILE_MODULE setting.
    """
    objects = MockProfileManager()

    class Meta:
        # DB table is NOT created automatically
        # when managed = False
        managed = False

In Difio core the user profile is always used like this

profile = request.user.get_profile()
if profile.is_subscribed():
    pass

and by default

AUTH_PROFILE_MODULE = "difio.MockProfile"
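A site-specific implementation then subclasses the abstract model and points the setting at it, along these lines (the model and field names here are made up):

# in your own app's models.py
class UserProfile(AbstractMockProfile):
    """ Real, DB-backed profile used by a particular site. """
    subscribed = models.BooleanField(default=True)

    def is_subscribed(self):
        return self.subscribed

# and in settings.py
# AUTH_PROFILE_MODULE = "myapp.UserProfile"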

Voila!


Skip or Render Specific Blocks from Jinja2 Templates

I wasn't able to find detailed information on how to skip rendering or only render specific blocks from Jinja2 templates so here's my solution. Hopefully you find it useful too.

With the template below I want to be able to render only the kernel_options block as a single line and then render the rest of the template excluding kernel_options.

base.j2
{% block kernel_options %}
console=tty0
    {% block debug %}
        debug=1
    {% endblock %}
{% endblock kernel_options %}

{% if OS_MAJOR == 5 %}
key --skip
{% endif %}

%packages
@base
{% if OS_MAJOR > 5 %}
%end
{% endif %}

To render a particular block you have to use the low level Jinja API template.blocks. This will return a dict of block rendering functions which need a Context to work with.

The second part is trickier. To remove a block we have to create an extension which will filter it out. The provided SkipBlockExtension class does exactly this.

Last but not least - if you'd like to use both together you have to disable caching in the Environment (so you get a fresh template every time), render your blocks first, configure env.skip_blocks and render the entire template without the specified blocks.

jinja2-render
#!/usr/bin/env python

import os
import sys
from jinja2.ext import Extension
from jinja2 import Environment, FileSystemLoader


class SkipBlockExtension(Extension):
    def __init__(self, environment):
        super(SkipBlockExtension, self).__init__(environment)
        environment.extend(skip_blocks=[])

    def filter_stream(self, stream):
        block_level = 0
        skip_level = 0
        in_endblock = False

        for token in stream:
            if (token.type == 'block_begin'):
                if (stream.current.value == 'block'):
                    block_level += 1
                    if (stream.look().value in self.environment.skip_blocks):
                        skip_level = block_level

            if (token.value == 'endblock' ):
                in_endblock = True

            if skip_level == 0:
                yield token

            if (token.type == 'block_end'):
                if in_endblock:
                    in_endblock = False
                    block_level -= 1

                    if skip_level == block_level+1:
                        skip_level = 0


if __name__ == "__main__":
    context = {'OS_MAJOR' : 5, 'ARCH' : 'x86_64'}

    abs_path  = os.path.abspath(sys.argv[1])
    dir_name  = os.path.dirname(abs_path)
    base_name = os.path.basename(abs_path)

    env = Environment(
                loader = FileSystemLoader(dir_name),
                extensions = [SkipBlockExtension],
                cache_size = 0, # disable cache b/c we do 2 get_template()
            )

    # first render only the block we want
    template = env.get_template(base_name)
    lines = []
    for line in template.blocks['kernel_options'](template.new_context(context)):
        lines.append(line.strip())
    print "Boot Args:", " ".join(lines)
    print "---------------------------"

    # now instruct SkipBlockExtension which blocks we don't want
    # and get a new instance of the template with these blocks removed
    env.skip_blocks.append('kernel_options')
    template = env.get_template(base_name)
    print template.render(context)
    print "---------------------------"

The above code results in the following output:

$ ./jinja2-render ./base.j2 
Boot Args: console=tty0 debug=1 
---------------------------

key --skip

%packages
@base
---------------------------

Teaser: this is part of my effort to replace SNAKE with a client side kickstart template engine for Beaker so stay tuned!


Django Template Tag Inheritance How-to

While working on open-sourcing Difio I needed to remove all hard-coded URL references from the templates. My solution was to essentially inherit from the standard {% url %} template tag. Here is how to do it.

Background History

Difio is not hosted on a single server. Parts of the website are static HTML, hosted on Amazon S3. Other parts are dynamic - hosted on OpenShift. It's also possible but not required at the moment to host at various PaaS providers for redundancy and simple load balancing.

As an easy fix I had hard-coded some URLs to link to the static S3 pages and others to link to my PaaS provider.

I needed a simple solution which can be extended to allow for multiple domain hosting.

The Solution

The solution I came up with is to override the standard {% url %} tag and use it everywhere in my templates. The new tag will produce absolute URLs containing the specified protocol plus domain name and view path. For this to work you have to inherit the standard URLNode class and override the render() method to include the new values.

You also need to register a tag method to utilize the new class. My approach was to use the existing url() method to do all background processing and simply cast the result object to the new class.
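Here is a rough sketch of the idea; the full snippet linked below is the authoritative version and the hard-coded domain is just a placeholder:

# templatetags/fqdn_url.py
from django import template
from django.template.defaulttags import url, URLNode

register = template.Library()

class FQDNURLNode(URLNode):
    def render(self, context):
        # let the stock URLNode resolve the view path, then prepend scheme + domain
        path = super(FQDNURLNode, self).render(context)
        return "https://www.example.com" + path  # placeholder domain

@register.tag
def fqdn_url(parser, token):
    node = url(parser, token)     # reuse the stock {% url %} parsing
    node.__class__ = FQDNURLNode  # "cast" the result to the new class
    return node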

All code is available at https://djangosnippets.org/snippets/3013/.

To use in your templates simply add

{% load fqdn_url from fqdn_url %}
<a href="{% fqdn_url 'dashboard' %}">Dashboard</a>


Idempotent Django Email Sender with Amazon SQS and Memcache

Recently I wrote about my problem with duplicate Amazon SQS messages causing multiple emails for Difio. After considering several options and feedback from @Answers4AWS I wrote a small decorator to fix this.

It uses the cache backend to prevent the task from executing twice during the specified time frame. The code is available at https://djangosnippets.org/snippets/3010/.
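The published snippet is the authoritative version; the rough idea is a decorator along these lines (a sketch, with an illustrative cache key scheme):

import hashlib
from functools import wraps

from django.core.cache import cache

def run_only_once(timeout=3600):
    """ Skip the call if the same function has already been executed
        with the same arguments during the last `timeout` seconds. """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            raw = "%s-%r-%r" % (func.__name__, args, sorted(kwargs.items()))
            key = "once-" + hashlib.md5(raw.encode("utf-8")).hexdigest()
            # cache.add() stores the key only if it is not already present
            if cache.add(key, "in-progress", timeout):
                return func(*args, **kwargs)
            # duplicate invocation - skip silently
        return wrapper
    return decorator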

As stated on Twitter you should use Memcache (or ElastiCache) for this. If using Amazon S3 with my django-s3-cache don't use the us-east-1 region because it is eventually consistent.

The solution is fast and simple on the development side and uses my existing cache infrastructure so it doesn't cost anything more!

There is still a race condition between marking the message as processed and the second check, but nevertheless this should minimize the possibility of receiving duplicate emails to an acceptable level. Only time will tell though!


