How to Deploy Python Hotfix on RedHat OpenShift Cloud

In this article I will show you how to deploy hotfix versions for Python packages on the RedHat OpenShift PaaS cloud.

Background

You are already running a Python application on your OpenShift instance. You are using some 3rd-party dependencies and you find a bug in one of them. You go ahead, fix the bug and submit a pull request. You don't want to wait for upstream to release a new version but rather build a hotfix package yourself and deploy it to production immediately.

Solution

There are two basic approaches to solving this problem:

  1. Include the hotfix package source code in your application, i.e. add it to your git tree or;
  2. Build the hotfix separately and deploy as a dependency. Don't include it in your git tree, just add a requirement on the hotfix version.

I will talk about the latter. The tricky part here is to instruct the cloud environment to use your package (including where to find it) and not upstream or their local mirror.

Python applications hosted on OpenShift don't support requirements.txt, which can point to various package sources and even install packages directly from GitHub. They support setup.py, which by default fetches packages from http://pypi.python.org, but it is flexible enough to support other locations.

Building the hotfix

First of all we'd like to build a hotfix package. This will be the upstream version that we are currently using plus the patch for our critical issue:

$ wget https://pypi.python.org/packages/source/p/python-magic/python-magic-0.4.3.tar.gz
$ tar -xzvf python-magic-0.4.3.tar.gz 
$ cd python-magic-0.4.3
$ curl -L https://github.com/ahupp/python-magic/pull/31.patch | patch -p1

Verify the patch has been applied correctly and then modify setup.py to increase the version string. In this case I will set it to version='0.4.3.1'.

Then build the new package using python setup.py sdist and upload it to a web server.
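
For example (the scp destination is only a placeholder - any location reachable over HTTP(S), such as an S3 bucket, will do):

$ python setup.py sdist
$ ls dist/
python-magic-0.4.3.1.tar.gz
$ scp dist/python-magic-0.4.3.1.tar.gz example.com:/var/www/html/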

Deploying to OpenShift

Modify setup.py and specify the hotfix version. Because this version is not on PyPI and will not be on OpenShift's local mirror you need to provide the location where it can be found. This is done with the dependency_links parameter to setup(). Here's how it looks:

diff --git a/setup.py b/setup.py
index c6e837c..2daa2a9 100644
--- a/setup.py
+++ b/setup.py
@@ -6,5 +6,6 @@ setup(name='YourAppName',
       author='Your Name',
       author_email='example@example.com',
       url='http://www.python.org/sigs/distutils-sig/',
-      install_requires=['python-magic==0.4.3'],
+      dependency_links=['https://s3.amazonaws.com/atodorov/blog/python-magic-0.4.3.1.tar.gz'],
+      install_requires=['python-magic==0.4.3.1'],
      )

Now just git push to OpenShift and observe the console output:

remote: Processing dependencies for YourAppName==1.0
remote: Searching for python-magic==0.4.3.1
remote: Best match: python-magic 0.4.3.1
remote: Downloading https://s3.amazonaws.com/atodorov/blog/python-magic-0.4.3.1.tar.gz
remote: Processing python-magic-0.4.3.1.tar.gz
remote: Running python-magic-0.4.3.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ZRVMBg/python-magic-0.4.3.1/egg-dist-tmp-R_Nxie
remote: zip_safe flag not set; analyzing archive contents...
remote: Removing python-magic 0.4.3 from easy-install.pth file
remote: Adding python-magic 0.4.3.1 to easy-install.pth file

Congratulations! Your hotfix package has just been deployed.

This approach should work for other cloud providers and other programming languages as well. Let me know if you have any experience with that.

Creating RPM .spec Files From Scratch Using Vim

On a Red Hat Enterprise Linux or Fedora (or compatible) system execute

$ vim example.spec

This will create a new file with all the important sections and fields already there. The template used is /usr/share/vim/vimfiles/template.spec and is part of the vim-common RPM package.
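
For orientation, a minimal spec skeleton contains roughly the sections below; the exact contents of the Vim template may differ slightly between vim-common versions:

Name:           example
Version:        1.0
Release:        1%{?dist}
Summary:        Short summary of the package

License:        GPLv2+
URL:            http://example.com
Source0:        http://example.com/%{name}-%{version}.tar.gz

BuildRequires:
Requires:

%description
Longer description of the package.

%prep
%setup -q

%build
%configure
make %{?_smp_mflags}

%install
make install DESTDIR=%{buildroot}

%files
%doc

%changelog
* Fri Jan 04 2013 Your Name <you@example.com> - 1.0-1
- Initial package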

This is a very useful trick which I didn't know about. Until now I always reused the spec files from previously built packages when creating new RPMs. That wasn't as fast as starting from a template and filling in the blanks.

For a detailed description about recommended RPM build practices see the Fedora Packaging Guidelines.

Using Django built-in template tags and filters in code

In case you are wondering how to use Django's built-in template tags and filters in your own source code, not inside a template, here is how:

>>> from django.template.defaultfilters import *
>>> filesizeformat(1024)
u'1.0 KB'
>>> filesizeformat(1020)
u'1020 bytes'
>>> filesizeformat(102412354)
u'97.7 MB'
>>>

The built-in filters live in pythonX.Y/site-packages/django/template/defaultfilters.py; the built-in tags are in defaulttags.py next to it.
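
The same approach works for any of the other filters, for example:

>>> from django.template.defaultfilters import slugify, truncatewords
>>> slugify('Deploy Python Hotfix on OpenShift')
u'deploy-python-hotfix-on-openshift'
>>> truncatewords('The quick brown fox jumps over the lazy dog', 4)
u'The quick brown fox ...'
>>>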

Tip: Renaming Model Fields in Django

Did you ever have to re-purpose a column in your database schema? Here's a quick and easy way to do this if you happen to be using Django.

Scenario

I had an integer field in my model called lines which counted the lines of code in a particular tar.gz package. I figured the file size is a better indicator so decided to start using it. I was not planning to use the old field anymore and I didn't care about the data it was holding. So I decided to re-purpose it as the size field.

Possible methods

Looking around I found several different ways to do this:

  1. Continue using the existing lines field and keep referencing the old name in the code. This is no-brainer but feels awkward and is a disaster waiting to happen;
  2. Add new size field and remove the old lines field. This involves modification to the DB schema and requires at least a backup with possible down time. Not something I will jump at right away;
  3. Add a size property in the model class which will persist to self.lines. This is a quick way to go but I'm not sure if one can use the property with the Django QuerySet API (objects.filter(), objects.update(), etc.) I suspect not. If you don't filter by the property or use it in bulk operations this method is fine though;
  4. Change the field name to size but continue to use the lines DB column; Mind my wording here :);
  5. Rename the column in the DB schema and then update the model class field.

How I did it

I decided to go for option 4 above: change the field name to size but continue to use the lines DB column.

diff --git a/models.py b/models.py
index e06d2b2..18cad6f 100644
--- a/models.py
+++ b/models.py
@@ -667,7 +667,7 @@ class Package(models.Model):
-    lines = models.IntegerField(default=None, null=True, blank=True)
+    size  = models.IntegerField(default=None, null=True, blank=True, db_column='lines')

  1. Removed all references to lines from the code except the model class. This served as clean up as well.
  2. Renamed the model field to size but continued using the lines DB column as shown above. Django's db_column option makes this possible.
  3. From the Django shell (./manage.py shell) reset size to None (NULL) for all objects, as shown in the snippet below;
  4. Finally implemented my new code and functionality behind the size field.
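
Step 3 is a one-liner from the Django shell. The app module path below is a placeholder; Package is the model from the diff above:

>>> from myapp.models import Package
>>> Package.objects.update(size=None)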

The entire process took under 10 minutes. I will also rename the DB column itself at a later time. This is to sync the naming used in the Python code and in MySQL in case I ever need to use raw SQL or anything but Django.

If you were me, how would you do this? Please share in the comments below.

Django QuerySet tip - Search and Order By Exact Match

How do you order Django QuerySet results so that the first item is the exact match when using contains or icontains? Both solutions below were proposed on the django-users mailing list.

Solution by Tom Evans, example is mine:

>>> from django.db.models import Q
>>> Package.objects.filter(
        Q(name='Django') | Q(name__icontains='Django')
    ).extra(
        select={'match' : 'name = "Django"'}
    ).order_by('-match', 'name')
[<Package: Django>, <Package: appomatic_django_cms>, <Package: appomatic_django_filer>,
<Package: appomatic_django_vcs>, <Package: BabelDjango>, <Package: BDD4Django>,
<Package: blanc-django-admin-skin>, <Package: bootstrap-django-forms>,
<Package: capistrano-django>, <Package: ccnmtldjango>, <Package: collective.django>,
<Package: csdjango.contactform>, <Package: cykooz.djangopaste>,
<Package: cykooz.djangorecipe>, <Package: d51.django.virtualenv.test_runner>,
<Package: django-4store>, <Package: django-503>, <Package: django-absolute>,
<Package: django-abstract-templates>, <Package: django-account>,
'...(remaining elements truncated)...']
>>>

Another one:

I'm not sure this is the right way, but you could drop the Q objects, use only icontains and sort by the length of 'name'

Gabriel https://groups.google.com/d/topic/django-users/OCNmIXrRgag/discussion

>>> packages = [p.name for p in Package.objects.filter(name__icontains='Dancer')]
>>> sorted(packages, key=len)
[u'Dancer', u'Dancer2', u'breakdancer', u'Task::Dancer', u'App::Dancer2', u'Dancer::Routes',
u'DancerX::Routes', u'DancerX::Config', u'Task::DWIM::Dancer', u'Dancer::Plugin::CDN',
u'Dancer::Plugin::Feed', u'Dancer::Plugin::LDAP', u'Dancer::Plugin::Lucy', 
'...(remaining elements truncated)...']
>>>

That's all folks. If you have other more interesting sorting needs please comment below. Thanks!

Virtualization Platforms Supported by Red Hat Enterprise Linux

This is mostly for my own reference, to have a handy list of virtualization platforms supported by Red Hat Enterprise Linux.

Software virtualization solutions

A guest RHEL operating system is supported if it runs on the following platforms:

  • Xen shipped with RHEL Server
  • KVM shipped with RHEL Server or RHEV for Servers
  • VMware ESX/vSphere
  • Microsoft Hyper-V

Red Hat does not support Citrix XenServer. However, customers can buy RHEL Server and use it with Citrix XenServer with the understanding that Red Hat will only support technical issues that can be reproduced on bare metal.

The official virtualization support matrix shows which host/guest operating systems combinations are supported.

Hardware partitioning

Red Hat supports RHEL on hardware partitioning and virtualization solutions listed in its hardware catalog.

Unfortunately the recently updated hardware catalog doesn't allow filtering by hardware partitioning vs. virtualization platform, so you need to know what you are looking for in order to find it :(.

Red Hat Enterprise Linux as a guest on the Cloud

Multiple public cloud providers are supported. A comprehensive list can be found here: http://www.redhat.com/solutions/cloud-computing/public-cloud/find-partner/

You can also try Red Hat Partner Locator's advanced search. However at the time of this writing there are no partners listed in the Cloud / Virtualization category.

Warning: It is known that Amazon uses Xen with custom modifications (not sure what version) and HP Cloud uses KVM but there is not much public record about hypervisor technology used by most cloud providers. Red Hat has partner agreements with these vendors and will commercially support only their platforms. This means that if you decide to use upstream Xen or anything else not listed above, you are on your own. You have been warned!

Unsupported but works

I'm not a big fan of running on top of unsupported environments and I don't have the need to do so. I've heard about people running CentOS (RHEL compatible) on VirtualBox but I have no idea how well it works.

If you are using a different virtualization platform (like LXC, OpenVZ, UML, Parallels or other) let me know if CentOS/Fedora works on it. Alternatively I can give it a try if you can provide me with ssh/VNC access to the machine.

django-social-auth tip: Reminder of Login Provider

Every now and then users forget their passwords. This is why I prefer using OAuth and social network accounts like GitHub or Twitter. But what do you do when somebody forgets which OAuth provider they used to login to your site? Your website needs a reminder. This is how to implement one if using django-social-auth.

Back-end

Create a view similar to the one below in your Django back-end:

from django.contrib.auth.models import User
from django.http import HttpResponse
from social_auth.models import UserSocialAuth
from templated_email import send_templated_mail  # e.g. from django-templated-email


def ajax_social_auth_provider_reminder(request):
    """
        Remind the user which social auth provider they used to login.
    """
    if not request.POST:
        return HttpResponse("Not a POST", mimetype='text/plain', status=403)

    email = request.POST.get('email', "")
    email = email.strip()
    if not email or (email.find("@") == -1):
        return HttpResponse("Invalid address!", mimetype='text/plain', status=400)

    try:
        user = User.objects.filter(email=email, is_active=True).only('pk')[0]
    except IndexError:
        return HttpResponse("No user with address '%s' found!" % email, mimetype='text/plain', status=400)

    providers = []
    for sa in UserSocialAuth.objects.filter(user=user.pk).only('provider'):
        providers.append(sa.provider.title())

    if len(providers) > 0:
        send_templated_mail(
            template_name='social_provider_reminder',
            from_email='Difio <reminder@dif.io>',
            recipient_list=[email],
            context={'providers' : providers},
        )
        return HttpResponse("Reminder sent to '%s'" % email, mimetype='text/plain', status=200)
    else:
        return HttpResponse("User found but no social providers found!", mimetype='text/plain', status=400)

This example assumes it is called via a POST request which contains the email address. All responses are handled at the front-end via JavaScript. If a user with the specified email address exists, this address will receive a reminder listing all social auth providers associated with the user account.
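
To give the front-end an endpoint to POST to, hook the view into your URL configuration. The URL and module path below are placeholders (old-style Django 1.3/1.4 urls.py):

urls.py
from django.conf.urls.defaults import patterns, url

urlpatterns = patterns('',
    url(r'^ajax/social/reminder/$', 'myapp.views.ajax_social_auth_provider_reminder'),
)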

Front-end

On the browser side I like to use Dojo. Here is a simple script which connects to a form and POSTs the data back to the server.

require(["dojo"]);
require(["dijit"]);

function sendReminderForm(){
    var form = dojo.byId("reminderForm");

    dojo.connect(form, "onsubmit", function(event){
        dojo.stopEvent(event);
        dijit.byId("dlgForgot").hide();
        var xhrArgs = {
            form: form,
            handleAs: "text",
            load: function(data){alert(data);},
            error: function(error, ioargs){alert(ioargs.xhr.responseText);}
        };
        var deferred = dojo.xhrPost(xhrArgs);
    });
}
dojo.ready(sendReminderForm);

You can try this out at Difio and let me know how it works for you!

Python Twitter + django-social-auth == Hello New User

I have been experimenting with the twitter module for Python and decided to combine it with django-social-auth to welcome new users who join Difio. In this post I will show you how to tweet on behalf of the user when they join your site and send them a welcome email.

Configuration

In django-social-auth the authentication workflow is handled by an operations pipeline where custom functions can be added or default items can be removed to provide custom behavior. This is how our pipeline looks:

settings.py
SOCIAL_AUTH_PIPELINE = (
    'social_auth.backends.pipeline.social.social_auth_user',
    #'social_auth.backends.pipeline.associate.associate_by_email',
    'social_auth.backends.pipeline.user.get_username',
    'social_auth.backends.pipeline.user.create_user',
    'social_auth.backends.pipeline.social.associate_user',
    'social_auth.backends.pipeline.social.load_extra_data',
    'social_auth.backends.pipeline.user.update_user_details',
    'myproject.tasks.welcome_new_user'
)

This is the default plus an additional method at the end to welcome new users.

You also have to create and configure a Twitter application so that users can log in to your site with Twitter OAuth. RTFM for more information on how to do this.
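
For reference, the relevant settings for a typical django-social-auth Twitter setup look something like this; the key and secret values come from your registered Twitter application:

settings.py
AUTHENTICATION_BACKENDS = (
    'social_auth.backends.twitter.TwitterBackend',
    'django.contrib.auth.backends.ModelBackend',
)

# obtained when registering your application with Twitter
TWITTER_CONSUMER_KEY = 'xxxxxxxxxxxx'
TWITTER_CONSUMER_SECRET = 'xxxxxxxx'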

Custom pipeline actions

This is how the custom pipeline action should look:

myproject/tasks.py
from urlparse import parse_qs

def welcome_new_user(backend, user, social_user, is_new=False, new_association=False, *args, **kwargs):
    """
        Part of SOCIAL_AUTH_PIPELINE. Works with django-social-auth==0.7.21 or newer
        @backend - social_auth.backends.twitter.TwitterBackend (or other) object
        @user - User (if is_new) or django.utils.functional.SimpleLazyObject (if new_association)
        @social_user - UserSocialAuth object
    """
    if is_new:
        send_welcome_email.delay(user.email, user.first_name)

    if backend.name == 'twitter':
        if is_new or new_association:
            access_token = social_user.extra_data['access_token']
            parsed_tokens = parse_qs(access_token)
            oauth_token = parsed_tokens['oauth_token'][0]
            oauth_secret = parsed_tokens['oauth_token_secret'][0]
            tweet_on_join.delay(oauth_token, oauth_secret)

    return None

This code works with django-social-auth==0.7.21 or newer. In older versions the new_association parameter is missing as I discovered. If you use an older version you won't be able to distinguish between newly created accounts and ones which have associated another OAuth backend. You are warned!

Tweet & email

Sending the welcome email is out of the scope of this post. I am using django-templated-email to define how emails look and sending them via Amazon SES. See Email Logging for Django on RedHat OpenShift With Amazon SES for more information on how to configure emailing with SES.

Here is how the Twitter code looks:

myproject/tasks.py
import twitter
from celery.task import task
from settings import TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET

@task
def tweet_on_join(oauth_token, oauth_secret):
    """
        Tweet when the user is logged in for the first time or
        when new Twitter account is associated.

        @oauth_token - string
        @oauth_secret - string
    """
    t = twitter.Twitter(
            auth=twitter.OAuth(
                oauth_token, oauth_secret,
                TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET
            )
        )
    t.statuses.update(status='Started following open source changes at http://www.dif.io!')

This will post a new tweet on behalf of the user, telling everyone they joined your website!

NOTE: tweet_on_join and send_welcome_email are Celery tasks, not ordinary Python functions. This has the advantage of executing these actions asynchronously without slowing down the user interface.

Are you doing something special when a user joins your website? Please share your comments below. Thanks!

Tip: Delete User Profiles with django-social-auth

Common functionality for websites is the 'DELETE ACCOUNT' or 'DISABLE ACCOUNT' button. This is how to implement it if using django-social-auth.

views.py
# snippet for inside your 'delete account' view;
# User, UserSocialAuth, HttpResponseRedirect and reverse are imported as usual
delete_objects_for_user(request.user.pk) # optional - remove/archive the user's own data first
UserSocialAuth.objects.filter(user=request.user).delete()
User.objects.filter(pk=request.user.pk).update(is_active=False, email=None)
return HttpResponseRedirect(reverse('django.contrib.auth.views.logout'))

This snippet does the following:

  • Delete (or archive) all objects for the current user;
  • Delete the social auth profile(s) because there is no way to disable them. DSA will create new objects if the user logs in again;
  • Disable the User object. You could also delete it but mind foreign keys;
  • Clear the email for the User object - if a new user is created after deletion we don't want duplicated email addresses in the database;
  • Finally redirect the user to the logout view.

Email Logging for Django on RedHat OpenShift with Amazon SES

Sending email in the cloud can be tricky. IPs of cloud providers are blacklisted because of frequent abuse. For that reason I use Amazon SES as my email backend. Here is how to configure Django to send emails to site admins when something goes wrong.

settings.py
# Valid addresses only.
ADMINS = (
    ('Alexander Todorov', 'atodorov@example.com'),
)

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'class': 'django.utils.log.AdminEmailHandler'
        }
    },
    'loggers': {
        'django.request': {
            'handlers': ['mail_admins'],
            'level': 'ERROR',
            'propagate': True,
        },
    }
}
 
# Used as the From: address when reporting errors to admins
# Needs to be verified in Amazon SES as a valid sender
SERVER_EMAIL = 'django@example.com'

# Amazon Simple Email Service settings
AWS_SES_ACCESS_KEY_ID = 'xxxxxxxxxxxx'
AWS_SES_SECRET_ACCESS_KEY = 'xxxxxxxx'
EMAIL_BACKEND = 'django_ses.SESBackend'

You also need the django-ses dependency.

See http://docs.djangoproject.com/en/dev/topics/logging for more details on how to customize your logging configuration.
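
To quickly verify the whole chain you can log an error against the django.request logger from the Django shell; assuming valid SES credentials and ADMINS addresses you should receive an email:

>>> import logging
>>> logging.getLogger('django.request').error('Testing error emails to ADMINS')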

I am using this configuration successfully at RedHat's OpenShift PaaS environment. Other users have reported it works for them too. Should work with any other PaaS provider.

Tip: Generating Directory Listings with wget

Today I was looking to generate a list of all files under a remote site directory, including subdirectories. I found no built-in option for this in wget. This is how I did it:

wget http://example.com/dir/ --spider -r -np 2>&1 | grep http:// | tr -s ' ' | cut -f3 -d' '

I managed to retrieve 12212 entries from the URL I was exploring.

Secure VNC Installation of Red Hat Enterprise Linux 6

[Image: RHEL 6 welcome screen. CC-BY-SA, Red Hat]

From time to time I happen to remotely install Red Hat Enterprise Linux servers via the Internet. When the system configuration is not decided upfront you need to use interactive mode. This means VNC in my case.

In this tutorial I will show you how to make VNC installations more secure when using public networks to connect to the server.

Meet your tools

Starting with Red Hat Enterprise Linux 6 and all the latest Fedora releases, the installer supports SSH connections during install.

Note that by default, root has a blank password.

If you don't want any user to be able to ssh in and have full access to your hardware, you must specify sshpw for username root. Also note that if Anaconda fails to parse the kickstart file, it will allow anyone to login as root and have full access to your hardware.

Fedora Kickstart manual https://fedoraproject.org/wiki/Anaconda/Kickstart#sshpw

Preparation

We are going to use SSH port forwarding and tunnel VNC traffic through it. Create a kickstart file as shown below:

install
url --url http://example.com/path/to/rhel6
lang en_US.UTF-8
keyboard us
network --onboot yes --device eth0 --bootproto dhcp
vnc --password=s3cr3t
sshpw --user=root s3cr3t

The first 5 lines configure the loader portion of the installer. They will set up networking and fetch the installer image called stage2. This is completely automated. NB: If you miss some of the lines or have a syntax error the installer will prompt for values. You will then need either remote console access or somebody present at the server console!

The last 2 lines configure passwords for VNC and SSH respectively.

Make this file available over HTTP(S), FTP or NFS.

NB: Make sure that the file is available on the same network where your server is, or use HTTPS if on public networks.

Installation

Now, using your favorite installation media start the installation process like this:

boot: linux sshd=1 ks=http://example.com/ks.cfg

After a minute or more the installer will load stage2 and start the interactive VNC session. You need to know the IP address or hostname of the server. Either look into the DHCP logs, have somebody look at the server console and tell you (it's printed on tty1), or use a %pre script which will, for example, send you an email with the address.
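
Here is a minimal %pre sketch along those lines. Instead of email it simply reports the address to a web server you control (example.com is a placeholder) so you can read it from that server's access log:

%pre
# report the DHCP-assigned address to a web server we control;
# check that server's access log to learn where to connect
python -c "
import urllib2, socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(('example.com', 80))  # no traffic is sent, just selects the outgoing address
urllib2.urlopen('http://example.com/install-started?ip=' + s.getsockname()[0])
"
%end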

When ready, redirect one of your local ports through SSH to the VNC port on the server:

$ ssh -L 5902:localhost:5901 -N root@server.example.com

Now connect to DISPLAY :2 on your system to begin the installation:

$ vncviewer localhost:2 &

Warning Bugs Present

As it happens, I find bugs everywhere. This is no exception. Depending on your network/DHCP configuration the IP address may change mid-install and cause the VNC client connection to freeze.

The reason for this bug is evident from the code (rhel6-branch):

iw/timezone_gui.py
if not anaconda.isKickstart:
    self.utcCheckbox.set_active(not hasWindows(anaconda.id.bootloader))

textw/timezone_text.py
if not anaconda.isKickstart and not hasWindows(anaconda.id.bootloader):
    asUtc = True

Because we are using a kickstart file Anaconda will assume the system clock DOES NOT use UTC. If you forget to configure it manually you may see the time on the server shifting back or forward (depending on your timezone) while installing. If your DHCP is configured with a short lease time the lease will expire before the installation completes. When a new address is requested from DHCP it may be different, and this will cause your VNC connection to freeze.

To work around this issue select the appropriate value for the system clock settings during install and possibly use a static IP address during the installation.

Feedback

As always I'd love to hear your feedback in the comments section below. Let me know your tips and tricks to perform secure remote installations using public networks.

Tip: Save Money on Amazon - Buy Used Books

I like to buy books, the real ones, printed on paper. This however comes at a certain price when buying from Amazon. The book price itself is usually bearable but many times shipping costs to Bulgaria will double the price, especially if you are making a single-book order.

To save money I started buying used books when available. For books that are not so popular I look for items that have been owned by a library.

This is how I got a hardcover 1984 edition of The Gentlemen's Clubs of London by Anthony Lejeune for $10. This is my best deal so far. The book was brand new, I dare say. There was no edge wear, no damaged pages, and the colors were nice and vibrant. The second page had the library stamp and no other marks.

Have you ever bought used books online? Did you score a great deal like I did? Let me know in the comments.

Combining PDF Files On The Command Line

VERSION

Red Hat Enterprise Linux 6

PROBLEM

You have to create a single PDF file by combining multiple files - for example individually scanned pages.

ASSUMPTIONS

You know how to start a shell and navigate to the directory containing the files.

SOLUTION

If individual PDF files are named, for example, doc_01.pdf, doc_02.pdf, doc_03.pdf, doc_04.pdf, then you can combine them with the gs command:

    $ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=mydocument.pdf doc_*.pdf

The resulting PDF file will contain all pages from the individual files.

MORE INFO

The gs command is part of the ghostscript rpm package. You can find more about it using man gs, the documentation file /usr/share/doc/ghostscript-*/index.html or http://www.ghostscript.com.

OpenShift Cron Takes Over Celerybeat

Celery is an asynchronous task queue/job queue based on distributed message passing. You can define tasks as Python functions, execute them in the background and in a periodic fashion. Difio uses Celery for virtually everything. Some of the tasks are scheduled after some event takes place (like user pressed a button) or scheduled periodically.

Celery provides several components of which celerybeat is the periodic task scheduler. When combined with Django it gives you a very nice admin interface which allows periodic tasks to be added to the scheduler.

Why change

Difio has relied on celerybeat for a couple of months. Back then, when Difio launched, there was no cron support for OpenShift so running celerybeat sounded reasonable. It used to run on a dedicated virtual server and for most of the time that was fine.

There were a number of issues which Difio faced during its first months:

  • celerybeat would sometimes die due to lack of free memory on the virtual instance. When that happened no new tasks were scheduled and data was left unprocessed. Not to mention that a higher-memory instance, and the processing power which comes with it, costs extra money.

  • Difio is split into several components which need to have the same code base locally - the most important are the database settings and the periodic tasks code. On at least one occasion celerybeat failed to start because of buggy task code. The offending code was fixed on the application server on OpenShift but not properly synced to the celerybeat instance. Keeping code in sync is a priority for distributed projects which rely on Celery.

  • Celery and django-celery seem to be updated quite often. This poses a significant risk of ending up with different versions on the scheduler, worker nodes and the app server. This will bring the whole application to a halt if at some point a backward incompatible change is introduced and not properly tested and updated. Keeping infrastructure components in sync can be a big challenge and I try to minimize this effort as much as possible.

  • Having to navigate to the admin pages every time I add a new task or want to change the execution frequency doesn't feel very natural for a console user like myself and IMHO is less productive. For the record I primarily use mcedit. I wanted something closer to the write, commit and push work-flow.

The take over

It's been some time since OpenShift introduced the cron cartridge and I decided to give it a try.

The first thing I did was to write a simple script which can execute any task from the difio.tasks module by piping it to the Django shell (a Python shell actually).

run_celery_task
#!/bin/bash
#
# Copyright (c) 2012, Alexander Todorov <atodorov@nospam.otb.bg>
#
# This script is symlinked to from the hourly/minutely, etc. directories
#
# SYNOPSIS
#
# ./run_celery_task cron_search_dates
#
# OR
#
# ln -s run_celery_task cron_search_dates
# ./cron_search_dates
#

TASK_NAME=$1
[ -z "$TASK_NAME" ] && TASK_NAME=$(basename $0)

if [ -n "$OPENSHIFT_APP_DIR" ]; then
    source $OPENSHIFT_APP_DIR/virtenv/bin/activate
    export PYTHON_EGG_CACHE=$OPENSHIFT_DATA_DIR/.python-eggs
    REPO_DIR=$OPENSHIFT_REPO_DIR
else
    REPO_DIR=$(dirname $0)"/../../.."
fi

echo "import difio.tasks; difio.tasks.$TASK_NAME.delay()" | $REPO_DIR/wsgi/difio/manage.py shell

This is a multicall script which allows symlinks with different names to point to it. Thus to add a new task to cron I just need to make a symlink to the script from one of the hourly/, minutely/, daily/, etc. directories under cron/.

The script accepts a parameter as well, which allows me to execute it locally for debugging purposes or to schedule some tasks out of band. This is how it looks on the file system:

$ ls -l .openshift/cron/hourly/
some_task_name -> ../tasks/run_celery_task
another_task -> ../tasks/run_celery_task
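
Adding a new periodic task is then just another symlink and a commit (the task name below is made up):

$ ln -s ../tasks/run_celery_task .openshift/cron/hourly/cron_another_task
$ git add .openshift/cron/hourly/cron_another_task
$ git commit -m "schedule cron_another_task every hour"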

After having done these preparations I only had to embed the cron cartridge and git push to OpenShift:

rhc-ctl-app -a difio -e add-cron-1.4 && git push

What's next

At present OpenShift can schedule your jobs every minute, hour, day, week or month and does so using the run-parts script. You can't schedule a script to execute at 4:30 every Monday or every 45 minutes for example. See rhbz #803485 if you want to follow the progress. Luckily Difio doesn't use this sort of job scheduling for the moment.

Difio has been scheduling periodic tasks from OpenShift cron for a few days already. It seems to work reliably and with no issues. One less component to maintain and worry about. More time to write code.

Tip: How to Get to the OpenShift Shell

I wanted to examine the Perl environment on OpenShift and got tired of making snapshots, unzipping the archive and poking through the files. I wanted a shell. Here's how to get one.

  1. Get the application info first

    $ rhc-domain-info 
    Password: 
    Application Info
    ================
    myapp
        Framework: perl-5.10
         Creation: 2012-03-08T13:34:46-04:00
             UUID: 8946b976ad284cf5b2401caf736186bd
          Git URL: ssh://8946b976ad284cf5b2401caf736186bd@myapp-mydomain.rhcloud.com/~/git/myapp.git/
       Public URL: http://myapp-mydomain.rhcloud.com/
    
     Embedded: 
          None
    
  2. The Git URL has your username and host

  3. Now just ssh into the application

    $ ssh 8946b976ad284cf5b2401caf736186bd@myapp-mydomain.rhcloud.com
    
        Welcome to OpenShift shell
    
        This shell will assist you in managing OpenShift applications.
    
        !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
        Shell access is quite powerful and it is possible for you to
        accidentally damage your application.  Proceed with care!
        If worse comes to worst, destroy your application with 'rhc app destroy'
        and recreate it
        !!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
    
        Type "help" for more info.
    
    [myapp-mydomain.rhcloud.com ~]\>
    

Voila!

How to Update Dependencies on OpenShift

If you are already running some cool application on OpenShift it could be the case that you have to update some of the packages installed as dependencies. Here is an example for an application using the python-2.6 cartridge.

Pull latest upstream packages

The simplest method is to update everything to the latest upstream versions.

  1. Backup! Backup! Backup!

    rhc-snapshot -a mycoolapp
    mv mycoolapp.tar.gz mycoolapp-backup-before-update.tar.gz
    
  2. If you haven't specified any particular version in setup.py it will look like this:

    ...
    install_requires=[
                    'difio-openshift-python',
                    'MySQL-python',
                    'Markdown',
                   ],
    ...
    
  3. To update simply push to OpenShift instructing it to rebuild your virtualenv:

    cd mycoolapp/
    touch .openshift/markers/force_clean_build
    git add .openshift/markers/force_clean_build
    git commit -m "update to latest upstream"
    git push
    

Voila! The environment hosting your application is rebuilt from scratch.

Keeping some packages unchanged

Suppose that before the update you have Markdown-2.0.1 and you want to keep it! This is easily solved by adding versioned dependency to setup.py

-       'Markdown',
+       'Markdown==2.0.1',

If you do that OpenShift will install the same Markdown version when rebuilding your application. Everything else will use the latest available versions.

Note: after the update it's recommended that you remove the .openshift/markers/force_clean_build file. This will speed up the push/build process and will not surprise you with unwanted changes.

Update only selected packages

Unless your application is really simple or you have tested the updates, I suspect that you want to update only selected packages. This can be done without rebuilding the whole virtualenv. Use versioned dependencies in setup.py :

-       'Markdown==2.0.1',
-       'django-countries',
+       'Markdown>=2.1',
+       'django-countries>=1.1.2',

No need for force_clean_build this time. Just

    git commit && git push

At the time of writing my application was using Markdown-2.0.1 and django-countries-1.0.5. Then it updated to Markdown-2.1.1 and django-countries-1.1.2, which also happened to be the latest versions.

Note: this will not work without force_clean_build

-       'django-countries==1.0.5',
+       'django-countries',

Warning

OpenShift uses a local mirror of the Python Package Index. It seems to be updated every 24 hours or so. Keep this in mind if you want to update to a package that was just released. It will not work! See How to Deploy Python Hotfix on OpenShift if you wish to work around this limitation.

Spinning-up a Development Instance on OpenShift

Difio is hosted on OpenShift. During development I often need to spin-up another copy of Difio to use for testing and development. With OpenShift this is easy and fast. Here's how:

  1. Create another application on OpenShift. This will be your development instance.

    rhc-create-app -a myappdevel -t python-2.6
    
  2. Find out the git URL for the production application:

    $ rhc-user-info
    Application Info
    ================
    myapp
        Framework: python-2.6
         Creation: 2012-02-10T12:39:53-05:00
             UUID: 723f0331e17041e8b34228f87a6cf1f5
          Git URL: ssh://723f0331e17041e8b34228f87a6cf1f5@myapp-mydomain.rhcloud.com/~/git/myapp.git/
       Public URL: http://myapp-mydomain.rhcloud.com/
    
  3. Push the current code base from the production instance to devel instance:

    cd myappdevel
    git remote add production -m master ssh://723f0331e17041e8b34228f87a6cf1f5@myapp-mydomain.rhcloud.com/~/git/myapp.git/
    git pull -s recursive -X theirs production master
    git push
    
  4. Now your myappdevel is the same as your production instance. You will probably want to modify your database connection settings at this point and start adding new features.
