This week Amazon announced support for dynamic content in their CDN solution, Amazon CloudFront. The announcement coincided with my efforts to migrate more pieces of Difio's website to CloudFront.
In this article I will not talk about hosting static files on a CDN. That is easy and I've already written about it here. Instead I will show how to cache AJAX (JSONP actually) responses and serve them directly from Amazon CloudFront.
For those of you who may not be familiar (are there any?), CDN stands for Content Delivery Network. In short it employs numerous servers holding identical content, and requests from the browser are served from the location which gives the best performance for that user. All major websites use this to speed up static content like images, video, CSS and JavaScript files.
AJAX stands for Asynchronous JavaScript and XML. This is what Google uses to create dynamic user interfaces which don't require reloading the page.
Difio has two web interfaces. The primary one is a static HTML website which employs JavaScript for the dynamic areas. It is hosted on the dif.io domain. The other one is powered by Django and provides the same interface plus the applications dashboard and several API functions which don't have a visible user interface. This is under the *.rhcloud.com domain because it is hosted on OpenShift.
The present state of the website is the result of rapid development using conventional methods - HTML templates and server-side processing. It is now migrating to static HTML and JavaScript on the front end, while the server side will become a pure API service.
For this migration to happen I need the HTML pages at dif.io to execute JavaScript and load information which comes from the rhcloud.com domain. Unfortunately this is not easily doable with AJAX because of the Same origin policy in browsers.
I'm using the Dojo Toolkit JavaScript framework which has a solution. It's called JSONP. Here's how it works:
dif.io ------ JSONP request --> abc.rhcloud.com --v
  ^                                               |
  |                                               |
JavaScript processing                             |
  |                                               |
  +---------------- JSONP response ---------------+
This is a pretty standard configuration for a web service.
The way Dojo implements JSONP is through the dojo.io.script module. It works by appending a query string parameter of the form ?callback=funcName which the server uses to generate the JSONP response. This callback name is dynamically generated by Dojo based on the order in which your call to dojo.io.script is executed.
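For completeness, the JSONP response is nothing more than the JSON payload wrapped in a call to the function named by the callback parameter. Here is a rough sketch of what the server side could look like (this is not Difio's actual code; the view name and URL routing are made up):
# Hypothetical Django view returning JSONP
import json
from django.http import HttpResponse

def updates(request, object_id):
    # function name generated by dojo.io.script and passed as ?callback=...
    callback = request.GET.get('callback', 'callback')
    data = {'id': object_id, 'updates': []}  # build the real payload here
    body = '%s(%s);' % (callback, json.dumps(data))
    return HttpResponse(body, content_type='application/javascript')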
Until recently Amazon CloudFront ignored all query string parameters when requesting the content from the origin server. Without the query string it was not possible to generate the JSONP response. Luckily Amazon resolved the issue only one day after I asked about it on their forums.
Now Amazon CloudFront will use the URL path and the query string parameters to identify the objects in cache. To enable this edit the CloudFront distribution behavior(s) and set Forward Query Strings to Yes.
When a visitor of the website requests the data, Amazon CloudFront will use exactly the same URL path and query string to fetch the content from the origin server. All I had to do was switch the domain of the JSONP service to point to the cloudfront.net domain. It became like this:
                               | Everything on this side is handled by Amazon.
                               | No code required!
                               |
dif.io ------ JSONP request --> xyz.cloudfront.net -- JSONP request if cache miss --> abc.rhcloud.com --v
  ^                              |                 ^                                                    |
  |                              |                 |                                                    |
JavaScript processing            |                 +---------- JSONP response --------------------------+
  |                              |
  +---- cached JSONP response ---+
As you can see the website structure and code didn't change at all. All that changed was a single domain name.
Amazon CloudFront will keep the contents in cache based on the origin headers, if present, or on the manual configuration from the AWS Console. To avoid frequent requests to the origin server it is considered best practice to set the Expires header to a value far in the future, like 1 year. However if the content changes you need some way to tell CloudFront about it. The most commonly used method is to use different URLs to access the same content. This will cause CloudFront to cache the content under the new location while keeping the old content until it expires.
Dojo makes this very easy:
require(["dojo/io/script"],
    function(script) {
        script.get({
            url: "https://xyz.cloudfront.net/api/json/updates/1234",
            callbackParamName: "callback",
            content: {t: timeStamp},
            load: function(jsonData) {
                // .... process the returned data
            }
        });
    }
);
The content property allows additional key/value pairs to be sent in the query string. The timeStamp parameter serves only to control Amazon CloudFront cache. It's not processed server side.
On the server side we have:
from datetime import datetime, timedelta

response['Cache-Control'] = 'max-age=31536000'
response['Expires'] = (datetime.utcnow()+timedelta(seconds=31536000)).strftime('%a, %d %b %Y %H:%M:%S GMT')
There were two immediate benefits:
The presented method works well for Difio because of two things:
There are comments.
It's been several months since the start of Difio and I have started migrating various parts of the platform to a CDN. The first to go are static files like CSS, JavaScript, images and such. In this article I will show you how to get started with Amazon CloudFront and OpenShift. It is very easy once you understand how it works.
Amazon CloudFront is cheap and easy to set up, with virtually no maintenance. The most important feature is that it can fetch content from any public website. Integrating it with OpenShift gives some nice benefits:
CloudFront will cache your objects for a certain period and then expire them. Frequently used objects are expired less often. Depending on the content you may want to update the cache more or less frequently. In my case CSS and JavaScript files change rarely so I wanted to tell CloudFront to not expire the files quickly. I did this by telling Apache to send a custom value for the Expires header.
$ curl http://d71ktrt2emu2j.cloudfront.net/static/v1/css/style.css -D headers.txt
$ cat headers.txt
HTTP/1.0 200 OK
Date: Mon, 16 Apr 2012 19:02:16 GMT
Server: Apache/2.2.15 (Red Hat)
Last-Modified: Mon, 16 Apr 2012 19:00:33 GMT
ETag: "120577-1b2d-4bdd06fc6f640"
Accept-Ranges: bytes
Content-Length: 6957
Cache-Control: max-age=31536000
Expires: Tue, 16 Apr 2013 19:02:16 GMT
Content-Type: text/css
Strict-Transport-Security: max-age=15768000, includeSubDomains
Age: 73090
X-Cache: Hit from cloudfront
X-Amz-Cf-Id: X558vcEOsQkVQn5V9fbrWNTdo543v8VStxdb7LXIcUWAIbLKuIvp-w==,e8Dipk5FSNej3e0Y7c5ro-9mmn7OK8kWfbaRGwi1ww8ihwVzSab24A==
Via: 1.0 d6343f267c91f2f0e78ef0a7d0b7921d.cloudfront.net (CloudFront)
Connection: close
All headers before Strict-Transport-Security come from the origin server.
Sometimes, however, you need to update the files and force CloudFront to update the content. The recommended way to do this is to use URL versioning and update the path to the files which changed. This will force CloudFront to cache and serve the content under the new path while keeping the old content available until it expires. This way your visitors will not end up viewing your site with a mix of new CSS and old JavaScript.
There are many ways to do this and there are some nice frameworks as well. For Python there is webassets. I don't have many static files so I opted for no additional dependencies. Instead I will be updating the versions by hand.
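If the manual bookkeeping ever becomes tedious, the version string can also be derived from the content itself. The following is only a sketch of that idea and not something Difio uses - it hashes everything under the static directory and prints a short identifier which could become the name of the next version directory or symlink:
import hashlib
import os

def static_version(static_dir):
    """Return a short hash of all files under static_dir."""
    md5 = hashlib.md5()
    for root, dirs, files in os.walk(static_dir):
        dirs.sort()               # walk in a stable order
        for name in sorted(files):
            with open(os.path.join(root, name), 'rb') as f:
                md5.update(f.read())
    return md5.hexdigest()[:8]

if __name__ == "__main__":
    print(static_version('wsgi/static'))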
What comes to mind is using mod_rewrite to redirect the versioned URLs back to non-versioned ones. However there's a catch: if you do this, CloudFront will cache the redirect itself, not the content. The next time visitors hit CloudFront they will receive the cached redirect and follow it back to your origin server, which defeats the purpose of having a CDN.
To do it properly you have to rewrite the URLs but still return a 200 response code and the content which needs to be cached. This is done with mod_proxy:
RewriteEngine on
RewriteRule ^VERSION-(\d+)/(.*)$ http://%{ENV:OPENSHIFT_INTERNAL_IP}:%{ENV:OPENSHIFT_INTERNAL_PORT}/static/$2 [P,L]
This .htaccess trick doesn't work on OpenShift though. mod_proxy is not enabled at the moment. See bug 812389 for more info.
Luckily I was able to use symlinks to point to the content. Here's how it looks:
$ pwd
/home/atodorov/difio/wsgi/static
$ cat .htaccess
ExpiresActive On
ExpiresDefault "access plus 1 year"
$ ls -l
drwxrwxr-x. 6 atodorov atodorov 4096 16 Apr 21,31 o
lrwxrwxrwx. 1 atodorov atodorov 1 16 Apr 21,47 v1 -> o
settings.py:
STATIC_URL = '//d71ktrt2emu2j.cloudfront.net/static/v1/'
HTML template:
<link type="text/css" rel="stylesheet" media="screen" href="{{ STATIC_URL }}css/style.css" />
First you need to split all CSS and JavaScript from your HTML if you haven't done so already.
Then place everything under your git repo so that OpenShift will serve the files. For Python applications place the files under wsgi/static/ directory in your git repo.
Point all of your HTML templates to the static location on OpenShift and test if everything works as expected. This is best done if you're using some sort of template language and store the location in a single variable which you can change later. Difio uses Django and the STATIC_URL variable of course.
Create your CloudFront distribution - don't use Amazon S3, instead configure a custom origin server. Write down your CloudFront URL. It will be something like 1234xyz.cloudfront.net.
Every time a request hits CloudFront it will check if the object is present in the cache. If not present CloudFront will fetch the object from the origin server and populate the cache. Then the object is sent to the user.
Update your templates to point to the new cloudfront.net URL and redeploy your website!
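After redeploying it's worth verifying that objects really come from the edge cache, just like the curl example above. The same check can be scripted; this is only a convenience sketch using the example distribution from this article:
# Print the caching headers for one object served by CloudFront
import urllib2

url = 'http://d71ktrt2emu2j.cloudfront.net/static/v1/css/style.css'
response = urllib2.urlopen(url)
# 'Hit from cloudfront' means the object came from the edge cache
print(response.info().getheader('X-Cache'))
print(response.info().getheader('Age'))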
There are comments.
Celery is an asynchronous task/job queue based on distributed message passing. You can define tasks as Python functions and execute them in the background or on a periodic schedule. Difio uses Celery for virtually everything. Some of the tasks are scheduled after some event takes place (like a user pressing a button), others run periodically.
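For readers who haven't used Celery, a task is just a decorated Python function. Here is a minimal sketch reusing the cron_search_dates name which appears later in this article (the decorator import may differ between Celery versions and the real task body is not shown):
# difio/tasks.py -- minimal sketch
from celery.task import task

@task
def cron_search_dates():
    # query the database, update records, etc.
    pass
Calling cron_search_dates.delay() queues the task for execution on a worker instead of running it inline, which is exactly what the cron script shown below does.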
Celery provides several components of which celerybeat is the periodic task scheduler. When combined with Django it gives you a very nice admin interface which allows periodic tasks to be added to the scheduler.
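Besides the admin interface, periodic tasks can also be declared in code with the CELERYBEAT_SCHEDULE setting. This is a generic sketch of that approach, not Difio's configuration (the entry name and interval are made up):
# settings.py -- generic sketch of a code-defined celerybeat schedule
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'search-dates-every-hour': {
        'task': 'difio.tasks.cron_search_dates',
        'schedule': timedelta(hours=1),
    },
}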
Difio relied on celerybeat for a couple of months. Back then, when Difio launched, there was no cron support on OpenShift, so running celerybeat sounded reasonable. It used to run on a dedicated virtual server and most of the time that was fine.
There were a number of issues which Difio faced during its first months:
celerybeat would sometimes die due to lack of free memory on the virtual instance. When that happened no new tasks were scheduled and data was left unprocessed. Not to mention that a higher-memory instance, and the processing power which comes with it, costs extra money.
Difio is split into several components which need to have the same code base locally - the most important being the database settings and the periodic tasks code. On at least one occasion celerybeat failed to start because of buggy task code. The offending code was fixed on the application server on OpenShift but not properly synced to the celerybeat instance. Keeping code in sync is a priority for distributed projects which rely on Celery.
Celery and django-celery seem to be updated quite often. This poses a significant risk of ending up with different versions on the scheduler, worker nodes and the app server. This will bring the whole application to a halt if at some point a backward incompatible change is introduced and not properly tested and updated. Keeping infrastructure components in sync can be a big challenge and I try to minimize this effort as much as possible.
Having to navigate to the admin pages every time I add a new task or want to change the execution frequency doesn't feel very natural for a console user like myself and IMHO is less productive. For the record, I primarily use mcedit. I wanted something closer to the write, commit and push workflow.
It's been some time since OpenShift introduced the cron cartridge and I decided to give it a try.
The first thing I did was write a simple script which can execute any task from the difio.tasks module by piping it to the Django shell (a Python shell actually).
#!/bin/bash
#
# Copyright (c) 2012, Alexander Todorov <atodorov@nospam.otb.bg>
#
# This script is symlinked to from the hourly/minutely, etc. directories
#
# SYNOPSIS
#
# ./run_celery_task cron_search_dates
#
# OR
#
# ln -s run_celery_task cron_search_dates
# ./cron_search_dates
#
TASK_NAME=$1
[ -z "$TASK_NAME" ] && TASK_NAME=$(basename $0)
if [ -n "$OPENSHIFT_APP_DIR" ]; then
    source $OPENSHIFT_APP_DIR/virtenv/bin/activate
    export PYTHON_EGG_CACHE=$OPENSHIFT_DATA_DIR/.python-eggs
    REPO_DIR=$OPENSHIFT_REPO_DIR
else
    REPO_DIR=$(dirname $0)"/../../.."
fi
echo "import difio.tasks; difio.tasks.$TASK_NAME.delay()" | $REPO_DIR/wsgi/difio/manage.py shell
This is a multicall script which allows symlinks with different names to point to it. Thus to add a new task to cron I just need to make a symlink to the script from one of the hourly/, minutely/, daily/, etc. directories under cron/.
The script accepts a parameter as well, which allows me to execute it locally for debugging purposes or to schedule some tasks out of band. This is how it looks on the file system:
$ ls -l .openshift/cron/hourly/
some_task_name -> ../tasks/run_celery_task
another_task -> ../tasks/run_celery_task
After having done these preparations I only had to embed the cron cartridge and git push to OpenShift:
rhc-ctl-app -a difio -e add-cron-1.4 && git push
At present OpenShift can schedule your jobs every minute, hour, day, week or month and does so using the run-parts script. You can't schedule a script to execute at 4:30 every Monday or every 45 minutes for example. See rhbz #803485 if you want to follow the progress. Luckily Difio doesn't use this sort of job scheduling for the moment.
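Until finer-grained scheduling is available, a common workaround (not something Difio needs, as noted above) is to drop a job into the minutely/ or hourly/ directory and have it exit unless the wall clock matches the desired schedule:
#!/usr/bin/env python
# Generic sketch: placed in .openshift/cron/minutely/, it bails out
# unless it is 04:30 on a Monday and otherwise does nothing.
import sys
from datetime import datetime

now = datetime.now()
if not (now.weekday() == 0 and now.hour == 4 and now.minute == 30):
    sys.exit(0)

# ... do the real work here, e.g. queue a Celery task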
Difio has been scheduling periodic tasks from OpenShift cron for a few days already. It seems to work reliably and without issues. One less component to maintain and worry about. More time to write code.
There are comments.
I wanted to examine the Perl environment on OpenShift and got tired of making snapshots, unzipping the archive and poking through the files. I wanted a shell. Here's how to get one.
Get the application info first
$ rhc-domain-info
Password:
Application Info
================
myapp
Framework: perl-5.10
Creation: 2012-03-08T13:34:46-04:00
UUID: 8946b976ad284cf5b2401caf736186bd
Git URL: ssh://8946b976ad284cf5b2401caf736186bd@myapp-mydomain.rhcloud.com/~/git/myapp.git/
Public URL: http://myapp-mydomain.rhcloud.com/
Embedded:
None
The Git URL has your username and host.
Now just ssh into the application:
$ ssh 8946b976ad284cf5b2401caf736186bd@myapp-mydomain.rhcloud.com
Welcome to OpenShift shell
This shell will assist you in managing OpenShift applications.
!!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
Shell access is quite powerful and it is possible for you to
accidentally damage your application. Proceed with care!
If worse comes to worst, destroy your application with 'rhc app destroy'
and recreate it
!!! IMPORTANT !!! IMPORTANT !!! IMPORTANT !!!
Type "help" for more info.
[myapp-mydomain.rhcloud.com ~]\>
Voila!
There are comments.
If you are already running some cool application on OpenShift, at some point you may have to update some of the packages installed as dependencies. Here is an example for an application using the python-2.6 cartridge.
The simplest method is to update everything to the latest upstream versions.
Backup! Backup! Backup!
rhc-snapshot -a mycoolapp
mv mycoolapp.tar.gz mycoolapp-backup-before-update.tar.gz
If you haven't specified any particular version in setup.py it will look like this:
...
install_requires=[
    'difio-openshift-python',
    'MySQL-python',
    'Markdown',
],
...
To update simply push to OpenShift instructing it to rebuild your virtualenv:
cd mycoolapp/
touch .openshift/markers/force_clean_build
git add .openshift/markers/force_clean_build
git commit -m "update to latest upstream"
git push
Voila! The environment hosting your application is rebuilt from scratch.
Suppose that before the update you have Markdown-2.0.1 and you want to keep it!
This is easily solved by adding a versioned dependency to setup.py:
- 'Markdown',
+ 'Markdown==2.0.1',
If you do that OpenShift will install the same Markdown version when rebuilding your application. Everything else will use the latest available versions.
Note: after the update it's recommended that you remove the .openshift/markers/force_clean_build file. This will speed up the push/build process and will not surprise you with unwanted changes.
Unless your application is really simple or you have tested the updates, I suspect that you want to update only selected packages. This can be done without rebuilding the whole virtualenv. Use versioned dependencies in setup.py:
- 'Markdown==2.0.1',
- 'django-countries',
+ 'Markdown>=2.1',
+ 'django-countries>=1.1.2',
No need for force_clean_build this time. Just git commit && git push.
At the time of writing my application was using Markdown-2.0.1 and django-countries-1.0.5. Then it updated to Markdown-2.1.1 and django-countries-1.1.2, which also happened to be the latest versions.
Note: this will not work without force_clean_build:
- 'django-countries==1.0.5',
+ 'django-countries',
OpenShift uses a local mirror of the Python Package Index. It seems to be updated every 24 hours or so. Keep this in mind if you want to update to a package that was just released - it will not work! See How to Deploy Python Hotfix on OpenShift if you wish to work around this limitation.
There are comments.
Difio is hosted on OpenShift. During development I often need to spin up another copy of Difio to use for testing and development. With OpenShift this is easy and fast. Here's how:
Create another application on OpenShift. This will be your development instance.
rhc-create-app -a myappdevel -t python-2.6
Find out the git URL for the production application:
$ rhc-user-info
Application Info
================
myapp
Framework: python-2.6
Creation: 2012-02-10T12:39:53-05:00
UUID: 723f0331e17041e8b34228f87a6cf1f5
Git URL: ssh://723f0331e17041e8b34228f87a6cf1f5@myapp-mydomain.rhcloud.com/~/git/myapp.git/
Public URL: http://myapp-mydomain.rhcloud.com/
Push the current code base from the production instance to devel instance:
cd myappdevel
git remote add production -m master ssh://723f0331e17041e8b34228f87a6cf1f5@myapp-mydomain.rhcloud.com/~/git/myapp.git/
git pull -s recursive -X theirs production master
git push
Now your myappdevel is the same as your production instance. You will probably want to modify your database connection settings at this point and start adding new features.
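One way to keep a single code base and still use different database settings is to branch on one of the OpenShift environment variables. This is only a sketch - the variable name, application names and credentials are assumptions you should adjust to your own setup:
# settings.py -- pick the database per OpenShift application (sketch)
import os

IS_DEVEL = os.environ.get('OPENSHIFT_APP_NAME', '') == 'myappdevel'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'myappdevel' if IS_DEVEL else 'myapp',
        'USER': 'dbuser',       # placeholder
        'PASSWORD': 'secret',   # placeholder
        'HOST': 'localhost',    # placeholder
        'PORT': '',
    }
}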
There are comments.
In this article I'm going to describe a simple way to set up RPM repositories with access control, using only standard tools such as yum, SSL and Apache. I talked about this at one of the monthly conferences of Linux for Bulgarians!
Objective:
Create an RPM repository with access control. Access is allowed only for some systems and forbidden for the rest. This is similar to what Red Hat Network does.
Solution:
We're going to use the capabilities of yum and Apache to work with SSL certificates. The client side (yum) will identify itself using an SSL certificate and the server (Apache) will use this information to control access.
Client side set-up:
# openssl genrsa -out /var/lib/yum/client.key 1024
Generating RSA private key, 1024 bit long modulus
....++++++
.......++++++
e is 65537 (0x10001)
# openssl req -new -x509 -text -key /var/lib/yum/client.key -out /var/lib/yum/client.cert
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:BG
State or Province Name (full name) []:Sofia
Locality Name (eg, city) [Default City]:Sofia
Organization Name (eg, company) [Default Company Ltd]:Open Technologies Bulgaria
Organizational Unit Name (eg, section) []:IT
Common Name (eg, your name or your server's hostname) []:
Email Address []:no-spam@otb.bg
# chmod 600 /var/lib/yum/client.key
# cat /etc/yum.repos.d/protected.repo
[protected]
name=SSL protected repository
baseurl=https://repos.example.com/protected
enabled=1
gpgcheck=1
gpgkey=https://repos.example.com/RPM-GPG-KEY
sslverify=1
sslclientcert=/var/lib/yum/client.cert
sslclientkey=/var/lib/yum/client.key
Whenever yum tries to reach the URL of the repository it will identify itself using the specified certificate.
Server side set-up:
Action rpm-protected /cgi-bin/rpm.cgi
AddHandler rpm-protected .rpm .drpm
SSLVerifyClient optional_no_ca
#!/bin/bash

if [ "$SSL_CLIENT_M_SERIAL" == "9F938211B53B4F44" ]; then
    echo "Content-type: application/x-rpm"
    echo "Content-length: $(stat --printf='%s' $PATH_TRANSLATED)"
    echo
    cat $PATH_TRANSLATED
else
    echo "Status: 403"
    echo
fi
In practice:
The above set-up is very basic and only demonstrates the technology behind this. In a real world configuration you will need some more tools to make this really usable.
My company Open Technologies Bulgaria, Ltd. has developed a custom solution for our customers based on the above example called Voyager. It features a Drupal module, a CGI script and a client side yum plugin.
The Drupal module acts as web interface to the system and allows some basic tasks. Administrators can define software channels and subscription expiration. Customers can register and entitle their systems to particular channels. The functionality is similar to Red Hat Network but without all the extra features which we don't need.
The CGI script acts as glue between the client side and the Drupal backend. It reads the client credentials and acts as a first line of defence against unauthorized access. Then it communicates with the Drupal database and gets more information about this customer. If everything is OK, access is allowed.
The yum plugin's task is to communicate with the Drupal backend and dynamically update the repository definitions based on the available subscriptions. It then sends the request for the RPM file back to the Apache server, where the CGI script handles it.
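Voyager itself is not public, so purely as an illustration of the client-side idea, regenerating the repository definitions from the backend's answer could look roughly like this (the URL, the JSON fields and the use of plain urllib2 instead of the real yum plugin API are all invented for the example):
# Sketch only: ask the backend which channels this system is entitled to
# and rewrite the local .repo file accordingly.
import json
import urllib2

response = urllib2.urlopen('https://repos.example.com/cgi-bin/channels.cgi')
channels = json.loads(response.read())

with open('/etc/yum.repos.d/protected.repo', 'w') as repo:
    for chan in channels:
        repo.write("[%s]\n" % chan['label'])
        repo.write("name=%s\n" % chan['name'])
        repo.write("baseurl=https://repos.example.com/%s\n" % chan['label'])
        repo.write("enabled=1\ngpgcheck=1\nsslverify=1\n")
        repo.write("sslclientcert=/var/lib/yum/client.cert\n")
        repo.write("sslclientkey=/var/lib/yum/client.key\n\n")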
The client side also features a tool to generate the client certificate and register the system to the server.
All communications are entirely over HTTPS.
This custom solution has the advantage that it is simple and easy to maintain as well as easy to use. It integrates well with other plugins (e.g. yum-presto for delta rpm support and yum-rhnplugin) and can be used via yum or PackageKit which are the standard package management tools on Red Hat Enterprise Linux 6.
There are comments.
Multiseat configurations are well known in the Linux community and have been used for a number of years now. In the last few years USB docking stations emerged on the market and are becoming popular among multiseat enthusiasts.
My company Open Technologies Bulgaria, Ltd. offers full support of USB multiseat for Red Hat Enterprise Linux 6 as a downstream vendor. We use the name SUMU (simple usb multi user) to refer to the entire multiseat bundle and in this article I'm going to describe the current state of technologies surrounding multiseat, how that works on RHEL 6 and some practical observations.
To build a multiseat system you need a number of individual components:
For detailed description of multiseat configuration take a look at http://plugable.com/2009/11/16/setting-up-usb-multiseat-with-displaylink-on-linux-gdm-up-to-2-20/ or at our source code. I'm going to describe only the differences in RHEL6.
GDM, udlfb and xorg-x11-drv-fbdev-displaylink need to be compiled and installed on the system.
To build an older GDM on RHEL6 you will need to adjust some of the patches in the src.rpm package to apply cleanly and tweak the .spec file to your needs. This also includes using the appropriate version of ltmain.sh from the distro.
The udev rules and scripts are slightly different due to the different device paths in RHEL6:
SYSFS{idVendor}=="17e9", SYSFS{bConfigurationValue}=="2", RUN="/bin/echo 1 > /sys%p/bConfigurationValue"
ACTION=="add", KERNEL=="fb*", SUBSYSTEM=="graphics", SUBSYSTEMS=="usb", PROGRAM="/usr/bin/sumu-hub-id /sys/%p/device/../", SYMLINK+="usbseat/%c/display", RUN+="/etc/udev/scripts/start-seat %c"
ACTION=="remove", KERNEL=="fb*", SUBSYSTEM=="graphics", RUN+="/etc/udev/scripts/stop-seat %k"
KERNEL=="control*", SUBSYSTEM=="sound", BUS=="usb", PROGRAM="/usr/bin/sumu-hub-id /sys/%p/device/../../../../", SYMLINK+="usbseat/%c/sound"
KERNEL=="event*", SUBSYSTEM=="input", BUS=="usb", SYSFS{bInterfaceClass}=="03", SYSFS{bInterfaceProtocol}=="01", PROGRAM="/usr/bin/sumu-hub-id /sys/%p/device/../../../../", SYMLINK+="usbseat/%c/keyboard", RUN+="/etc/udev/scripts/start-seat %c"
KERNEL=="event*", SUBSYSTEM=="input", BUS=="usb", SYSFS{bInterfaceClass}=="03", SYSFS{bInterfaceProtocol}=="02", PROGRAM="/usr/bin/sumu-hub-id /sys/%p/device/../../../../", SYMLINK+="usbseat/%c/mouse", RUN+="/etc/udev/scripts/start-seat %c"
We also use only /dev/event* devices for both mouse and keyboard.
The sumu-hub-id script returns the string busX-devY indicating the location of the device:
#!/bin/bash
if [ -d "$1" ]; then
    echo "bus$(cat $1/busnum)-dev$(cat $1/devnum)"
    exit 0
else
    exit 1
fi
USB device numbering is unique per bus and there isn't a global device identifier as far as I know. On systems with 2 or more USB buses this can lead to mismatch between devices/seats.
For seat/display numbering we use the number of the framebuffer device associated with the seat. This is unique, numbers start from 1 (fb0 is the text console) and are sequential unlike USB device numbers. This also ensures easy match between $DISPLAY and /dev/fbX for debugging purposes.
Our xorg.conf.sed template uses evdev as the input driver. This driver is the default in RHEL6:
Section "InputDevice"
    Identifier "keyboard"
    Driver "evdev"
    Option "CoreKeyboard"
    Option "Device" "/dev/usbseat/%SEAT_PATH%/keyboard"
    Option "XkbModel" "evdev"
EndSection

Section "InputDevice"
    Identifier "mouse"
    Driver "evdev"
    Option "CorePointer"
    Option "Protocol" "auto"
    Option "Device" "/dev/usbseat/%SEAT_PATH%/mouse"
    Option "Buttons" "5"
    Option "ZAxisMapping" "4 5"
EndSection
We also use a custom gdm.conf file to avoid conflicts with stock packages. Only the important settings are shown:
[daemon]
AlwaysRestartServer=false
DynamicXServers=true
FlexibleXServers=0
VTAllocation=false
[servers]
0=inactive
AlwaysRestartServer=false is necessary to avoid a bug in Xorg. See below for issues description.
Audio is supported by setting $PULSE_SINK/$PULSE_SOURCE environment variables using a script in /etc/profile.d which executes after login.
Maximum seats:
The USB standard specifies a maximum of 127 USB devices connected to a single host controller. Since each seat occupies several device addresses (the hub itself plus display, keyboard, mouse and audio), this means around 30 seats per USB controller, depending on the number of devices connected to a USB hub. In practice you will have a hard time finding a system which has that many ports available. I've used Fujitsu's TX100 S1 and TX100 S2 which can be expanded to 15 or 16 USB ports using all external and internal ports and an additional PCI-USB extension card.
While larger configurations are possible by using more PCI cards or intermediate hubs, those are limited by the USB 2.0 transfer speed (more devices on a single hub means slower graphics) and a bug in the Linux kernel.
Space and cable length:
USB 2.0 limits the cable length to 5 meters. On the market I've found good quality cables running 4.5 meters. This means that your multiseat system needs to be confined to a small physical space due to these limitations. In practice a medium-sized multiseat system in a 30 square meter space is doable and fits within these limits. This is roughly the size of a classroom in a school.
You can of course use daisy chaining (up to 5 hubs), active USB extension cords (11 meters) or USB over CAT5 cables (up to 45 meters), but all of these interfere with USB signal strength and can lead to unpredictable behavior. For example I've seen errors opening USB devices when the power is not sufficient or is too high. Modern computer systems have built-in hardware protection and shut off USB ports or randomly reboot when the current on the wire is too strong. I've seen this on a number of occasions and the fix was to completely power off and unplug the system, then power it on again.
Also don't forget that USB video consumes a great deal of the limited USB 2.0 bandwidth. Depending on the workload of the system (e.g. office applications vs. multimedia) you could experience slow graphical response if using extension cords and daisy chaining.
Performance:
For regular desktop use (i.e. nothing in particular) I'd recommend using a 32bit operating system. On 64bit systems objects take a lot more memory and you'll need 3-4 times more for the same workload as on 32bit. For example 16 users running Eclipse, gnome-terminal and Firefox will need less than 8GB of memory on 32bit and more than 16GB on 64bit. Python and Java are particularly known to use much more memory on 64bit.
Regular desktop usage is not CPU intensive and a modern Xeon CPU has no issues with it. One exception is Flash, which always causes your CPU to choke; on multiseat that becomes an even bigger problem. If possible disable or remove Flash from the system.
Multiseat doesn't make any difference when browsing, sending e-mail, etc. You shouldn't experience issues with networking unless your workload requires a high-speed connection or your bandwidth is too low. If this is the case you'd better use the USB NICs available in the docking stations and bond them together, add external PCI NICs or upgrade your networking infrastructure.
Disk performance is critical in multiseat especially because it affects the look and feel of the system and is visible to the end users. It is usually good practice to place /home on a separate partition and even on a separate disk. Also consider disabling unnecessary caching in user space applications such as Firefox and Nautilus (thumbnails and cache).
On a system with 2 x 7.2K RPM disks in a BIOS RAID1 configuration and a standard RHEL6 installation (i.e. no optimizations configured), where /, swap and /home are on the same RAID array, we have 15 users using GNOME, gedit, Firefox, gnome-terminal and gcc. The performance is comparable to a standalone desktop, with occasional spikes which cause GNOME to freeze for a second or two. It is expected that disabling unnecessary caching will make things better.
Depending on the workload (reads vs. writes) you should consider different RAID levels, file system types and settings and changing disk parameters. A good place to start is the "Storage Administration Guide" and "I/O Tuning Guide" at http://docs.redhat.com.
Pictures from one of our deployments can be found on Facebook (no login required): http://www.facebook.com/album.php?aid=54571&id=180150925328433. A demonstration video from the same deployment can be found at http://www.youtube.com/watch?v=7GYbCDGTz-4
If you are interested in commercial support please contact me!
In the open source world everything is changing and multiseat is no exception. While GDM and ConsoleKit patches are not yet integrated upstream there's a new project called systemd which aims at replacing the SysV init scripts system. It already has several configuration files for multiseat and I expect it will influence multiseat deployments in the future. Systemd will be available in Fedora 15.
There are comments.