Weblog
2010-03-08
Don't touch that dir, dear!
One of our clients is using Magento and discovered last month that guest checkout had become broken, the orders would fail with a message like above. What the error above means is that a new order was being added with the requirement of having the customer identified as NULL or a real customer id, but instead the customer id was a non-existent one (0), which violates the above constrint.
We contributed the markjenkins and draconicfae comments on this bug report. (login required... :[ )
The user had to disable guest checkout while they waited for us to fix this.
We hoped this would go away with just an upgrade. Not wanting to break this production site any further, we did what all good sys admins do and made test envirionment. We copied the files from /var/www/websiteB/ into /var/www/websiteC/ and created a copy of the database. We reconfigured the database in our test copy in /var/www/websiteC/app/etc/local.xml . We removed and recreated the cache, session, and report directories from /var/www/websiteC/var/ to get rid of old cruft, including the old database credentials (don't want to test site to wack the production site!). To make our site accessible from a different domain name, we updated the relevant config keys in the config_core_data table.
We were almost ready to upgrade from there, but we realized that Magento's web based upgrade system (Magento Connect) would be interacting pretty heavilly at the file system. We wondered if this system had hard coded file paths in it, which logically could lead to the production site getting stoped on seeting how we didn't have any other controls to keep our test envirionments separate. So, we went looking for /var/www/websiteB/ and found it. We also found instances of /var/www/websiteA, because the client had moved Magento from one dir to another in the past.
Now we really ratched up the insanity, in order to make sure we could test the upgrade without danger, we started using find, grep, and sed to go on a war against /var/www/websiteA and B to force them to become C. That didn't work out so well, because there were php data/config files that had an array like format with string lengths hard coded into them! So, from there we soldiered on, trying to the PHP/PEAR config API at a high level and later at a lower level to try and do the find replace in these special files.
A fellow member wrote that PHP code, and I was off on my own trying to use it. For whatever reason, it wasn't working for me, so after much flailing I discovered the command /var/www/websiteC/pear .
# ./pear
Commands:
...
config-create Create a Default configuration file
config-get Show One Setting
config-help Show Information About Setting
config-set Change Setting
config-show Show All Settings
...
install Install Package
list List Installed Packages In The Default Channel
list-all List All Packages
...
uninstall Un-install Package
...
upgrade Upgrade Package
...
Using that handy command, I was able to list and change all of the configuration variables with paths in them from /var/www/websiteC/downloader/pearlib/pear.ini , which was the file we were most concerned about. Ouch, we wrote a 106 lines of custom PHP code as a fixer for nothing. Now, pear.ini wasn't the only file where the old paths were sitting around, there was als pear, peardev, pecl, php/pearcmd.php, and php/peclcmd.php from downloader/pearlib/, (which were fixable with SED), and plenty in /var/www/websiteC/downloader/pearlib/php/.registry/, so I tried things on a separate server to validate that these later ones wouldn't cause things to go-haywire during an upgrade. While I was at it, I tried to find out if even fixing pear.ini was important, and indeed it was, while running an upgrade with Magento Connect I was able to see the file system errors with /var/www/websiteB
But, I also saw file system errors for /var/www/websiteA! When I saw that, I realized that we had not only gaurded ourselves against harm to /var/www/websiteB, but I realized why our client was having trouble with the production site to begin with. The did an upgrade a month or so ago, and pieces of that upgrade didn't end up in /var/www/websiteB, they ended up in /var/www/websiteA! This explained perfectly why the foreign key constraint was being violated, an older version of Magento used customer id 0 to designate the guest customer in guest checkout, the new version using customer_id NULL for that. So, it turns out this installation had the SQL schema and foreign key constraints of a later version (that insists the customer_id be NULL), and the code of the earlier version! Another hint of this had also been in our face from the begining, at the bottom of the magento admin interface is a version number, and our client told us it was the wrong one. Once we discovered that, we knew we were on the right track, that there wasn't a bug in Magento at all, and that things would certainly be solved with an upgrade or re-install.
Once we had those files with the old paths fixed, we were able to perform the upgrade. I should warn you, because the upgrade does involve schema changes, mysql has to work pretty hard for awhile afterword, and your site will be unreponsible during that time. We also found we couldn't get into the admin interface unless we deleted ./app/code/core/Zend/Cache/. Another thing requested by the interface post-upgrade was re-indexing. With that, guest checkout was nice and shiny again; no more foreign key constraint violations.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.2009-09-27
Drive to the future
Here's the highlight problem from the U of M CS open programming contest
Problem 4 - Omicronian Refueling
You are a frequent commuter along a long road, which requirs you to fuel up several times. Unfortunately for you, the gas stations are all owned by a technologically advanced but impatient species, the Omicronians. The Omicronians have automated pumps, but when you want to fill your tank, you are charged for an entire tank, regardless of how much fuel you actually need, and all the fuel is pumped into your car (potentially causing an overflow). Fuel costs one dollar per litere.
And to make matters worse, the Omicronians charge a penalty for overflows: for x litres of spilled fuel, the penalty is x2
You have calculated the amount of fuel it takes to drive from on station to the next. This never changes. You always begin your trip with a full tank, and you can always travel from one station to the next with a full tank of gas. You would like to find the minimum possible cost of driving from one end of the road to the other.
Input
The first lines gives n, the number of tests in the input. Each of the next n pairs of lines represents one test. The first of teh pari lines contain an integer Q (the capacity of the tank — also the cost of a full tank of fuel, not including the penalties) and r, the number of road segments. The second line of the pair gives r numbers, representing the amount of fuel it takes to travel between stations.Output
For each road configuration, give the minimum cost (in dollars) of travelling the road.
Sample Input Sample Output 5 10 4
5 4 2 961 5 4
5 5 5 515 10 4
1 2 3 40 10 14
8 1 3 5 6 9 7 2 8 1 2 5 6 7127
Here's the solution I tried to submit in the contest.
#!/usr/bin/env python
from sys import stdin
def balloon_set_to_include(answers, i):
new_answers = [ possible_answer.copy()
for possible_answer in answers ]
for new_answer in new_answers:
new_answer.add(i)
answers.extend( new_answers )
def generate_trips(num_stations):
answers = [ set() ] # include the no fill trip
for i in xrange(num_stations):
balloon_set_to_include(answers, i)
return answers
def calculate_trip_cost(distances, refuels, tank_size):
amount_in_tank = tank_size
cost_so_far = 0
for i, distance in enumerate(distances):
amount_in_tank -= distance
if amount_in_tank < 0: # do we even make it?
return None
else:
if i in refuels: # do we refuel here?
amount_in_tank += tank_size
cost_so_far += tank_size
if amount_in_tank &rt; tank_size: # check for overflow
spilled_fuel = amount_in_tank - tank_size
cost_so_far += spilled_fuel**2
amount_in_tank = tank_size
return cost_so_far
def minimal_trip_cost(param_line, distance_line):
(Q, r) = (int(param) for param in param_line.split() )
distances = [ int(distance) for distance in distance_line.split() ]
assert( r == len(distances) )
possible_answers = generate_trips(r-1)
cheapest_trip = None
for possible_answer in possible_answers:
cost_of_trip = calculate_trip_cost(distances, possible_answer, Q)
if cost_of_trip != None:
if cheapest_trip == None or cost_of_trip < cheapest_trip:
cheapest_trip = cost_of_trip
return cheapest_trip
def main():
numtests = int(stdin.readline())
for i in xrange(numtests):
print minimal_trip_cost(stdin.readline().strip(),
stdin.readline().strip() )
if __name__ == "__main__":
main()
That code is able to process the sample data, but if the road length becomes any longer, the runtime can become enormous. I even had an instinct about this when I wrote it, I generated all possible combinations of stations to refuel at and knew that this kind of thing would grow big time as the number of stations grew. I even used the word balloon in one of my function names -- how appropriate.
Why is considering all possible combinations of stations with a refuel a disaster? Such considerations grow combinatorially. The number of refueling combinations when there are r road segments and n = r-1 stations is 1 + nC1 + nC2 + nC3 .... nCn . Math folks call each of thes combinations a partition. The number of partitions of a set grows very, very quickly.
It turns out that this problem meets the requirements for dynamic programming. A dynamic programming solution can't just be expressed in terms of minmal_cost for reaching each station — we computer scienties's call that greedy. In this problem it doesn't pay to be greedly, sometimes it is worth it to pay a high penalty cost for refueling at one point in order to land yourself into a sequence of stations where you refuel at a low cost. So, to account for this, we can express a dynamic programming solution as being the minimal cost of reaching a station d with x fuel remaining. This yields surprising good results, the dynamic programming table we need to fill in only ends up having (r+1) * (q+1) entries. The runtime is this proportional to r.
Here's the code I wrote in retrospect:
#!/usr/bin/env python
"""Given a fuel tank capacity, some road segment lengths (in fuel use),
and a tax of overfill squared for overfilling, find the least costly
path in the strange universe where tanks are autofilled with the tank capacity,
regardless of what is in them.
"""
from sys import stdin, maxint
from itertools import chain
PENALTY_EXPONENT = 2
class ProblemState(object):
"""Store information about the problem being solved
The locations in the problem are the starting point (0),
the fueling stations in between ( 1...(roads-1) )
and the destination (roads). So, the number of locations equals
the number of roads.
Attributes:
distances - the distances between nodes. distances[i]
roads - the number of road segments, which is also the number of locations
if you count the start, the fueling stations, and the destination
tank_size - the capacity of the tank
min_cost_cache - used to cache the results of
minimal_cost_of_arriving_with_remaining()
penalty_cache - used to cache the results of penalty()
"""
def __init__(self, distances, tank_size):
self.distances = distances
self.roads = len(distances)
self.tank_size = tank_size
self.min_cost_cache = {}
self.fuel_cost_cache = {}
def gen_tank_levels(problem_state):
"""Generate all possible tank levels, from 0 to tank_size inclusive
"""
return ( level
for level in xrange(problem_state.tank_size+1) )
def gen_destination_costs(destination, problem_state):
"""A generator function that provides the minimal costs of reaching
destination will all possible tank levels
values that come out of this generator are tuples, the first value
is the tank level on arriving and the second value is the cost of getting
there . This cost value can be None, which means that combination
of destination and level isn't possible, so if you're calling this
you should look for None and ignore it
e.g. this generator yeilds tuples like so:
(0, minimal_cost_of_arriving_with_remaining(destination, 0, problem_state)
(1, minimal_cost_of_arriving_with_remaining(destination, 1, problem_state)
(2, minimal_cost_of_arriving_with_remaining(destination, 2, problem_state)
....
( tank_size,
minimal_cost_of_arriving_with_remaining(destination, tank_size,
problem_state)
"""
return ( (tank_level,
minimal_cost_of_arriving_with_remaining(destination,
tank_level,
problem_state) )
for tank_level in gen_tank_levels(problem_state)
) # start of generator expression
def cost_of_refuel(fuel_in_tank, problem_state):
"""Calculate the cost of refueling a tank with a certain amount in it
"""
assert( 0 <= fuel_in_tank <= problem_state.tank_size )
# if the answer isn't cache
if not fuel_in_tank in problem_state.fuel_cost_cache:
# the cost of refueling when you have a full tank is zero, you
# don't need to refuel
if fuel_in_tank == problem_state.tank_size:
cost = 0
# the cost of refueling when you have an empty tank is the tank size
elif fuel_in_tank == 0:
cost = problem_state.tank_size
# the cost of refueling when you have a tank &tl; 0 and < tank_size
# is the tank size + the penalty for spilling the fuel you already
# have
else:
cost = problem_state.tank_size + fuel_in_tank ** PENALTY_EXPONENT
# cache the calculated cost
problem_state.fuel_cost_cache[fuel_in_tank] = cost
# return the answer from the cache
return problem_state.fuel_cost_cache[fuel_in_tank]
def min_cost_of_leaving_full_tank(destination, problem_state):
"""Calculate the full cost of reaching destination and leaving it
with a full tank
"""
assert( problem_state.tank_size < maxint )
# there are many posibilities, find the one with the minimum cost
min_value = min( chain( # chain together multiple iterators
# every integer is <= maxint, we assume that the tank
# size is < maxint, so we put maxint into the sequence
# min() is going to search through.
(maxint,), # use a tuple because chain needs iterators
# find out the minimal cost of filing the tank for all posible
# pre_refill values 0 to problem_state.tank_size inclusive
( # start generator expression
# for each pre_refill value find the cost of filling the
# tank and add in the cost of reaching this destination
# with that tank level
cost_of_refuel(pre_refill, problem_state) +
cost_prior_to_refuel
for pre_refill, cost_prior_to_refuel in \
gen_destination_costs(destination, problem_state)
# filter out cases where the cost of reaching this
# destination with that tank level isn't possible
if cost_prior_to_refuel != None
) # end generator expression
) ) # chain, min
# maxint will come out as the minimal cost if nothing is found
# so return None
if min_value == maxint:
return None
else: # we actually found something, woot!
return min_value
def minimal_cost_of_arriving_with_remaining(destination, fuel_remaining,
problem_state ):
"""Tells us what is the minimal cost of arriving at destination
(before a refueling decisions is made) with a particular amount of
fuel_remaining.
Return the minimal cost for (destination, fuel_remaining) or None if the
combination of destination and fuel remaining is possible.
We can not simply ask what is the minimal cost of arriving at just
destination, as fuel_reamaining is a critical piece of information for
future travels.
The only time we are interested in just the cost of arriving and not the
fuel remaining is for the final destination. To find that, we can later
call minimal_cost_of_arriving_with_remaining with the final destination
and all possible fuel remaining values, 0 through problem_state.tank_size
minimal_cost_of_arriving_with_remaining(0, problem_state.tank_size,
problem_state )
will always return 0, at this describes the starting condition where
no cost has been incurred yet.
minimal_cost_of_arriving_with_remaining(0, x, problem_state) for any
value x that isn't the tank size is None, as starting with a non-full
tank isn't a possibility.
other invocations will consider the fuel there must of been when leaving
the previous destination, fuel_remaining + distances[destination-1]
"""
# combine the destination and fuel_remaining together into a table lookup
# key
answer_key = (destination, fuel_remaining)
assert( 0<= destination <= problem_state.roads )
assert( 0<= fuel_remaining <= problem_state.tank_size )
# if the answer isn't cached, calculate it
if not answer_key in problem_state.min_cost_cache:
# this is our base case, the minimal cost of arriving at the
# start with a full tank is 0, arriving at the start with a
# non full tank is impossible
if destination == 0:
if fuel_remaining == problem_state.tank_size:
min_cost = 0
else:
min_cost = None
else: # destination &rt; 0
last_destination = destination - 1 # the last destination
# how much fuel would we have had when we left
# the last destination?
fuel_when_leaving_last = \
fuel_remaining + problem_state.distances[last_destination]
# if we left the last destination with a full tank, calculate
# the minimal cost of that possibility
if fuel_when_leaving_last == problem_state.tank_size:
min_cost = min_cost_of_leaving_full_tank(last_destination,
problem_state )
# if we left the last destination with less than a
# a full tank, calculate the minimal cost of that possibility
elif fuel_when_leaving_last < problem_state.tank_size:
min_cost = minimal_cost_of_arriving_with_remaining(
last_destination, fuel_when_leaving_last, problem_state )
# else we would of have left with more than a full tank, which
# is impossible
else:
min_cost = None # impossible case, ignore it
# cache the answer found
problem_state.min_cost_cache[answer_key] = min_cost
# return the answer from the cache
return problem_state.min_cost_cache[answer_key]
def minimal_trip_cost(param_line, distance_line):
# parse input
(tank_size, roads) = (int(param) for param in param_line.split() )
distances = tuple( int(distance) for distance in distance_line.split() )
assert( len(distances) == roads )
# store the fundamental problem parameters and cache
problem_state = ProblemState(distances, tank_size)
# the identifier for the final destination equals the number of roads
final_destination = roads
# Solve the problem with dynamic programming from the bottom up.
# Find the cost of reaching every destination but the last with all
# possible tank sizes
# This will run in (tank_size+1) * (tank_size+1) * roads time.
# The number of roads is really the problem size, so we have a O(n)
# algorithm!
#
# For each destination but the last, from starting point on up
for destination in xrange(final_destination):
# for each possible tank size when arriving,
# from 0 to tank_size inclusive
for fuel_remaining in gen_tank_levels(problem_state):
# calculate the cost of reaching destination with fuel_remaining
# but ignore the answer, as we know its cached which will
# aid calls at higher levels
minimal_cost_of_arriving_with_remaining(destination,
fuel_remaining,
problem_state)
# find the minimal cost of arriving at the final destination
# by considering all posible fuel levels there could be when
# arriving there.
# if none of these are possible, this will raise an exception, as
# min() gets very cranky when given an empty sequence
#
return min(
cost
for fuel_remaining, cost in gen_destination_costs(final_destination,
problem_state)
if cost != None # filter out possibilities with no possible cost
) # min(
def main():
numtests = int(stdin.readline())
for i in xrange(numtests):
print minimal_trip_cost(stdin.readline().strip(),
stdin.readline().strip() )
if __name__ == "__main__":
main()
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
The problem description is © the Department of Computer Science, University of Manitoba. All rights reserved.
The remainder of this article is © ParIT Worker Co-opertive,
This work is licensed under a Creative Commons Licence.
2009-09-26
The snake gets sneaky
I just finished participating in a programming contest at my alma matter. I placed 8th out of a field of ~30. Most of the competitors were undergraduates, and there were a few graduate students. I think I was the only alumni.
There were 6 problems and 4 hours to solve them. I wrote solutions to 5, 3 passed all the judges tests, 1 was buggy (but fine with the sample tests), and 1 chewed up too much time and resources when judged (but was fine with the sample tests). The contest winner solved 5! There was one problem that nobody solved -- I had a submission for it, but like many other submissions it ended up not being able to solve the judge's test inputs in reasonable time.
Technically I didn't even deserve my 8th place ranking. Because I had nothing to lose, I cheated. :-)
The only allowed languages were Java, C (gcc with GNU C library or cygwin C library), and C++ (g++ with GNU Standard C++ library). I cheated by using none of them. Here's my java, c++, C bashing session:
- C - I love it, but its not the perfect tool for the higher level jobs present in the contest -- C is really a portable machine level language. Add to this the fact that the contest instructions allowed Java and C++ users to see thier respectives manuals, but the links to the GNU C library docs were not provided. On top of that, C and the GNU C library don't come with high level data structures like lists, sets, and tables, whereas the Java and C++ implementations did. To be fair, they should of allowed linking with glib from Gtk to be fair. Well, to be fair, as nice as those data structures are, the problems were all reasonably solveable without them.
- C++ - I haven't used it much. This is a low level language like C that tries to masquerade as high level. How many programs in the world really need to stradle that line?
- Java - U of M CS made me write a lot of this, though that was years ago. If I were still an undergraduate I would of used this. I almost used it for this contest too -- prior to the contest I studied up on some of the new features that appeared in 1.5 and 1.6 after I stopped using Java. A few hours with generics and autoboxing and I was caught up. But studying those reminded me that I've learned to hate static typing and java's attempt to straddle low and high level via "primitives".
So, I used python.
If I'd asked nicely a few days before the contest, they could of easilly accomadated me and added support for python. But, if they said no, I wouldn't be able to use it, cause after asking they'd be looking to see if I used it anyway.
I didn't want to get booted off the system during the middle of contest for cheating (being found out at the end would be fine), so I had to figgure out a way to get one of the other there languages to run python for me, and I had to hold my submissions until the end, which cost me points. This was pretty silly, they're fairly nice folks at the CS department, denying me judging or a ranking would of been fine, but certainly they wouldn't have cut me off the system entirely. In the end I learned they were cool with it; so in retrospet I could of attained a better score by submitting solutions as they were completed.
Here's how I "tricked" them into running python via C:
#include <stdio.h>
#include <stdlib.h>
char * python_code = "print 'hello world'\n";
int main(int argc, char ** argv){
char * python_file_name = tempnam(NULL, NULL);
FILE * python_file = fopen(python_file_name, "w");
fputs(python_code, python_file);
fclose(python_file);
char command_run[1000];
sprintf(command_run, "python %s", python_file_name);
system(command_run);
remove(python_file_name);
free(python_file_name);
return 0;
}
The C string python_code I generated with this:
#!/usr/bin/env python
from sys import argv
def good(char):
if char == "\n":
char = "\\n"
elif char == "\\":
char = "\\\\"
elif char == '"':
char = '\\"'
return char
python_file = file(argv[1], "r")
for line in python_file:
print '"' + ''.join( good(char) for char in line ) + '"'
print ';'
They never had to kick me off or tell me to stop, but it turns out there would a reasonable grounds in here for telling me to. The c function system is used to call the python interpreter. It even nicely attaches stdout and stdin which the contest requires. If my program goes out of control with resources, and if they try to kill the C program, the python program will keep running. So, arguably if I had made submissions early in the contest there would be an argument against "oh well, just a harmless act someone who doesn't need official winner status anyway". You could argue that my kludge increases the administrative burden for judges beyond the burden they'd have with an out of control program that worked directly in one of the target languages. Aparently I came close to crashing the system!
I should of used more low level functions for the child process, caught SIGTERM, and taken care of killing the python program for them. Oh well, next year I'll just ask them to support python directly.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
2009-09-06
setting the nomail flag for an entire mailman list
I've been doing some admin work with Mailman, and for testing purposes had to set the nomail flag on a list with 9000 subscribers. The nomail flag prevents a mailing list member from recieving mail. I looked around, found scripts people had written to play with nomail a little more selectivly, but couldn't find the simpler case of setting nomail for an entire list. So I pieced things together and made my own:
def set_all_nomail(m):
for member in m.members.iterkeys():
m.setDeliveryStatus(member, 1)
m.Save()
I keep that in a file named set_all_nomail.py, and invoke it with
# ./withlist -l r set_all_nomail
withlist is a program mailman installs
And of course, there is also the reverse, set_all_nomail_off.py :
def set_all_nomail_off(m):
for member in m.members.iterkeys():
m.setDeliveryStatus(member, 0)
m.Save()
./withlist -l -r set_all_nomail_off mailinglistname
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
2009-08-31
Les.net VPBX API
Our VOIP provider Les.net has a cool virtual PBX feature, where you can record prompts for a flat menu and direct them to different places. This is included with his basic voip service, which can cost as little as $3.50 per month for a local number plus a cent or two per minute for calls.
We're in the process of switching from our self-hosted asterisk PBX over to this. One thing I thought we were going to lose was the ability to write scripts to change where our re-directs go. But, then I realized that the CGI interface for changing the re-directs in the Les.net VPBX would very likely be super simple.
I told my browser to "view source" and was able to wade through the html and find out the following.
Login
Use the http post method to http://www.les.net/login.php .
We're submitting an html form, so set the Content-Type to application/x-www-form-urlencoded and post them using the following arguments:
- login_username: Your username
- login_password: Your password
- login_button: the string 'Login'. (chances are Les doesn't even look for this
So the data transmitted by the client ends up looking like:
login_username=your_username&login_password=your_password&login_button=Login
The web server will return cookies in the http response. Save them.
Change PBX routes
Do an http post to http://www.les.net/vpbxedit.php . Include the cookies Les gave you in your request and use x-www-form-urlencode again to post the following:
- peer_id: This is the numeric identifier you see in the VPBX listing
- peer_description: Descriptive text for your VPBX
- id_peer_0: The peer id for the first menu option (press 0). Set to 0 for disabled, set to 99999999 for Test IVR, or set to the last 5 digits of a peer from your peer listing.
- id_peer_1: For press 1
- id_peer_2: ya di yada
- id_peer_3
- id_peer_4
- id_peer_5
- id_peer_6
- id_peer_7
- id_peer_8
- id_peer_9
- vpbx_button: Set this to 'Save Changes', but Les probably doesn't check this
peer_id=12345&peer_description=my%20pbx&id_peer_0=54321&id_peer_1=66666&id_peer_2=0&id_peer_3=0&id_peer_4=0&id_peer_5=0&id_peer_6=0&id_peer_7=0&id_peer_8=0&id_peer_9=0&vpbx_button=Save%20Changes
Some web based business are dumb enough to call this kind of activity “illegal customer reverse engineering”. Les would be more likely to say “inovative customer use of my services”, or perhaps just shrug.
Now I just have to write some code to use this. Will I use shell and curl, or python? Find out next time... to be continued....
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
2009-08-11
Using Multiple Interfaces to get 1MB+/s
It is already possible to connect to more than one network at the same time, so long as you have more than one interface. Imagine using all of them simultaneously - doubling or tripling your download speed. This is quite easy to setup, although rarely done by default. This guide should serve as reference for anyone interested in using multiple network connections simultaneously.
I have personally reached speeds of 1MB/s downloading from two AP's using bittorrent. Bittorrent benefits from multiple interfaces because of the sheer number of TCP connections, they end up being spread out over the different interfaces.
This guide uses the example of a laptop with two wireless interfaces connecting to two different access points in the area. Ideal for a coffee shop or downtown location.
A simple iwconfig should show all the wireless devices on your system, mine looks like
eth1 IEEE 802.11g ESSID:"barfoo" Nickname:"barfoo"
Mode:Managed Frequency:2.447 GHz Access Point: 00:13:10:78:9D:15
Bit Rate:54 Mb/s Tx-Power=20 dBm Sensitivity=8/0
Retry limit:7 RTS thr:off Fragment thr:off
Power Management:off
Link Quality=80/100 Signal level=-49 dBm Noise level=-88 dBm
Rx invalid nwid:0 Rx invalid crypt:419 Rx invalid frag:0
Tx excessive retries:37 Invalid misc:490 Missed beacon:0
wlan0 IEEE 802.11bg ESSID:"marcjosh"
Mode:Managed Frequency:2.437 GHz Access Point: 00:18:F8:F3:91:3F
Bit Rate=1 Mb/s Tx-Power=20 dBm
Retry min limit:7 RTS thr:off Fragment thr=2352 B
Power Management:off
Link Quality=37/100 Signal level:-76 dBm Noise level=-92 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0
Connect to both AP's at the same time using both interfaces. My laptop runs Ubuntu so I use the default NetworkManager to connect, use whatever program you want.
Do a "ip route" to see what the defined routes are. Excluding the irrelevent ones mine shows
10.0.0.0/24 dev eth1 proto kernel scope link src 10.0.0.121 metric 2 192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.101 metric 2 default via 10.0.0.1 dev eth1 proto static
This shows that even though I'm connected to both networks all my data to the outside world is going through only one, "barfoo". This is what we need to change. We need to get rid of the default route and create one of our own. Use "route del default dev eth1" to get rid of the old one.
The “ip” command can be quite daunting. The man page provides a very detailed explanation for everything, but really when you start to use it, its quite straight forward. Create a new default route that uses both devices, specify them both by using the nexthop option
ip route add default \ scope global \ nexthop dev eth1 \ nexthop dev wlan0
Thats it! Do a "ip route" to confirm
default nexthop dev eth1 weight 1 nexthop dev wlan0 weight 1
This process is actually ECMP, or Equal Cost Multipath Routing. It takes care of using all the interfaces available, but keeps TCP pairs on one interface - a requirement of the protocol.
This method is not specific to wireless, it can be used with ethernet or any other interface. Just specify the specific "dev iface" when you create the new default route.
If you know one connection is faster than another you can weight the routes. This causes the ratio of traffic going to different interfaces to change accordingly. An example of a 1:2 ratio is below
ip route add default \ scope global \ nexthop dev eth1 weight 1 \ nexthop dev wlan0 weight 2
References
Dumbest Device Driver Ever -- Cash Drawer Goes Pop!
Its a USB human interface device (HID), vendor id 0d3a, product id 0207.
It was actually fortunate that the cash drawer didn't come with a proprietary JPOS driver that would work on GNU/Linux (as previous Posiflex cash drawers did) , because having something that "works" would of taken away the motivation to make something that not only works, but is free to share, free to modify.
My first step was to get a good "sniff" of working communication. I called up qemu as follows:
# qemu -net nic -net user -std-vga -m 512 -kernel-kqemu -hda winxp_pro.img -localtime -usb
While that booted, I ask my collegue, "look, it's our old friend Windows XP. Don't you miss him?". Nope.
I installed the vendor provided software.
I plug in the device, and dmesg informs me that the hiddev driver has taken over. We can't have that getting in the way, so I nuke it:
# libhid-detach-device 0d3a:0207
(That program comes from libhid)
I think usbfs needs to be mounted for this to work, so
# mount -t usbfs usbfs /proc/bus/usb/
Now I can make the device available to qemu through the qemu monitor (ctrl+alt+2).
qemu_monitor # usb_add host:0d3a:0207
To perform my sniffing, I use Snoopy Pro. This was actually a poor choice, I later learned about a linux snooping facility called usbmon when I needed to sniff my free GNU/Linux code to see if it was working right. (see usbmon.txt). This would of be much less a pain, as running xp under qemu, even with kqemu was pretty brutal -- and even more so with Snoopy Pro involved. Everything qemu does with USB is passed through the linux usb stack, so the usbmon approach would of been just as useful.
Sniffing revealed that the device sends a 120 bytes over the interrupt output endpoint. The first two bytes were the drawer id (set with jumpers on the back), the rest a bunch of zeros, and a bunch of garbage I decided to ignore.
Now it was driver writting time. I had a lot of facilities to choose from:
I immediatly ruled out writting a kernel module. Why go through the extra trouble when userspace facilities are easier to use? It's not like opening a cashdrawer has hard core real time or bulk transfer constraints.
I also ruled out libusb and jUSB. If the device really is a USB HID as it claimed (which I found very wierd at first having falsely belived that HID was for mice, keyboards, and joysticks), it makes sense to use a HID specific facility.
That left libhid and hiddev. Because linux was connecting the device with hiddev as soon as it was plugged in, I realized that using libhid would require extra effort to put to use -- I would have to ensure that linux didn't use hiddev by using the detach program or equivlent code in my libhid code. I also found it cool that hiddev could be used via a device node (e.g. /dev/usb/hiddev0), where as libhid performs its usb access via usbfs. I'm not familier with making permission changes for usbfs permenent, but for device nodes, getting udev to do my bidding is second nature. Not only does udev let me control the permissions for device nodes, but allows me to do things like /dev/posiflex_drawer .
But hiddev kind of freaked me out. I looked at the docs and it was immediatly clear that I wouldn't be able to just write to the device node. The only system calls available were read and ioctl.
It took me awhile going through the hiddev docs, the hidev.h header, and smaterings of example code found on the web to appreciate that I could get this done with the right ioctls. Example code tended to focus on reading not writting; an understandable situation for human interface devices. I even had to peek at the linux source (hiddev.c and hid-core.c) to really be sure about a few things.
In order to use the hiddev api I had to learn to think about the problem at the HID layer instead of the USB layer. Instead of trying to "write 120 bytes to the interrupt output endpoint", I had to discover that "sending a report for updates for the 120 usages in the only available report field" would be exactly the same.
So, here it is, the source:
#include <stdlib.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <asm/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/hiddev.h>
/* Posiflex 6300 example code
*
* Support the Posiflex 6000 usb cashdrawer (CR6300 series)
* Implemented using the linux hiddev driver, which picks this device up
* automatically because its a USB human interface device (HID).
*
* see hiddev.txt and hiddev.h for more info
*
* Author: Mark Jenkins <mark@parit.ca>, ParIT Worker Co-operative
* August, 2009
*
* To the extent possible under law, ParIT Worker Co-operative has waived
* all copyright and related or neighboring rights to Posiflex 6300 example
* code.
* http://creativecommons.org/publicdomain/zero/1.0/
*
* This program is perhaps even too trivial for copyright to have ever even
* applied..
*/
/* If you don't want to read the source code and you just want to know how the
bloody device works, here's the goods:
If you read read the HID report records provided by the device, you'll
learn that you can you can send this device 120 byte chunks
(via the interrupt style output endpoint).
Set the first byte to the drawer id set with jumpers (0-7) on the back.
Set the second byte as well to the drawer id.
Set the remaining bytes to 0.
Voila!
*/
int main()
{
/* Open the hiddev device node co-responding to the cashdrawer,
managed by the linux hiddev driver.
open read only because only read and ioctl are supported.
It is recommended that you get udev to create a consistent device node
for your cashdrawer instead of this one, as you never know when another
hiddev device might get plugged in and bugger everything up....
It is also recommended to get udev to set some decent permissions on
this, these default device notes are read/write root only
It would also be wise to make this a command line argument to this
program
*/
int fd = open ("/dev/usb/hiddev0", O_RDONLY);
if (fd < 0 ){
printf("error opening device node %d\n", fd);
return 1;
}
// change this to match the drawer number specified with jumpers
// on the back. This really ought to be a command line argument...
int drawer_number = 7;
// represent the USB HID report for the cashdrawer, of which there
// is only one
struct hiddev_report_info rinfo;
// represent the USB HID report field, of which there is only one
struct hiddev_field_info finfo;
// there are 120 USB HID ussages for the field, so we use uref_multi.
struct hiddev_usage_ref_multi uref_multi;
// find the one and only report that can be sent to the device
rinfo.report_type = HID_REPORT_TYPE_OUTPUT;
rinfo.report_id = HID_REPORT_ID_FIRST;
ioctl(fd, HIDIOCGREPORTINFO, &rinfo);
// set up the one and only field for this device
finfo.report_type = rinfo.report_type;
finfo.report_id = rinfo.report_id;
finfo.field_index = 0;
ioctl(fd, HIDIOCGFIELDINFO, &finfo);
// get the hiddev_usage_ref stored in uref_multi set up
uref_multi.uref.report_type = rinfo.report_type;
uref_multi.uref.report_id = rinfo.report_id;
uref_multi.uref.field_index = 0;
uref_multi.uref.usage_index = 0;
ioctl(fd, HIDIOCGUCODE, &(uref_multi.uref) );
// we'll be acting on all 120 usages
uref_multi.num_values = finfo.maxusage;
// set the value in the sample hiddev_usage_ref
uref_multi.uref.value = drawer_number;
// set the first two bytes to the drawer number
uref_multi.values[0] = drawer_number;
uref_multi.values[1] = drawer_number;
// set the remaining (118) bytes to 0
int i;
// gee, why didn't I just use freaking memset, then this code
// could of been 100% loop free...
for (i=2; i<uref_multi.num_values; i++)
uref_multi.values[i] = 0x00000000;
// get the report ready by sending it all the ussages to be changed
ioctl(fd, HIDIOCSUSAGES, &uref_multi);
// send off the report to the device
ioctl(fd, HIDIOCSREPORT, &rinfo );
// listen to that wonderful popping noise
// weeee, we're done
close(fd);
return 0;
} // main
You may notice that I've put the code into the public domain. I think it's too damn simple to have copyright on, and enforcing copyleft or even attribution on it would be a waste of my time. So take, do what you wish with it folks... As for this article, the license at the bottom still applies.
As one of my comments indicates, if only I had used memset for the 118 zeros I could of done the entire thing without a loop!
Last but not least, I'm now going to go and register this thing in a linux device database. Here's the device entry, here's the driver entry
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
2009-06-15
Cat5e Goes the Distance
I did this with cat5e twisted pair cable, terminated with 8P8C plugs (often called RJ-45), a DB-25 male to 8P8C adapter (for the printer side), and a DE-9 female to 8P8C adapter (for the computer side).
The printer was a Star Micronics SP500. Fortunately it came with a manual that helped me get the wiring right.
In addition, I was looking for advice on how to connect this stuff on the web as well, but it was a really good idea to read the fine manual. These websites were looking at things differently, as they were guiding people on how to connect computers, modems, and other devices like network equipment. With 25 pins on one end, 8 wires in the cat5e, and 9 pins on the other end, there are many possibilities, so I had to be careful.
Here is how I connected things
| RS232 function (printer side) |
DB-25 pin # (printer side) |
8P8C pin # (printer side) |
twisted pair wire | 8P8C pin # computer side) |
DE-9 pin # (computer side) |
RS232 function (computer side) |
|---|---|---|---|---|---|---|
| RTS (Request To Send) | 4 | 1 | brown | 8 | 6 | DSR (Data Set Ready) |
| DTR (Data Terminal Ready) | 20 | 2 | white-brown | 7 | 8 | CTS (Clear To Send) |
| TxD (Transmit Data) | 2 | 3 | green | 6 | 2 | RxD (Receive Data) |
| Should of been SG
(Signal Ground) but disconnected |
7 | 4 | white-green | 5 | 5 |
Should of been SG (Signal Ground) but disconnected |
| SG (Signal Ground) | 7 | 5 | blue | 4 | 5 | SG (Signal Ground) |
| RxD (Receive Data) | 3 | 6 | white-blue | 3 | 3 | TxD (Transmit Data) |
| DSR (Data Set Ready) | 6 | 7 | orange | 2 | 4 | DTR (Data Terminal Ready) |
| Not connected | 8 | white-orange | 1 | Not connected |
You can compare this against the SP500 manual to be sure it is right.
One thing I wish I could of done would of been to make use of the white-green wire in my twisted pair (forth row in table). It was recommented that this be connected to pin 7 on the DB-25 (signal ground) and pin 5 on the DE-9 (signal ground). But it is also recommended to connect the same two pins over the blue twisted pair wire (fifth row). The idea here is to have both the data transmitting from the printer to computer (recieve side) and the data transmitting from the computer to the printer (recieve side) in a twisted pair with signal ground line. These three line are the most important ones, and it is the twisted pair aspect of this kind of cable that makes long distance, high speed data transmission possible. The pair aspect of things is only useful if complementary signals are paired together. But, I didn't have the tools or know how to properly sodder or crimp things together in the adapter, so I had to choose: protect the printer to computer transmissions, or computer to printer ones. Because it is a printer being used, I chose the later. I never even tested if data could be read from the printer, so I can't even gaurentee this works in the other direction.
To be complete, I should describe how the pin numbers here line up with the actual physical interface.:
-
DB-25
When you're looking at the pins of a DB-25 male connector from the outside, rotate the long end to be facing up. Along the top side, from left to right you have pins 1-13. Then, returning to the bottom left, going from left to right you have 14 through 24. The pin numbers are a mirror of this when you look at a DB-25 female connector, or a male connector from the *inside* of an adapter. -
DE-9
When you're looking at the holes on a female DE-9 connector from the outside, and if you rotate the long end up, the pins are numbered 1 through 5 from top right to top left, and 6-9 from bottom right to bottom left. The pin outs are a mirror of this when you look into DE-9 male pins (on a computer) or from the inside of a female adapter. -
8P8C (aka RJ-45)
When you look into a RJ-45 jack (female), if you rotate the jack such that the area where the release mechanism lives is facing down, the pins are 1 to 8 from left to right.
You can see this printer/computer hookup in action at Mondragon Bookstore and Coffeehouse. Go during off-peak hours, walk up to the office counter, place an order, look to your right and you will see the order printed off in the kitchen where the line cook works.
ParIT is available to to perform a similar installation at $80/h. It took me 3 hours the first time, but will take less now that I've got things figgured out. There is also the cost of adapter ($5 each), and cat5e cable.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
The cult of the /opt/
# ./configure --prefix=/usr/local/ && make && make install
or
# python setup.py install --prefix=/usr/local
I shudder when I betray the tribe and do this.
In doing so, I lose what I consider to be the best three things about package management: first, after intallation I can see where all the files that were just installed went (so I can know what was put on *MY* system), second I can remove the installed software with the wave of my hand, and third I can upgrade the software and have any files from the old version disapear.
If you want to remove software put in place by the the build it yourself approach, you're going to have to guess which files were installed back when you installed it. I had to undo somebody else's instalation today and it wasn't fun. I had to identify and delete a bunch of files in PREFIX/bin/, PREFIX/share/, and PREFIX/lib/. It took me quite a few minutes and wasted quite a few more by inspiring this weblog post.
And you're praying to the gods if you install a new version at a later date on top of an old version. There is always the posibility that a file from the old version that is not part of the old version will be left sticking around. Not only is this crufty, but it could potentially lead to a disaster. What if a old man page is left lying around in share/man and somebody reads it and goes bonkers trying to use the old functionality. What if a piece of software likes to scan and act on all files found in a directory acts on an old one left there? What if a deprecated executable is left in bin/ or library left in lib/ and somebody who was using it (or who starts using it) doesn't really appreciate that they can't / shouldn't anymore? (the best way to convey that it can no longer be used is to delete it!!)
Thus, I come before you to share the alternative: build it in /opt/package_name_version. And I really do mean /opt/package_name_version and not plain /opt/ . If you --prefix=/opt you are getting into the same problem I just described with /usr/local/. But, with a dedicated directory, you gain the ability to remove everything from that package and only that package in one swoop with a florish of rm. Second, you can see everything that was installed (and not see the things installed by other folks) with a simple ls -R.
But, there is a darkside to doing it this way. You need /opt/package_name_version/bin in your path when executing stuff from there. You need /opt/package_name_version/lib/python?.?/site-packages in your PYTHONPATH when using the world's greatest language. You need MANPATH=/opt/package_name_version/share/man when running man to see the man pages. You need to add a file to /etc/ld.so.conf.d/ or to set LD_LIBRARY_PATH to help you find /opt/package_name_version/lib if you are executing something linked against the package. And of course, you might end up needing -I/opt/package_name_version/include and -L/opt/package_name_version/lib when compiling and linking against this package.
These things take some attention and patience that you don't need to have when you simply install into --prefix=/usr/local, but there an upside -- if you don't want these things to be paid attention to, you get that automatically by simply neglecting to set these things things in your environment. By contrast, if you install something in /usr/local/ it will become available to everyone, everywhere? How do you know this won't lead to some conflict with the use of stuff in /usr/?
But, I should add, I'm not completely biggoted against /usr/local/. When I write custom, system specific scripts, I do put them in /usr/local/bin/. There is no harm done here, because the system specific script ends up there and only there. I know where every piece of it it is if I want to rm it.
The manually installed piece of software that I had to remove today was actually in /usr/! This is worse then the problems I describe with /usr/local, because there is even greater risk for conflict with the operating system / package management system when something goes there. This is the package management systems's turf, and the most horendous conflict is when the packagement system writes over a file you've put there. Installing into /usr/ is a sin commited by default by python distutils (which you use to make a setup.py file).
There are also two styles worth considering when installing multiple related packages with interdepenencies into /opt/package_name_version : you can give each its own /opt/package_name_version directory / prefix, or you can pick one directory / prefix for all of them. The later style is sometimes nice when you want to treat multiple packages as one.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
2008-08-22
Reducing redundancy in bind zone files
Warning
I assume fairly advanced knowledge of bind and DNS here.
I'm feeling redundant
If you've ever looked at the zone files in a typical BIND DNS setup, you'll find quite a bit of redundancy between them. Every single zone will have a separate file, each with SOA, NS, A, and possibly MX and CNAME records. Often these files are almost identical to each other, typically the result of the file being copied from another. I've encountered this most often on systems that provide web and email service for several domain names.
Each time a new domain name is added, the admin typically copies a previous zone file. If they've been using absolute (not relative) record names in the zone files, they have to go through the new file and change this everywhere. If they've been using the $ORIGIN directive they have to change that too.
Change is hard
Sure enough, a day comes where the admin decides to make a change that is common to all of these zones, such as an ip address (A record) change, new secondary name server (NS), etc.. If there are many zone files, the change will require some serious scripting. This won't just be a matter of replacing an ip address in all places with another, serious DNS admins will lower the TTL, wait the old TTL, change the record, and bring back the OLD TTL when making some kinds of changes.
Now, double all of this trouble and opportunity for screwup when you start using split DNS. (this means your nameserver gives different answers to queries depending on where they original from, common use case is to provide different service to a LAN from the world)
To avoid what I call "zone file redundancy hell", you should take advantage of the $INCLUDE directive in your zone files to move redundant information into one place.
Common Configuration
I'm going to take you through an example setup that provides authoritative, master nameservice for cool.tld. and super.tld., and avoids redundancy. Note that my configuration files are derived from the bind9 package in Debian and Ubuntu, which puts configuration in /etc/bind where they belong. You can adapt the ideas here to a more typical BIND configuration.
// this is named.conf, it implements split DNS
include "/etc/bind/named.conf.options";
view "local_network"
{
match-clients {localhost; };
recursion yes;
// prime the server with knowledge of the root servers
zone "." {
type hint;
file "/etc/bind/db.root";
};
// Consider adding the 1918 zones here, if they are not used in your
// organization
include "/etc/bind/zones.rfc1918";
// be authoritative for the localhost forward and reverse zones, and for
// broadcast zones as per RFC 1912
zone "localhost" {
type master;
file "/etc/bind/db.local";
};
zone "127.in-addr.arpa" {
type master;
file "/etc/bind/db.127";
};
zone "0.in-addr.arpa" {
type master;
file "/etc/bind/db.0";
};
zone "255.in-addr.arpa" {
type master;
file "/etc/bind/db.255";
};
include "/etc/bind/internal_zone_list.zones";
};
view "external_network"
{
match-clients {!localhost; any; };
recursion no;
// prime the server with knowledge of the root servers
zone "." {
type hint;
file "/etc/bind/db.root";
};
include "/etc/bind/zone_list.zones";
};
/etc/bind/zone_list.zones and
/etc/bind/internal_zone_list.zones
are our zone list files. Both of them contain zone entries for cool.tld. and super.tld. . One specifies the zone files for the external side of the split DNS, /etc/bind/cool.tld.db and /etc/bind/super.tld.db, and other for the internal side, /etc/bind/internal_cool.tld.db and /etc/bind/internal_super.tld.db . To avoid redundancy, we don't want to have to manualy edit both of these files where a new zone is added, so we maintain a common file zone_list_file
cool.tld
super.tld
and we use a Makefile and a python script (make_zone_file) to autogenerate zone_list.zones and internal_zone_list.zones from zone_list_file.
# Makefile
ZONE_LIST=zone_list_file
ZONE_FILE_SUFFIX=".db"
all: zone_list.zones internal_zone_list.zones
zone_list.zones: $(ZONE_LIST) Makefile
./make_zone_list --prefix "/etc/bind/" \
--suffix $(ZONE_FILE_SUFFIX) $^ > $@
internal_zone_list.zones: $(ZONE_LIST) Makefile
./make_zone_list --prefix "/etc/bind/internal_" \
--suffix $(ZONE_FILE_SUFFIX) $^ > $@
make_zone_list
#!/usr/bin/env python
from optparse import OptionParser
from sys import stdout
option_parser = OptionParser()
option_parser.add_option("-p", "--prefix", default="")
option_parser.add_option("-s", "--suffix", default="")
(options, args) = option_parser.parse_args()
def iterjoin(join_str, iterable):
first = True
for value in iterable:
if not first:
yield join_str
else:
first = False
yield value
if len(args) > 0:
input_file = file(args[0])
stdout.writelines(
iterjoin("\n",
("""zone "%(zone_name)s" {
\tfile "%(file_prefix)s%(zone_name)s%(file_suffix)s";
\ttype master;
};
""" % {'zone_name': line.strip(),
'file_prefix': options.prefix,
'file_suffix': options.suffix, }
for line in input_file
if len(line.strip()) > 0 ) ) )
input_file.close()
else:
exit(1)
You end up with autogenerated zone entries like:
zone "cool.tld" {
file "/etc/bind/cool.tld.db";
type master;
};
We've thought through it, and we already know that all of our zones are going to have a common set of SOA, NS, CNAME, and MX records, and a common TTL for all records, so we create common_TTL_SOA_NS_CNAME_MX_for_cool_zones.db
; default TTL
$TTL 3h
; common SOA
@ IN SOA cool.tld. domains.cool.tld. (
2008081401 ; serial, todays date + todays serial
3H ; slave refresh frequency
15M ; slave retry rate when refresh fails
4W ; expire time until slaves give up on refresh
2D ) ; minimum-TTL if one isn't specified
; common NS
@ NS cool.tld.
; common CNAME
www CNAME @
; common MX
@ MX 10 cool.tld.
Different view
Now we start getting into the differences between zones. We want to have different A records for the internal view of our split DNS compared to the external view. So, we define common_TTL_SOA_NS_CNAME_MX_A_for_cool_zones.db :
and common_TTL_SOA_NS_CNAME_MX_A_for_internal_cool_zones.db
$INCLUDE "/etc/bind/common_TTL_SOA_NS_CNAME_MX_for_cool_zones.db";
@ A 192.168.1.186
$INCLUDE "/etc/bind/common_TTL_SOA_NS_CNAME_MX_for_cool_zones.db";
@ A 127.0.0.1
With those files in place, we don't even need real files zone files for cool.tld. and super.tld, we could simply create symlinks from common_TTL_SOA_NS_CNAME_MX_A_for_cool_zones.db to cool.tld.db and super.tld.db and from common_TTL_SOA_NS_CNAME_MX_A_for_internal_cool_zones.db to internal_cool.tld.db and internal_super.tld.db . Now we can query the system for (cool.tld., SOA), (super.tld., SOA), (cool.tld., NS), (super.tld., NS), (www.cool.tld., CNAME), (www.super.tld., CNAME), (cool.tld., MX), and (super.tld, MX).
More, zones!
If we want to add another domain name (new.tld.), it is a few simple steps
- Add it to zone_list_file
- Run make to regenerate zone_list.zones and internal_zone_list.zones
- Add a symlink from common_TTL_SOA_NS_CNAME_MX_A_for_cool_zones.db to new.tld.db and common_TTL_SOA_NS_CNAME_MX_A_for_internal_cool_zones.db to internal_new.tld.db
- Reload nameserver
More subdomains!
Time for another change, we want to add more subdomains to cool.tld, but not have them apply to super.tld. The cool.tld.db and internal_cool.tld.db zone files are thus now different, they can no longer be sym links, so we make real files
cool_extra_sub_domains.db
pics CNAME @
chat CNAME @
cool.tld.db
$INCLUDE "/etc/bind/common_TTL_SOA_NS_CNAME_MX_A_for_cool_zones.db";
$INCLUDE "/etc/bind/cool_extra_sub_domains.db";
internal_cool.tld.db
$INCLUDE "/etc/bind/common_TTL_SOA_NS_CNAME_MX_A_for_internal_cool_zones.db";
$INCLUDE "/etc/bind/cool_extra_sub_domains.db";
As a result, we now have (pics.cool.tld., CNAME), and (chat.cool.tld, CNAME), and we have them in both the internal and external view of the cool.tld. zone. How much work is there if we want one more subdomain? Just add it to cool_extra_sub_domains.db and again, both zones with have it.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.
2008-08-14
Be careful with automatic remote backups!
I used to think that doing automatic offsite backups with rsync and ssh was just great. It was easy to setup, and easy to restore from. But, there are two major problems. First: it gives you a single point backup; which isn't good if your data is mangled and the backup process runs after that. If you dissallow --delete (by strictling defining the rsync command being run on the remote side via an ssh authorized_key file), you won't suffer from files being lost, but you will discover the meaning of pain if your files are written over with bad data prior to the backup taking place.
Don't think that this can not happen. If you suffer a major security comprimise, an attacker might do this, and time the act right before the backup takes place.
An improvement on using rsync is rdiff-backup . It can store multiple points in time, which is really cool. But, this is where problem number two emerges, an attacker can remove and/or taint your files, invoke your automatic backup software and request that old backups on the remote side be removed. With rdiff-backup you can do this with the --remove-older-than flag. To prevent this, it is vital that you restrict your backup software (on the remote side) to only accept new increments. rdiff-backup makes this possible via "--restrict-update-only /backup_directory".
If someone attempts to call rdiff-backup with --remove-older-than when this is true, it will fail with gusto:
Exception '
Warning Security Violation!
Bad request for function: fs_abilities.single_set_globals with arguments: [
' raised of class '
To prevent your backups from growing, you will want to use --remove-older-than at some point. Instead of allowing it all the time, you should create a script on the remote side like
#!/bin/sh
rdiff-backup --remove-older-than 30d /backup_directory
and give permission for the backup source to run that script.
In a world where we have security vulnerbilities like this and this you should be that paranoid.
I suspect that most backup systems based on portable hard disks are unsafe in the way being discussed here. Systems based on tape and/or re-writable discs are also at risk if the same disc or tape is re-used frequently.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.2008-07-05
Keeping up with the Joneses svn repo
“Branches are cheap” is one mantra of version control systems. One criticism of centralized version control systems (like subversion) is that they limit who can make these wonderful cheap branches. Commit access to the repositories of projects' is often not given out generously, trust has to be earned.
I have found this inconvenient while trying to get a complicated and long developed patch developed for submission to an upstream project, especially where multiple re-submissions were required. Every attempt at submission must be applicable and tested against the latest revision of “trunk”, the main development branch, which is constantly changing.
The situation would be even harder to manage if you're working on some change that is never going to make it upstream (for various possible reasons...), but you don't want to fork the whole project, you only want to maintain a “patch” that can always be applied to the latest and greatest from the upstream project.
I always figured that there would be a creative way to use subversion to help solve this. It has been burning my mind for months now, and I finally found a way yesterday.
The key to doing this is the fact that svn merge allows you to apply commits or a series of commits from someone else's repository to your own.
Getting started
Setup your own repository, check out a working copy, and export the part or parts of upstream that you want to mirror.
$ svn export http://project-svn-url/trunk trunk
Carefully note the revision number that you have just exported from upstream.
Add that to your repository and put in your commit log what upstream revision you pulled.
$ svn add trunk; svn commit -m 'exported revision 38242 from http://project-svn-url/trunk and committed it here'
Now you can make your own branch. If you don't already have a directory for branches, make it$ svn mkdir branches
and then
$ svn cp trunk branches/my-branch; svn commit
Enjoy your branch
All right! Now you can start mucking around in your branch. As a bad example, imagine that as an non-official member of the development team, you feel you should finally get credit where credit is due...
$ cd branches/my-branch
$ echo 'The Green Lantern always complains on the -devel mailing list about features he is not willing to implement or bugs he is not willing to fix!' >> AUTHORS
$ svn commit
Now you can easily produce a patch between your branch and your copy of the upstream trunk
$ svn diff svn+ssh://myreposurl/trunk svn+ssh://myreposurl/branches/mybranch
Be sure to keep up with the Joneses
But, don't submit that patch! If you've been working on your branch for days, chances are that things have changed upstream. We should make sure our patch is now based on the latest thing availible upstream. Go back to your mirror of upstream trunk, and look at your commit logs to see what was the last revision pulled from upstream
$ cd ../../trunk
$ svn -v log
.
.
exported revision 38242 from http://project-svn-url/trunk and committed it here
.
.
Now, we take a peek and see what the latest commited revision upstream is
$ svn info -r HEAD http://project-svn-url/trunk
That may tell you that the latest revision is for example 38245.
Now this is the cool part, we pull the changes that have been made to trunk from upstream since we last pulled them
$ svn merge -r 38242:38245 http://project-svn-url/trunk
What this does is apply changes that have been made upstream between the last revision you have mirrored (38242 in this example) and the latest revision upstream (38245), and these are applied to your working copy. When you commit, you should note the new revision again in your commit log.
$ svn commit
It is not advisable to do the merge as follows
svn merge -r 38242:HEAD http://project-svn-url/trunk
and look up the HEAD revision right after that
$ svn info -r HEAD http://project-svn-url/trunk
There is a chance that a new commit could be made upstream, and you might end up putting in your log a revision number later than what you actually synced with. The next time you do svn merge with this number you will not miss a change.
Unfortunately, svn merge isn't perfect at this. If there are new files added upstream, you'll get an error like
svn: Copyfrom-url 'http://project-svn-url/trunk/new_file' has different repository root than 'svn+ssh://myreposurl/'
You can fix this by adding the missing files
$ touch new_file; svn add new_file
and trying again
$ svn merge -r 38242:38245 http://project-svn-url/trunk
This also won't work well if svn merge is using a different access method than you used to pull your own repository from. (as is the case in my examples). For those cases, use
$ svn diff -r 38242:38245 http://project-svn-url/trunk | patch -E -p0
and note the files you need to svn add
$ svn status | grep '^?' | awk '{print $2;};' | xargs --max-args=1 svn add
and the files you need to remove
$ svn status | grep '^!' | awk '{print $2;};' | xargs --max-args=1 svn rm
Now that our mirror of upstream is updated, we can use it to update our branch. Look back in your logs to see when you last copied your mirror of trunk (or changes from your mirror of trunk) into your branch
$ cd ../; svn -v log
So, if we assume that the last revision in your repository where you copied trunk (or changes from trunk) to branches/my-branch is 2, you can apply those changes to your branch, and commit them
$ cd branches/my-branch
$ svn merge -r 2:HEAD ../../trunk/
$ svn commit
Now that your branch is nice and synced up with your mirror of the trunk from upstream, you can create a patch once again between your branch and your trunk mirror
$ svn diff svn+ssh://myreposurl/trunk svn+ssh://myreposurl/branches/mybranch
Of course you can't just send that in. You'll need to compile, run, and test your recently syned up branch again to be sure that your changes still work. It helps to note and be aware of what upstream is changing when you pull those changes down.
You may discover when you do the merge of these changes to your branch that there are incompatible changes that svn can't merge for you. You'll have to fix these, and tell subversion you've done so with svn resolve before you can commit.
Be sure to Keep up with the Joneses often to minimize headaches.
Take a step back
If you try to test the above steps on your favourite project, you may be disappointed when you get to the really cool svn merge step, as there may not have been a new commit upstream that you can merge in since you svn exported it. To avoid disappointment and having to wait, use svn log http://project-svn-url/trunk to find a revision older than HEAD and use -r to pull that one when you do your svn export
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.2007-07-21
Fun with Linux Routing
I have a typical residential broadband internet connection that only comes with one ip address. There is a residential router running OpenWrt connected to the broadband modem. It's running a dhcp server that provides private ip addresses (192.168.1.0/255.255.255.0) to both the wired and wireless network. It performs network address translation so these private addresses can access the internet.
I have laptop with both a built in ethernet card and a wireless card. I have desktop computers with ethernet cards that I would like run inside my bedroom, where I don't have a wired connection. My idea is to put my laptop in my room, connect the desktop computers to it through a small ethernet network, and allow the desktop computers to connect through the laptop's wireless card to the rest of the network. (and by extension, the internet)
Not only would I like the desktop computers to be able to connect out through the laptop, but I would like the rest of the house network to be able to communicate with them.
One possibility is to run network address translation inside the laptop. That should make you squirm; one layer of network address translation is a necessary evil, but two layers is... just wrong.
Not only would I have to set up network address translation rules in the laptop to translate addresses going from the bedroom out, but from the rest of the house in. For incoming traffic, the laptop and the desktop computers would have share the ports. A totally unnecessary evil, so network address translation is out.
I could also use ethernet bridging. This would actually work quite well. Because 802.11b is really just ethernet in the air, the laptop can just pass ethernet packets through itself, just as a switch would. The desktop computers would even be able make dhcp requests that the house's router could respond too. All the bridge/laptop would really be doing is extending the house's local area network (LAN) into the bedroom.
Even though the above was a potentially elegant solution, I had my heart set on doing this a different way. Consider, how could I achieve my goals if I wasn't using ethernet to connect my laptop to the desktop computers? What if I wanted to connect them with something else like SLIP? Making the bedroom and rest of the house one network with bridging would be impossible, because you can't bridge two different types of network.
So, what I'm really trying to do was connect two different local area networks (LANs) together. How do we do this? The same way the internet does it! The answer is ROUTING.
I mean routing in the purest sense here. As I said earlier, I wanted no network address translation garbage. The goal: a local area network (LAN) with private addresses in my bedroom (10.0.0.0/255.255.255.0) being able to interact with the local area network of my house (192.168.1.0/255.255.255.0) in both directions, with my laptop connecting both networks via standard internet protocol routing.
First, the laptop. I'm running Debian, so I setup the networking by editing /etc/network/interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 10.0.0.1
netmask 255.255.255.0
auto wlan0
iface wlan0 inet dhcp
I also need to enable ipv4 routing by setting ip_forward=yes in /etc/network/options. When I restart networking, (/etc/init.d/networking restart), this causes 1 to be written to /proc/sys/net/ipv4/ip_forward .
After restarting networking, here is my laptop's route table
# /sbin/route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.0 * 255.255.255.0 U 0 0 0 eth0
192.168.1.0 * 255.255.255.0 U 0 0 0 wlan0
default router.lan 0.0.0.0 UG 0 0 0 wlan0
This says that any traffic destined for 10.0.0.0/255.255.255.0 (bedroom LAN) should go out on my ethernet card, an traffic destined for 192.168.1.0/255.255.255.0 (the house LAN) should go out my wifi card, and anything else should go out my wifi card and be routed through the house's router. (router.lan 192.168.1.1).
Next I setup the bedroom LAN. I'm running CentOS on one of the desktop computers. I used /usr/sbin/netconfig to setup the networking. The result ended up in:
/etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.5
NETMASK=255.255.255.0
GATEWAY=10.0.0.1
I also told it to resolve dns with the DNS server in the house router, by putting nameserver 192.168.1.1 in /etc/resolv.conf .
Here are the interesting parts of the route table
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.0 * 255.255.255.0 U 0 0 0 eth0
default 10.0.0.1 0.0.0.0 UG 0 0 0 eth0
Traffic destined for the bedroom LAN (10.0.0.0/255.255.255.0) should be transmitted with the ethernet card. Traffic destined for anywhere else (default) should be routed through my laptop (10.0.0.1) via the ethernet card.
One last step. My laptop (puritanwifi - 192.168.1.137) will pass on traffic from 10.0.0.0/255.255.255.0 to the house's router (router.lan 192.168.1.1), but the house's router has to know how to send things back.
So, I logged into it, and added a route
# route add -net 10.0.0.0 gw 192.168.1.137 netmask 255.255.255.0 br0
Here are the interesting parts of the route table in the house's router
# /sbin/route Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 10.0.0.0 puritanwifi 255.255.255.0 UG 0 0 0 br0 192.168.1.0 * 255.255.255.0 U 0 0 0 br0 default wnpgmb02dc1.mts 0.0.0.0 UG 0 0 0 ppp0Traffic going to the house network (192.168.1.0) goes out to br0, which bridges the wired and wireless network into one. Traffic going to the bedroom network (10.0.0.0/255.255.255.0) goes out on br0 to be routed to that network by my laptop (puritanwifi), and everything else goes out to my internet access provider.
The other computers on the house network can now talk to the bedroom network too. They have a routing table like this
$ /sbin/route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 * 255.255.255.0 U 0 0 0 eth0
default router.lan 0.0.0.0 UG 0 0 0 eth0
It says, put traffic out on the ethernet interface if it's destined for the house network (192.168.1.0/255.255.255.0), otherwise use the default route and use the house router (router.lan 192.168.1.1).
Notice that nothing is said about the bedroom private network. But, thanks to the default route, and the particular gateway being used there, I can reach the bedroom network:
$ traceroute 10.0.0.5
traceroute to 10.0.0.5 (10.0.0.5), 30 hops max, 40 byte packets
1 router.lan (192.168.1.1) 1.083 ms 0.791 ms 0.781 ms
2 puritanwifi.lan (192.168.1.137) 5.046 ms 3.218 ms 2.559 ms
3 10.0.0.5 (10.0.0.5) 3.009 ms 3.218 ms 2.710 ms
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.2007-02-09
Iterative, functional definition of the fibonacci sequence in python
#!/usr/bin/evn python
# An iterative, functional definition of the fibonacci sequence using reduce()
# This is how I originally wrote this.
# It turns out that lambda is going to be removed from python 3,
# and map is going to be removed from the builtin namespace. So the newer
# version below doesn't use them.
#
# Advocates for removing lambda would consider this to be a good example
# of why it should be removed.
#
#def fibo(n):
# """Find the nth fibonacci number
# """
# return reduce( lambda (a, b), count: (b, a+b), xrange(n), (0, 1) )[0]
#
#print map( fibo, xrange(20) )
def fibo_switch((a, b), count):
"""Take a tuple representing the current two numbers of interest in the
fibonacci sequence and produce a new tuple that advances the sequence.
We make the first number the current last, and the last number the
sum of the current first and last.
"""
return (b, a+b)
def fibo(n):
"""Find the nth fibonacci number
"""
return reduce( fibo_switch, xrange(n), (0, 1) )[0]
# Print a list of the first 20 fibonacci numbers
print [ fibo(n) for n in xrange(20) ]
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.2007-01-29
Overkill on Deal or No Deal
Tonight I was at my parents house, and they were talking about the United States version of Deal or No Deal. At first I was not interested, but the more I learned about it, the more I wanted to see it, as I could see it there was game theory involved.
In short, I had a brief love affair with the Howie Mandel problem.
At no point was I interested in the kinds of perspectives I knew serious economists and statisticians would take, perspectives that include ideas of risk aversion and existing wealth. I was only enamoured with the "best strategy".
On the bus home I realized that this would only be a matter of finding the expected value and comparing it to the bank/producer's offer. If the offer is less then the expected value, no deal. If the offer is greater then or equal to the expected value, deal.
So I started with a base case, n=2 (number of boxes remaining), r=1 (removals to make). When n=2 the expected value is the arithmetic mean (average) of those two values.
Next, a recursive case. There are n boxes, and r to be removed. Average the expected values of all the equally probable n choose r (nCr) sub cases. (There are nCr ways to remove r boxes from a set of n boxes)
Here's an n=3, r=1 example for the recursive case. x1=100, x2=200, x3=300. We find the average of the expected values for (x1, x2), (x1, x3), and (x2, x3), all with r=1. That's (150+200+250)/3 = 200.
I was already anticipating what I'd do with bigger values of n and r. Get a computer to do the work. So I did so with some ugly python code. (that shows off some nice features of python)
def pick_n_boxen(boxen, n):
if n==1:
for box in boxen:
one_box_set = set()
one_box_set.add(box)
yield one_box_set
elif n == len(boxen):
yield set(boxen)
else:
boxen_so_far = set()
for box in boxen:
one_box_set = set()
one_box_set.add(box)
boxen_so_far.add(box)
for subset in pick_n_boxen( boxen.difference(boxen_so_far), n-1):
assert( subset != None )
yield subset.union(one_box_set)
def remove_decrement(remove):
remove -= 1
if remove < 1:
remove = 1
return remove
def average(iteration):
avg_sum = 0
count = 0
for value in iteration:
count+=1
avg_sum+=value
if count>0:
return avg_sum / float(count)
else:
assert(False)
expected_outcome_cache = {}
def expected_outcome(boxen, remove):
boxen_list = list(boxen)
boxen_list.sort()
cache_key = (tuple(boxen_list), remove)
if expected_outcome_cache.has_key( cache_key ):
return expected_outcome_cache[cache_key]
else:
if len(boxen) == 1:
result = tuple(boxen)[0]
else:
next_remove = remove_decrement(remove)
if next_remove >= len(boxen) :
next_remove = 1
if remove >= len(boxen):
remove = 1
result = average( expected_outcome(boxen.difference(remove_set),
next_remove)
for remove_set in pick_n_boxen(boxen, remove) )
result = int(round(result))
expected_outcome_cache[cache_key] = result
return result
I knew that this would take too long too run at early points in the game, but I was content that it would work well later in the game.
Happy that I had "solved" the problem independently, I went out to the net to see the rest of the world was thinking. I didn't even have to leave Wikipedia to realize my mistake. You can calculate the expected value by just averaging the values, for any size of n, not just 2. Oh, and r doesn't matter!
My Mom suggested this a few times during the heated post super conversations about this. My Dad and I rejected this approach as too intuitive. We'd been exposed to so much probability theory in school that we just couldn't accept that it could be that easy! Our profs taught us that it was really easy to misapply probability theory and come up with wrong solutions. "You must be cautious!" I personally remembered how the Monty Hall Problem had an unintuitive solution, and I thought Howie Mandel would also require the same kind of detailed case analysis. It turns out not to be.
What have I learned? I haven't unlearned the valuable lessons that probability theory is easy to misapply and requires great understanding and caution to apply correctly, but I have also been reminded that being "too academic" can lead to solutions that are correct, but overkill.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
This work is licensed under a Creative Commons Licence.2006-12-30
tc, why won't you obey me!
I've been playing with this on a router running OpenWrt.
There has been some frustration. It took me several days to realize why I was getting errors like this:
# tc qdisc replace dev ppp0 root tbf rate 220kbit latency 50ms burst 1540
RTNETLINK answers: Invalid argument
Today I understood the problem. I never loaded the relevant module:
# insmod sch_tbf
In addition to the man pages, I've found Linux Advanced Routing & Traffic Control HOWTO and ADSL Bandwidth Management HOWTO to be helpful. These are under free documentation licenses.
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
2006-12-06
Get an intro MIT CS education
MIT wisely held onto the copyright instead of giving it to HP. Years later, OpenCourseWare became policy and the videos were liberated for all to enjoy. They went with CC-ShareAlike as a license too!
You can also read the textbook (not under a free culture license .`( ) and access other course materials.
I've watched up to Lecture 4a so far. Cool stuff. It's nice to be able to pause the lecture to better process a slide, or even go back when you space out.
The subject matter is still highly relevant, it's not at all dated. This makes me wonder, have the ideas for handling complexity in computer systems actually advanced in the past 20 years, or do we just have better tools for expressing those same ideas?
(I'm striking the above paragraph, it appeared in the first public publication of this entry, but I now feel it's inapropriate. Leaving it in for transparency. Even though I don't draw any conclusions in the second sentance, the combination of the first sentance and the second pushes the reader towards one. I'm striking because I don't feel confortable with this conclusion yet, I need to learn more. )
Math haters may not enjoy though. The subject matter isn't itself mathematical, but the examples used to illustrate the ideas often are. You may not be able to appreciate the message due to the medium. It's mostly first year university calculas and vector geometry so far.
One feature is the lack of cruft found in real university lectures. There are no questions about assignments, complaints about assignments, requests for deadline extension, returning of assignments, followed by complaints about assignment marking.
The The Jargon File calls the textbook the Wizard Book. Perhaps these videos could become known as the Wizard Videos.
The theme music for the videos is Jesu Joy of Man's Desiring
Mark Jenkins is a member/worker/owner of ParIT Worker Co-operative.
