I was reading through some code (this file from the REBOUND N-body simulator) when I came across this function:

// Machine independent implementation of pow(*,1./7.)
static double sqrt7(double a){
    double x = 1.;
    for (int k=0; k<20;k++){  // A smaller number should be ok too.
        double x6 = x*x*x*x*x*x;
        x += (a/x6-x)/7.;
    return x;

and I got very curious as to how it worked, and why one would need it and mostly about whether it would generalize to arbitrary values of base and exponent.

We see that we have an initial guess (x) which we then increment using the curious factor \frac{1}{7}(\frac{a}{x^6} - x) = \frac{a - x^7}{7x^6}. We can see that when x approaches a^{\frac{1}{7}} this factor vanishes and x stops moving.

But why the other components?

Untitled drawing

If we create a function f = x^7 - a we see that it goes to zero when x = a^{\frac{1}{7}} (The point marked out in the sketch. Now, we know that the differential of the curve (f' = 7x^6 = \frac{\Delta f}{\Delta x}) gives us the slope of the curve at x. At x we are \Delta f = x^7 - a away from our desired value. So, by moving by \Delta x = \frac{\Delta f}{f'} we get closer to our desired point in a principled way – we move faster when the slope is greater, and slower when it’s smaller (and close to the desired location)

Does this look familiar? Of course it does! It’s gradient descent which they taught you when you were a baby, and now you’ve forgotten it, because who does things like that any more? Formally, it’s called the nth root algorithm and the wikipedia page suggests that you probably only need 6 or so iterations to get 64 bit accuracy.

I got curious about how the pow function was implemented in the C++ standard library. The internet tells me that – it depends. Thrashing about a bit in the cmath code for gcc we see different prototypes for different types of input base and exponent and it looks like the actual function may be implemented in Fortran.

Stackoverflow pointed me to an opensource Apple implementation. The vast bulk of the function is devoted to dealing with edge cases (It’s impressive to read, if a little messy) and there is an interesting algorithm (which uses a lookup table included in the source) which I couldn’t figure out.

On this topic, if your exponent is a positive integer, Exponentiation by squaring is a fun method to compute A^n.

Computing orbits (1)

I want to simulate the orbits of planets and moons in our solar-system. They will form a carousel of gravitational bodies that my little spaceships will lift-off from, orbit around, and slingshot past, on their imaginary journeys of adventure and derring-do.

The wikipedia article on orbit modeling is a nice introduction to some of the concepts involved. Newton’s laws form the basis of computing orbits – every body attracts every other body in proportion to the product of their masses and inverse to the square of the distance between them.

If you imagine ttmp1240_thumbwo bodies (say the Sun and Earth) orbiting in isolation, then an analytic solution (a directly computable equation) to the orbit is possible. It’s a conic section (circle, ellipse or parabola) called a Kepler orbit.

A Kepler orbit, being a nice geometric figure, can be described rather simply with a small set of parameters.

Of course, the Sun and the Earth aren’t the only two bodies involved. Though the Sun’s mass dwarfs all others, planets and moons also pull on each other causing deviations – called perturbations – from the Keplerian orbit, which are evident when you measure carefully enough. There are no analytical solutions to the general problem that considers more than two bodies. Orbit prediction (also called propagation) is done using a variety of methods with the complexity dependent on the level of accuracy needed.

The most direct way that comes to mind – which is what I have used in previous code – is formally called an N-body integrator. Here, if you have N bodies in the simulation you simply compute the force F between every pair of bodies, vector sum the resultant force for each body, find the accelerations, and then use numerical integration to update the velocities and positions. I’ll call this the N-body method. The basic principle is straightforward, but there are important details to do with the numerical integration itself.

NASA’s HORIZONS system (pdf introduction) is the goto place for ephemeris data (ephemeris means “diary” in Latin and Greek and is the historical name for tables of positions of astronomical objects).  The whole HORIZONS system has a quaint 90s scientific computer system feel to it. There are three access methods – via web, via email and via telnet. Yes, telnet. I haven’t heard that word in a decade. I was pretty surprised when I found it still existed on my mac.

The web interface is the nod to “modernity”. It allows you to experiment with queries and familiarize yourself with the system.  The web interface is hard to scale, and I started to look for a Python based programatic interface to the HORIZONS system. My first hit was callhorizons. Callhorizons seems to have been created with a lot of love but it returns data in a somewhat specialized format. Scanning through the code and poking a little around the batch web interface it turns out it’s quite simple to create a programatic interface to query the system. I’ve put some barebones Python code to query the HORIZONS system here.

So, another method to determine orbits of planets for the simulation would be to get ephemeris data (I had to get used to that term too) from NASA’s Horizons system and store it as a look-up table. In this case the planetary orbits are completely predetermined and given with a high degree of fidelity. I’ll call this the look-up table method.

After I started doing my research I came across the paper: Keplerian Elements for Approximate Positions of the Major Planets. Here the author takes high accuracy data over a certain time period (epoch) from NASA and fits Keplerian orbits to the data – Keplerian orbits are ellipses and are completely defined by a small number of parameters (called elements). One neat thing the author does is to include delta terms that change the Keplerian elements with time – our ellipses precess round slowly – and this more or less captures the perturbation of the planetary orbits due to the presence of other planets. I’ll call this the “keplerian fit” approach.

My inclination is to start with the N-body simulation approach which is what we’ll explore in the next post.

The marvelous bee odometer

You probably know that not only can bees compute the vector (direction and distance) to a discovered food source relative to their hive, but they can also convey this vector to their hive mates. Here, I’ll talk a little bit about one component of this system – the Bee odometer: how do bees figure out the distance to the food.

Bees are amazing. In general, insects are amazing. If you are into robotics of any kind, you should definitely be studying the insect literature. Bees can not only perform vector summation to compute the bee-line from their hive to a source of food they have discovered, but they can convey this vector to their hive mates via a form of sign-language!

Bees use a dance to instruct hivemates as to the direction and distance of a food source. They make circular and figure eight movements on the vertical surface of the hive. They waggle during the dance and the intensity of the waggling conveys the distance. The orientation of the axis of the dance conveys direction)

How they compute the direction is just as marvelous as to how they compute the distance, but I will just talk about distance here. There are two competing hypotheses as to how they keep track of how far they have flown. One is that they track how much energy they have consumed while flying to the food, which correlates with the distance. This hypothesis has a nice ecological feel to it because energy (food) is the main reason they are flying out in the first place, and we can imagine that there are decisions to be made based on how much energy it takes to get to a food source.

The competing hypothesis is that bees use optic flow – the percept of visual motion from their environment – to determine how far they’ve flown. This hypothesis is way cool.

Like in most such scientific debates, a small cottage industry grew around this question, sustaining several scientific families for most of their adult lives. The basic method for the experiments researchers did was to have bees fly from their hive to a food source and then watch their dances when they got back to the hive. The dances tell us what the bee THOUGHT the distance to the food was, and of course, we can measure the actual distance ourselves and compare. Then various tricks are played on the bees to try and figure out what cues they are using to determine flight distance.

The review by Esch and Burns [1] makes interesting reading in the context of the sociology and politics of science. One experiment supporting the energy hypothesis was done by a fella named Heran. He had bees forage going either uphill or downhill from a hive to a feeder placed at a constant distance. The energy hypothesis predicts that bees going uphill will measure a longer distance than bees flying downhill (even though the actual distance was the same). Apparently out of seven runs of the experiment with this setup, Heran found five that showed the bees didn’t care if they were going uphill or downhill – they just flagged a fixed distance. In just two runs Heran found an answer supporting the energy hypothesis. He explained away the other five results by blaming winds and reported that bees use energy (!)

A scientist called von Frisch was a pioneer in the study of the natural behavior of bees (among other insects). According to Each and Burns [1] in one study von Frisch actually had evidence for an optic flow hypothesis. von Frisch observed that bees flying over water to get to a feeder would signal shorter distances than bees flying over land to get to a feeder at the same distance. The smooth reflective surface of the water offers far less by way of optic flow (image motion) that say a grassy knoll dotted with trees and cows and buttercups. When a bee flies over the water, it’s basically this dark blue featureless sheet which tricks it into thinking it’s not really flying far.

In fact, bees are known to dive low – sometimes too low, ending up drowning – because they can’t sense any motion from the water, and it turns out from later experiments, this sense of motion from the ground is essential to them, and they use it to adjust how far they fly from objects.

But here comes the kicker. He had another set of similar experiments done on a windy day, when the wind happened to blow against the bees flying over the lake and with the bees flying over land. Now this time the bees flying over the lake signaled a longer distance. Was this because they had to work harder or because there was a ton of optic flow from the ripples on the lake that made it look like they were flying a lot? It’s a confound.

The proper action would have been to repeat the experiment on a windless day – like the previous experiment – to avoid this confounding variable, but von Frisch averaged the two results together and found that bees don’t care if they were flying over land or water – they always correctly estimated the distance.

Obviously von Frisch liked the energy hypothesis. And von Frisch was a big cheese. So the energy hypothesis came into favor. But as always the truth will out and people kept investigating and finding things that didn’t quite jive with the energy hypothesis.

Finally, there was a series of elegant experiments that strongly supported the optic flow hypothesis. Scientists placed relatively short tubes at the mouths of feeders. The tubes had different kinds of textures on them, some designed to elicit a lot of optic flow (i.e. very dense textures) while others were almost featureless (generating very little optic flow). When bees had to get to the feeder by passing though highly textured tubes they reported much longer distances than when they flew through sparsely textured tubes [2, 3]. The researchers also did a bunch of fun quantitative analyses to estimate how the bee converted the optic flow it is probably sensing to the distance it thinks it flew.

Now, like in most biological systems, it is unlikely the honey bee is using just this one cue – it’s probably a combination of cues that gives the honey bee it’s final percept of distance, but the experiments suggest that vision, and optic flow specifically are the most heavily used cues by bees to determine how far they’ve flown, as well as to adjust things like how high they fly above a surface and how much distance they will keep from obstacles.


  1. Distance estimation by foraging honeybees. Esch H, Burns J.
    Journal of Experimental Biology 1996 199: 155-162.
  2. Honeybee dances communicate distances measured by optic flow. Esch HE, Zhang S, Srinivasan MV, Tautz J. Nature. 2001 May 31;411(6837):581-3.
  3. Honeybee navigation: nature and calibration of the “odometer”. Srinivasan MV, Zhang S, Altwein M, Tautz J. Science. 2000 Feb 4;287(5454):851-3.

Derived classes and std::vector

I want to talk about a fun corner of C++ I ran into recently. It has to do with how to handle calling member functions of a list of heterogenous objects.

My use case was a bit like the following: say we have a bunch of different functions – a * x^2, b * x, c/x – and we want to apply them one after the other to a series of x values. Which functions are included, and the order of application are user defined.

My solution was to have a base class

and three derived classes

But when I put them in a list and applied them:

I have

g++ --std=c++14 main_wrong.cpp -o test

Whaaaaa? Waah! Why is it calling the dummy base class function. Isn’t the WHOLE PURPOSE of virtual functions to ensure that the function in the derived class gets called?

The core problem here is that std::vector makes a copy of the object when we use the push_back function. We’ve defined the container type as a base class and when our derived class is passed to push_back it “slices” the object down so that the subclass now looks like the base class – losing all it’s derived attributes. (I first ran into Object slicing thanks to “Cubbi”‘s answer here)

So what can we do? We could use a vector of pointers instead. But the internet seems to think this is an unsafe idea. UNSAFE!? We don’t write C++ to be SAFE!! Oh, alright. We’ll practice safe code. What can we do?

The safest route is to use std::unique_ptr:

I’m not full clear on the use of emplace_back instead of push_back: it certainly isn’t for efficiency reasons here – the just code won’t compile using push_back.

So there you have it.

Chromecast audio: command line

When I was casting about (I’m allowed one Dad pun per article. Oops, I mean bad pun. Sorry that was two much fun.)  for home audio solutions (It’s what the kids call speakers these days) a friend of mine encouraged me to try out Google’s Chromecast Audio device. And so another internet connected creature entered our home. This post starts with a rant. Feel free to skip to the section that says “Commandline” …

I was simultaneously curious, impressed, and a little disgusted by the Chromecast Audio. We got the Chromecast at the same time as the Synology NAS did and I found the two to be a study in contrast.

Synology gives you an appliance – it’s a snap to setup, to update, and to add functionality (“Apps”) to. This is great, but even greater is that you get root access – it’s a Linux powered box that you own and have control over – you can de-Appliance it.

The Chromecast Audio is a modern appliance. It’s a black box that has very little to configure (good) and it’s locked down in a weird way (very bad). It’s an internet connected device that updates itself and there is no way to access what’s inside. Thankfully, it does not have a camera or a microphone. Google does release some sort of source code. It’s a 4GB tar file. It’s not clear if this is the whole software the box is running, and if it’s the latest version. So there’s that.

One major symptom of this locked-down-ness (and probably of Google’s marketing prowess) is just how hard it was to find some useful information of WHAT the damn thing was doing. I’d always come across functional or how-to articles that told me to use this App or that App, or use the Chrome browser and use cast.

Finally, and ironically, the best description came from Synology’s blog:

Chromecast is basically a Chrome browser, you can’t see it, you can’t feel it, but it’s there: when you select a song to stream, DS audio/video simply passes the HTTP(S) url to the browser, meaning the stream goes from the DiskStation to the Chromecast, the device is never a bottleneck. And if you kill the app or log out, streaming will carry on uninterrupted!

So, here’s where I got disgusted – the chromecast is really built around smart phones or tablet devices that run “Apps” and really pushes you to Google’s ecosystem. It was very easy to configure the Chromecast via my Chrome browser, but that implies … that I was running a Chrome browser. Suppose I like Firefox or some other browser?

Next, streaming to the device requires a smart phone App. The only way to control the device from a Mac is to open up … Chrome and then use this janky thing called Cast. It’s not clear if it actually passes on the URL to the Chromecast (as it claims) or mirrors your computer output (Tab casting) – I got very glitchy playback when I tried to cast Youtube videos from my mac. It also messes up my sound output, to where I have to go into settings to restore things.

What’s with this App only stuff? Suppose I don’t have a smart phone (I actually don’t – we do exist.) How do I control the player? Just give me a small program to communicate with the device! You don’t have to go overboard and give me a fantastic browser based virtual desktop like Synology does. I don’t expect Google to have the software chops of a relatively small Taiwanese hardware company, but even a simple command line player would make me happy …

I like the hardware. I like the miniaturization. I like the ease of setup. I hate the ecosystem. It reeks of a desperation to make money that is unseemly.


(My original plan was to see if I could outline here a bare bones Python program that commands the Chromecast to stream files or URLs of your choosing from basically any platform. After looking into the details of the chromecast protocol, I don’t believe I have the time to develop something from the ground up. Instead, we’ll use the wonderful pychromecast library from Paulus Schoutsen a little bit. But, we’ll start off with some experimentation, anyway)

First we are going to have to find our Chromecast device. Chromecasts use the mDNS protocol [a][b] and listen to the address _googlecast._tcp.local. We can use the zeroconf module (pip installable) to find the service.

import time
from zeroconf import ServiceBrowser, Zeroconf

class MyListener(object):
  def add_service(self, zeroconf, type, name):
    info = zeroconf.get_service_info(type, name)
    print("Service {} added, service info: {}\n".format(name, info))

zeroconf = Zeroconf()
listener = MyListener()
browser = ServiceBrowser(zeroconf, "_googlecast._tcp.local.", listener)

(It takes a little time for the handshaking to complete, so it’s best to add a delay before you shutdown zeroconf – hence the sleep)

If I run this, what do I get?

Service Chromecast-Audio-XXX._googlecast._tcp.local. added, 
service info: ServiceInfo(
  b've': b'05', 
  b'md': b'Chromecast Audio', 
  b'rm': False, 
  b'st': b'0', 
  b'bs': b'......', 
  b'ic': b'/setup/icon.png', 
  b'ca': b'2052', 
  b'nf': b'1', 
  b'id': b'XXX', 
  b'fn': b'kitchen', 
  b'rs': False})

Whoa! Look at that! Some things I can figure out, like an IP address, a port and the name I gave the device (“kitchen”).

So, this IP address, can we ping it?

MacBook-Pro:~ kghose$ ping XXX.local
PING XXX.local ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=7.929 ms
64 bytes from icmp_seq=1 ttl=64 time=4.980 ms
64 bytes from icmp_seq=2 ttl=64 time=4.923 ms

Heh, Heh, heh. It’s ALIVE!

MacBook-Pro:~ kghose$ ssh XXX.local
ssh: connect to host XXX.local port 22: Connection refused

Ok, too much to ask for.

Can we port scan it? (I was lazy and used MacOSs built in Network Utility from a tip here)

Port Scan has started...

Port Scanning host:

     Open TCP Port:     8008        http-alt
     Open TCP Port:     8009
     Open TCP Port:     9000        cslistener
     Open TCP Port:     10001       scp-config
Port Scan has completed...

Not much open here – bodes well for security, I suppose. Note that the un-named port 8009 is the one the Chromecast returned during service discovery.

When I started out I thought that the Chromecast would have some kind of straightforward RESTful API and I looked forward to playing with it, but Google opted for a more unfriendly custom interface sitting just on top of TCP.

Fortunately, the excellent pychromecast library has done the tedious work of integrating things together to provide a Pythonic API to the chrome cast device.

(As a warning, just after I ran the following code and simply adjusted the volume on my Chromecast, it started misbehaving with the Apps on our phone. I had to reboot the Chromecast to get everything back to normal again. It could have been a coincidence, though)

import pychromecast

chromecasts = pychromecast.get_chromecasts()

[cc.device.friendly_name for cc in chromecasts]
#-> ['kitchen']

cast = next(cc for cc in chromecasts if cc.device.friendly_name == 'kitchen')
#-> CastStatus(is_active_input=None, is_stand_by=None, volume_level=0.25, volume_muted=False, app_id='ZZZ', display_name='Google Play Music', namespaces=['urn:x-cast:com.google.cast.broadcast', 'urn:x-cast:com.google.cast.media', 'urn:x-cast:com.google.cast.cac', 'urn:x-cast:com.google.android.music.cloudqueue'], session_id='YYY', transport_id='YYY', status_text='Google Play Music')

The only “write” operation I tried was to change the volume on the Chromecast, but as I mentioned, after this, the Chromecast had to be rebooted.

#-> 0.35

#-> 0.4499999940395355

#-> 0.4499999940395355

#-> 0.4499999940395355

#-> 0.4499999940395355

#-> 0.4499999940395355

#-> 0.4499999940395355

#-> 0.4499999940395355

#-> 0.549999988079071

#-> 0.6500000119209289

#-> 0.6500000119209289

#-> 0.6500000119209289

#-> 0.6500000119209289

#-> 0.5500000357627869

#-> 0.5500000357627869

So, I didn’t get further than that – I was put off because it seemed that controlling the Chromecast from this source borked the connection between the phone and the Chromecast for good i.e. until I rebooted it, but it SEEMS straightforward to have a little command line client to send music to the Chromecast using this library. We shall see.

[1] Low level description of Chromecast V2 protocol
[2] Python library to communicate with Chromecast by Balloob
[3] Chromecast interface description (incomplete)
[4] Protocol buffer introduction


functools.partial is one of those things that you don’t need until you need it, and then you need it badly. Trying to create a series of lambda functions I discovered that the simplest answer that came to my mind did something counter intuitive, and functools.partial – which I had ignored so far, came to my rescue.

Effectively, I wanted to create a list of functions

f1(x), f2(x), f3(x) ...

where each function took the same input but did something different, for example

f1(x) = x + 1, f2(x) = x + 2, f3(x) = x + 3

What I tried was as follows

fl = [lambda x: x + i for i in range(4)]

It ran fine, but it gave me the following output

fl[0](1) -> 4
fl[1](1) -> 4
fl[2](1) -> 4
fl[3](1) -> 4

What was happening was that all the lambda functions were taking the last value of i (as a global), rather than the value of i at the time the function was created.

An object oriented person would probably do something like

class F:
  def __init__(self, i):
    self.i = i
  def f(self, x):
    return x + self.i

fl = [F(i) for i in range(4)]

But I enjoy Python because it borrows a lot of things from LISP and allows you to write a lot of code very succinctly, making it easier to grasp (if done well – it’s also possible to write Perl in Python)

Turns out the way you can capture the state of i in the lambda is to create a partial function, and functools.partial allows you to do this succinctly as

fl = [functools.partial(lambda x, j: x + j, j=i) for i in range(4)]

For those who are curious, WHY I had to do this – well, here is an analogy to my use case: I had data that was well expressed as a table of values. The data itself was stored as a list of dictionary which was sometimes nested – some of the values formed part of a hierarchy. For example, one row of the data could look like this:

row = {
 'name': 'Baker',
 'rank': 22,
 'score': {
   'raw': 43,
   'scaled': 25

I wanted a table with columns

Last Name | Rank | raw score | scaled score |
    Baker |   22 |        43 |            25|

The actual table I was creating was meant to show ALL the data from an experiment with many variables and so had a lot of columns. I was wary of making a mistake in labeling the column relative to the value it had. It would be one of those silent and deadly errors.

If I had a simple flat structure, I might simply have changed the code such that the dictionary keys matched the column headers, weighing the fact that there was other code – already written and debugged – that used the existing keys and that the existing keys were short and succinct and I liked them.

My solution was to create a list of tuples

[(column header, function to extract column value from dictionary)]

And then simply loop over this list, first to create the header, and then for each row, to extract the data value into it’s correct place. Having the column header right next to the extraction function minimized the chances I would mess up somewhere, especially when adding or moving a column when editing or refactoring the code.

Self-Organizing Maps

Self-organizing maps are an old idea (first published in 1989) and take strong inspiration from some empirical neurophysiological observations from that time. The original paper “Self-Organizing Semantic Maps” by Ritter and Kohonen (pdf) has a nice discussion that took me back to some questions I was looking at in another life as a neurophysiologist. The discussion centers around cortical maps, which are most pronounced in the clearly sensory and motor areas of the brain.

Very early experiments (in their paper Lashley‘s work is referenced) led some people to propose the brain was a complete mishmash, with no functional separable internal organization. Each part of the brain did exactly what the other parts did, a bit like how a lens cut in half still works like the original lens (just dimmer), or how a hologram cut in half, still shows the original scene.

Later experiments looked more closely and found a wealth of specialization at all scales in the brain. The cortex (the most evolutionarily recent part of the brain) shows an especially fascinating organization at the smallest scales. This is most apparent in the visual cortex, where the neurons are often organized in two or higher dimensional maps. A famous example is the set of orientation maps in early visual cortex:

nrn3766-b1This figure is taken from Moser et al, Nature Reviews Neuroscience 15,466–481(2014) but you’ll run into such figures every where. It shows physical space along the cortex, color coded for the orientation of bar that the neurons at that location best respond to. One can see that the colors are not randomly assigned but form contiguous regions, indicating that neighboring neurons respond to similar orientations.

I could go on about cortical organization like this (and indeed I have elsewhere) but this is a good segway into self-organizing maps. Researchers studying machine learning got interested in this feature of the brain and wondered if it held some use for classification problems. Kohonen drew up an algorithm for what he called a self-organizing map that has been used on and off mainly for visualization via un-supervised learning.

The Kohonen map is constructed as follows.

  1. Construct a sheet of “neurons” with the dimensionality you want. Usually this is limited to one, two or three dimensions, because we want to visualize a complex higher dimensional dataset.
  2. Each neuron is a template – it has the same dimensions as the input data and can be overlaid on each instance of the input data.
  3. We initialize the templates randomly – so the templates look like white noise.
  4. We “present” each input example to the map. Then we find the template neuron that most closely matches this input. We then morph the template slightly so that it even more closely matches the input. We do this for all the neighbors of the neuron too.
  5. We keep repeating this for all the input we have.

What does this result in? I have two videos for you, you product of the MTV generation, that shows how this works.

I took the famous NIST handwritten digits data set and fed it to a two dimensional SOM.

The first video represents the self-organizing map as a set of templates that change as inputs are supplied to it.

This is easy to interpret as you can identify the templates as they evolve. Note how neurons form neighborhoods that naturally link together similar stimuli. It is easy to see with this data set because the data are two dimensional visual patterns and we can judge similarity intuitively.

The second video represents each neuron’s stimulus preferences, with it’s most preferred stimulus digit appearing as the largest.

You can see of the network evolves as examples are thrown at it and segregates out into zones that respond to or represent a particular form of the stimulus. You can see how zones travel and try to spread-out, ending up at the corners of the map. This is due to the competition created by the winner-take-all operation we use (closest matching template wins) combined with the effect of adjusting the winning neuron’s neighbors too.

SOMs are fun things to play with, but are a little fiddly. You may have picked up from the videos that we had two parameters called sigma and eta that changed as the learning went on. The numbers got smaller as the training went on. Sigma is the size of the neighborhood and eta is the learning rate – the smaller the value of eta the less the neurons change in response to inputs. These are two things you have to fiddle with to get decent results. Also as you can imagine, this whole process is sensitive to the random templates we start with.

Numerous attempts have been made to try and figure out objective ways to determine these numerous parameters for a SOM. It turns out they are pretty data dependent and most people just do multiple runs until they get results they are happy with.

As I was reading Bishop I ran into his version of the SOM which he calls Generative Topographic Mapping which Bishop claims is more principled and mathematically grounded than SOMs. The GTM, which I really haven’t heard about at all, seems like a fun thing to study, but as a ex-neurophysiologist SOMs are definitely more intuitive to understand.

Code (and link to data source) follows:

Bayesian networks and conditional independence

Bayesian networks are directed graphs that represent probabilistic relationships between variables. I’m using the super-excellent book “Pattern Recognition and Machine Learning” by C. Bishop as my reference to learn about bayesian networks and conditional independence, and I’d like to share some intuitions about these.

First, let’s motivate our study with a common home situation related to kids. We start with the question “Do we have to gas up the car?” (As you know, everything related to kids is binary. There is no room for gray areas). Well, this depends upon whether we need to do a grocery trip, or not. That, in turn depends upon whether the food in the fridge has gone bad and whether we need dino nuggets. Why would the food go bad? Well that depends on whether the electricity went out, and whether the darn kids broke the fridge again. The kids never broke your fridge? Lucky you. Whether we need dino nuggets depends on whether we’re having a party. The probability that the kids broke the fridge depends on whether there were a horde of them or not (i.e. a party).

Whoa, this paragraph looks complicated, like a GRE word problem. Let’s represent this as a directed graph:

Screen Shot 2017-04-13 at 7.57.02 AM

Much better!

All the variables here are binary and we can put actual numbers to this network as follows:

Screen Shot 2017-04-13 at 8.14.09 AM

Explaining some of the numbers, the entry for the (P)arty node indicates that there is a probability of 0.1 we’re going to have a party, and this is uninfluenced by anything else. Looking at the entry for the (G)as up node, we see that whether we gas up depends on the states of F and D. For example, if both F and D are false, there is a 0.1 probability of needing to gas up, and if F is true and D is false, there is a 0.6 probability of needing to gas up, and so on.

Intuitively, for a pair of nodes A and B, there is a linkage if the probability of the output depends on the state of the input(s). So, in the truth tables above I’ve made every row different. If, for example, the probability that B would take the state “T” is 0.2 regardless of whether A was “T” or “F” then we can intuit that A doesn’t influence B and there is no line from A to B. If, on the other hand, B takes the state “T” with p=0.2 if A is “T” and with p=0.8 if A is “F” then we can see that the probability B takes the state “T” depends on A.

Before we start to look at our kid’s party network, I’d like to examine three basic cases Bishop describes in his book, and try an develop intuitions for those, all involving just three nodes. For each of these cases I’m going to run some simulations and print results using the code given in this gist in case you want to redo the experiments for your satisfaction.

The only thing to know going forward is that I’ll be using the phi coefficient to quantify the degree of correlation between pairs of nodes. phi is a little more tricky to interpret than pearson’s R, but in general a phi of 0 indicates no correlation, and larger values indicate more correlation.

The first case is the tail-to-head configuration:Screen Shot 2017-04-13 at 4.56.51 PM.png

Say A=”T” with p=0.5 (i.e. a fair coin) and C=”T” p=0.7 if A=”T” and p=0.3 otherwise, and B=”T” p=0.7 if C=”T” and p=0.3 otherwise. Here “B” is related to “A” through an intermediary node “C”. Our intuition tells us that in general, A influences C, C influences B and so A has some influence on B. A and B are correlated (or dependent).

Now, here is the interesting thing: suppose we “observe C”, which effectively means, we do a bunch of experimental runs to get the states of A, C and B according to the probability tables and then we pick just those experiments where C = “T” or C = “F”. What happens to the correlation of A and B?

If we take just the experiments where C=”T”, while we do have a bias in A (because C comes up “T” a lot more times when A is “T”) and a bias in B (because B comes up “T” a lot more times when C is “T”) the two are actually not related anymore! This is because B is coming up “T” with a fixed 0.7 probability REGARDLESS OF WHAT A was. Yes, we have a lot more A=”T”, but that happens independently of what B does.

That’s kind of cool to think about. Bishop, of course, has algebra to prove this, but I find intuition to be more fun.

As a verification, running this configuration 100000 times I find the phi between A & B = 0.16379837272156458 while with C observed to be “T” it is -0.00037 and “F” it is 0.0024. Observing C decorrelates A and B. MATH WORKS!

As an aside, if I setup the network such that the bias of C does not change with the input from A, I get phi of A & B = -0.00138 and phi of A & B with C observed to be “T” = -0.00622 and “F” =  0.001665 confirming that there is never any correlation in the first place in such a “flat” network.

The second case bishop considers is the tail-to-tail configuration:

Screen Shot 2017-04-13 at 6.33.31 PM

Here we have a good intuition that C influences A and B, causing A and B to be correlated. If C is “T” it causes A to be “T” with pA(T)  and B to be “T” with pB(T) and when C is “F” A is “T” with pB(F) and B is “T” with pB(F). These probabilities switch to track C as it changes, resulting in the linkage.

What happens if we observe C? Say C is “T”. Now this fixes the probability of “T” for both A and B. They maybe different, but they are fixed, and the values A and B take are now independent!

The third case is very cool, the head-to-head configuration:

Screen Shot 2017-04-13 at 7.22.51 PM.png

We easily see that A and B are independent. What happens if we observe C? Let’s consider a concrete example with a probability table

A B  p(T) for C
F F  0.1
F T  0.7
T F  0.3
T T  0.9

Say we run 100 trials and A and B come up “T” or “F” equally likely (p=0.5). We expect each AB pattern will occur equally often (25 times) and then the expected number of C=T states is

A B  E(C=T)
F F  0.1 * 25 =  2.5
F T  0.7 * 25 = 17.5
T F  0.3 * 25 =  7.5
T T  0.9 * 25 = 22.5

So we have an expected 50 trials where C=T. Of that subset of trials, the fraction of trials that each pattern comes up is:

A B  p(AB|C=T)
F F   2.5 / 50 = 0.05
F T  17.5 / 50 = 0.35
T F   7.5 / 50 = 0.15
T T  22.5 / 50 = 0.45

Ah! You say. Look, when A and B are independent, each of those patterns come up equally often, but here, after we observe C, there is clearly an imbalance! How sneaky! Basically, because some patterns of AB are more likely to result in C=T this sneaks into the statistics once we pick a particular C (Those hoity, toity statisticians call this “conditioning on C”). A and B are now dependent!!

I have to say, I got pretty excited by this. Perhaps I’m odd.

But wait! There’s more! Say C has a descendent node. Now observing a descendent node actually “biases” the ancestor node – so in a way you are partially observing the ancestor node and this can also cause A and B to become dependent!

Now, I was going to then show you some experiments I did with a simulation of the Kid’s party network that I started with, and show you all these three conditions, and a funky one where there are two paths linking a pair of nodes (P and G) and how observing the middle node reduces their correlation, but not all the way, cause of the secondary path, but I think I’ll leave you with the simulation code so you can play with it yourself.

Code for this little experiment is available as a github gist.

CIGAR strings

When I first joined Seven Bridges my boss would be describing things and he would say “Do this with the CIGAR” or “compute that from the CIGAR”. Reluctant to be found out so early I nodded sagely and made a furtive note in my notebook to check out what the heck a CIGAR was.

You are most likely to run into this word – and it is in all caps, for reasons that may become clear later – if you operate closer to the nasty end of any work that requires genomic sequencing. One of the more compute intensive things we have to do to raw genomic data before we can get with the real sciencey stuff is to figure out how a sequence obtained in an experiment relates to some reference data.

A lot of life is tied together and it shows in the DNA. There are similarities not only between the DNA of humans, between the DNA of humans and chips, but also between mice and men, but the closest match is by far between human beings of all “races” and ethnicities, despite what some ignorant folk will try to tell you.

At the tiniest level, at very early stages of genomic sequence data analysis, we take a string of data, say


and compare it against some reference data, say


We notice that we can align the data against the reference by changing one letter (marked by *), assuming some letters have been deleted (– in the data) and some letters have been inserted (- in the ref)

       |||*|||  |||| ||||

In general the vast majority of these alignments of data against the reference can be described as a series of exact matches interspersed by single base mismatches, insertions and deletions.

Now, what does CIGAR stand for? It stands for Compact Idiosyncratic Gapped Alignment Record. Yes, it does sound like a backronym doesn’t it? But the “gapped alignment part” has probably given you a good clue. The major operation above was in inserting gaps, either in the data or the reference, in order to get the two strings to align against each other.

A CIGAR string is simply the set of operations we did encoded succinctly. So, in the case above, the CIGAR is



In the CIGAR we usually ignore single base mismatches and focus on the insertions and deletions (though the “extended CIGAR” is much more detailed and includes information about mismatches) and so we convert the gapless bits to “M”, the number indicating how many letters (bases) form a stretch of gapless alignment. This is interspersed with “D”s and “I”s to indicate deleted subsequences and inserted subsequences.

And now you know.

Development environment

While it can be an avenue for procrastination, searching for a proper development environment often pays off with greatly improved productivity in the long term. I had some finicky requirements. I wanted to maintain my own CMakeLists file so that the project was IDE/editor/platform independent. I wanted to be able to just add files to the project from anywhere – from VI, by OS copy, by git pull – (I set wildcards in the CMakeLists.txt file) – and definitely not have to fiddle in the IDE to add files. I wanted to edit markdown files which I was using for documentation, and placing in a separate docs folder. I wanted to see a decent preview of the markdown occassionally. Ideally, my environment would simply track my directory organization, including non C++ files. However, I also wanted intelligent code completion, including for my own code.

At work, where a ton of my current stuff is in Python, I happily use Pycharm which has been a very worthwhile investment. So CLion was a natural candidate to start with. I didn’t get very far, because I balked at the license terms. It felt a little annoying to have to apply for an open source license and then renew every year and so on. When I want to play with red-tape I talk to the government.

I completely understand needing to charge people for your software. It’s a necessity for maintenance and continued development. But you can not get money from people who have none. To me the rational model is to charge professional entities for the software, to make it free to use for non-professional use and just not bother with the small fry. Basically, have a smallish legal department that goes after a big player if they violate. For companies and individual professionals a software license is the least of their costs (and probably the item with the highest ROI), and they will gladly pay to have some one to go to for support. Chasing down individual hobbyists just doesn’t make sense. I’m sorry, where did that soapbox come from? I must have stepped on it by accident. Moving on to the next editor/IDE …

I tried XCode next. It was a medium-ish install, and I think I needed an iTunes account. I was encouraged to use it because CMake can generate XCode projects and many coders working on Macs swore by it. I have used it infrequently – mostly when I was editing a largish Objective C project during my postdoc. I found it fine to use and debug with. It was fine, but I disliked how CMake organized the XCode project virtual directories, and I disliked how I had to manually add non-source and new files. And memories of the over complicated build system started to come back to me.

I then found something called CodeLite. It was confusing because I spent some time trying to figure out if it was related to Code::Blocks. CodeLite looked kind of promising. There was a nice tutorial on how to integrate with CMake based projects. But it looked cumbersome, and I had to add files to the project through the interface. By now, I was willing to endure this but I couldn’t get over the fact that there was no preview for Markdown. I liked that they limited the number of plugins, but I really wanted markdown preview. I told you I was finicky.

Then, quite by accident I started to poke around Visual Studio Code and downloaded it to try it out. In brief, it seems to satisfy all my finicky requirements. This choice rather surprised me, because it was just not on my radar. I also realize this does nothing to advance my attempt to get into the cool kids programming club where people use vim or emacs, but that was a lost cause anyway. (I now recall that I initially avoided VSC because a colleague spoke out very strongly against it a few months ago. Goes to show, you need to try things out for yourself)