Galaxy cluster classification with convolutional neural networks

Blog

During my time at ESA, I mentored a Master’s student, Matej Kosiba from Masaryk University, and we recently published a paper together on using convolutional neural networks to detect clusters of galaxies (https://arxiv.org/abs/2006.05998).

You see, the detection of galaxy clusters is extremely out-dated. Typically a wavelet filtering algorithm is applied on X-ray images to pull out candidate clusters, and then a team of “experts” will examine each candidate, often combined with access to the corresponding optical data on the sky, to verify if the object is actually a cluster. More often than not, the candidates will be contaminated with point sources (stars, AGN), nearby galaxies and artefacts within the data. So when you have perhaps 1000 candidate objects, with 2 experts looking at each candidate it still is doable. But when X-ray observatory e-ROSITA comes online ~100,000 the traditional methods of detection will be inefficient.

The research therefore turned to machine learning, which is more adapted to dealing with huge numbers of tasks quickly and in an objective manner. In particular, convolutional neural networks which were developed specifically to apply to image data. Convolutional neural networks (which I’ll leave the details for another post) learn the hierarchical structure of features within images of the same class, and are relatively insensitive to small offsets or changes in the images.

To train the network we require image data and the corresponding class labels. The image data are X-ray images stacked onto the Optical images. For the class labels, we take 2 approaches. The first approach used the power of citizen science to crowd classify large samples of the data by large numbers of volunteers. To do this the authors created a Zooniverse project attracting thousands of volunteers. The second approach took labels classified by 2 experts. We compared them using a spectroscopically confirmed cross calibration sample.

This is the money plot, it’s called a ROC curve (receiver operating characteristic curve) and shows the true positive rate vs false positive rate for different thresholds of classification. The idea is that is the lines lie on the diagonal, then the networks are performing as well as chance. Ideally you want the lines to draw up to the top left hand corner. In this plot we see the various networks tried: CN-E (custom network trained on expert labels), MN-E (Mobilenet trained on expert labels), CN-Z (custom network trained on Zooniverse labels).

The main result of the paper is that CNNs could achieve over 90% accuracy on classifying clusters of galaxies, on multi-wavelength data (X-ray plus optical) and at a tiny fraction of the time it takes for an expert to do the same. Additionally we found that the X-ray data were better for classification of clusters, however this may be due to the limited depth of the optical data we had at hand.

Will there be a last man on the moon?

Blog

I was born in 1990. Mankind first stepped on the moon on July 20th 1969. We did it again in November of the same year, again in February and July of 1971, and in April and December of the following year.

In total there have been 12 men to have set foot on another celestial body. I have never known a day where I have not been sharing the Earth with a lunar astronaut. But today 8 of those 12 heroes have passed away. The 4 remaining, Buzz Aldrin, David Scott, Charles Duke and Harrison Schmitt are aged 89, 87, 83 and 84 respectively.

We haven’t been back to the moon since 1972. NASA have a target to get another human there with the Artemis mission by 2024, but is this already too late?

So my question to you is Is it likely that we will see another human on the moon before all our lunar astronauts go extinct? or will it soon be the day of the last man on the moon?

We can use statistics to help us infer the odds.

Firstly let’s take mortality predictions from the US social security actuarial life table of 2016.

data = read.table('life_exp.txt',header=TRUE)

You can download the data here.

Next we calculate the probability that an astronaut will die within n years, given their age. This is given by,

P(death in N years | current age) =
P(death in year N | age in year N) $\times$ P(alive in year N-1 | current age) + P(death in year N-1 | current age)

pdeadn = function(startyr = 2019, endyr = 2020, birthyr){
  age = startyr - birthyr
  nyrs = endyr-startyr+1 #add one because you count the first year as 2016
  cat('number of years:', nyrs, '\n')
  
  pdead=c()
  pdead[1] = data$pdeath_male[which(data$age == age)]
  for(i in 2:nyrs){
    age = age+1
    pdead[i] = (1.0 - pdead[i - 1])*data$pdeath_male[which(data$age == age)] + pdead[i-1]
  }
  
  cat('final age:', age,'\n')
  return(pdead)
}

We apply this to our 4 alive astronauts, who were born in 1930, 1932, 1936 and 1935, to get the probability of their deaths up until 2040.

startyr = 2019 #start at current year since we know they are alive
endyr   = 2040
birthyr = c(1930, 1932, 1936, 1935)
pdead   = sapply(birthyr, function(x) pdeadn(startyr=startyr, endyr=endyr, birthyr=x)) #probability of death

Assuming that the deaths of any of the astronauts are independent events, the probability that all the lunar astronauts are dead is given by the multiplication rule,

P(all astronauts dead) = P(Aldrin dead) $\times$ P(Scott dead) $\times$ P(Duke dead) $\times$ P(Schmitt dead)

Applying this we can obtain the following plot of the probability that all astronauts are dead as a function of time,

yrs      = seq(2019, 2040)
pdeadall = apply(pdead, 1, prod)
plot(yrs, pdeadall, ty = 'l', tck = 0.02, xlab = 'year', ylab = 'P(all lunar astronauts dead)', xlim = c(2019, 2040))
axis(side = 3, tck = 0.02, labels = FALSE)
axis(side = 4, tck = 0.02, labels = FALSE)
abline(v = 2024, lty = 'dashed', col = 'orange', lwd = 3)

The orange line shows the goal launch date of NASA’s Artemis program to get the first woman and next man on the moon. From this we can see that there is a 12% probability that all the Apollo Astronauts will not see the next landing on the moon. However, experience shows that space launches are never delivered on time. The following plot shows the predicted launch date of NASA’s Hubble Space Telescope as a function of time.

Hubble launched in 1990, 2.3 times the original scheduled launch time. If the Artemis program faces similar delays, we won’t see a moon landing until 2031, by which case the probability that we will go back on progress, and that I will see a day where no man has stepped on the moon will have increased to 78%, and by 2034 (the set goal of the Chinese to land on the moon) almost certainly (93%) this will be the case 😦

Have you got what it takes to be the next Cosmonaut?

Blog, Glossary

Russian space agency ROSCOSMOS this week announced the opening of the next selection programme of cosmonauts (Russian astronaut) see: https://twitter.com/roscosmos/status/1135521818301870083

Don’t read Russian? No problem, I have you sorted:

You’re welcome! Applications are accepted from Russian citizens between now and 1 June 2020, and at the end of it there will be 4-6 newly selected Cosmonauts! Could that be you?

Interestingly they require a minimum footsize of 29.5cm… I wonder what that could be for?

Whilst here you know see the entry requirements for a Russian Astronaut, in my upcoming youtube video, I’ll be talking about what it takes it be an astronaut in general, so if you don’t follow my youtube channel yet, you really should!

https://www.youtube.com/channel/UClO6TuRE6XLzbMBmQ_KY38A

Tips on public speaking

Blog

Throughout my life I must’ve given hundreds of talks now, and almost always I’m asked before I talk whether or not I’m nervous. Who doesn’t feel anxiety when speaking in front of a crowd? Here’s a collection of tips I compiled to help prepare for public speaking:

Make a list of the key points you want to make and construct it in chronological order. Each slide should portray 1 single message.
Practice, practice, practice – all-day-all-night, in front of people if possible and ask for feedback, or even for an audience of your teddy bears
Prepare 1 slide per minute. When you’ve mastered that, try the PechaKucha method – 20 seconds per slide. Humans have very limited attention spans so the more movement you have going on the better.
Engage the audience – ask rhetorical questions, or even ask interactive questions and set the scene e.g. ‘Imagine you are in deep space…’
Be visual. Use lots of images and try to limit the number of equations. Is there a better way to explain what you’re trying to get across without all the math?
Scan across the room regularly, try to make eye contact, use hand gestures and make sure to smile. A study on TED talks (see here), found that there was a correlation between the popularity of a video and the number of hand gestures the speaker made.

That’s all I can think of for now. Let me know if I’ve missed any!

Bye for now xoxo

New youtube channel

Blog

Trying to blog even semi-regularly turns out to be difficult! I didn’t even notice that its been 9 months since my last post! But last month I have been busy setting up my new youtube channel ‘space mog’ as well as other accompanying social media sites (instagram, facebook). Hopefully this will help steer my career back onto the public communication route which has been very lacking since my move to Spain. It turns out that if you don’t speak the language, it’s pretty difficult to teach – who knew?.

Anyway my latest video can be found at the end of this post. The idea is to bring educational science material in short segments to the world wide web. If there’s any topic you’re burning for me to talk about please let me know! Bye for now xoxo

Surrogate models

Blog

At StanCon 2018 Helsinki, a couple of us gathered together for this thing Breck organised – birds of a feather, to discuss topics of interest with each other. I think there were about 8 of us interested in surrogate models and these are some of the notes from that meeting.

So what is a surrogate model?

Let’s say you have a function

$x = F(\alpha, \beta, ...)$

you could approximate the function $F$ with a surrogate model

$x \approx \hat{F}(\alpha, \beta, ...)$

Why would you want to do this?

Some ideas we came up with are:

When you have a complex function, which is difficult to code up but you can generate many simulations from it.
You have a very expensive, slow function, and you need in service modelling so need results fast.
You have no idea what the function is, you might have some data for it, but this is likely to be sparse

How do you go about doing this?

The most common methods involve some sort of interpolation:

splines, polynomial interpolations
neural networks
gaussian processes

In cases where you have some sparse data and an unknown/complex function we decided that using either a probability of detection (POD) model, principle component analysis (PCA) or singular value decomposition (SVD) would be good idea.

Some food for thought

If in our model we have something like:

$y \sim N(F(a,b,...),\sigma)$

and we substitute in a surrogate model

$y \sim N(\hat{F}(a,b,...),\sigma)$

It seems like $\sigma$ is just another additional parameter and so maybe we could consider ingesting it directly into our surrogate model.

Where do surrogate models fit into statistical models?
How can we make surrogate models happen?
How can this generalise to different problems?
How can we diagnose it?

Selection function in STAN

Blog

I’m currently working on a problem where I need to sample from a very specific custom pdf. It has taken me a lot of time to get this pdf written down but now that I finally have it, I’m just missing one key ingredient… the selection function! This is when you have some underlying data (lets say some images of stars), but the observation of that data has been truncated (for example limitations of the telescope mean we can only see stars brighter than 25 mag) and then scattered by some noise. The selection function is the truncation, it is how your sample is selected from the underlying population. Generally the selection is unknown so the best way to tackle it is to fit for it.

It took me a really long time (2 days!) to sit down and figure out how to do a selection function in STAN. The problem was that I couldn’t really find any working examples on the ‘interwebs‘ but turns out that its actually really easy and really fast.

Here I demonstrate with some easy examples.

Case 1: Simple Gaussian.

First we need to generate some toy data.

nx = 1000 #number of data points in population
mu = 7 
sig = 1.5
x = rnorm(nx, mu, sig) #underlying population values

cut = 6
xcut = x[which(x > cut)] #sample values after selection
n = length(xcut)

dx = 0.2 #uncertainty on x
xobs = xcut + rnorm(n, 0,dx ) #observed values

#make kernel density plot of the distributions
plot(density(x), ylim=c(0,0.4), main='', col='red', xlab='x', ylab='unnormalised density')
abline(v=cut, col=rgb(0,0,0,0.3))
lines(density(xcut), lty='dotted', col='red')
lines(density(xobs), lty='dashed')
legend('topleft', legend=c('population','sample', 'observed sample'), col=c('red', 'red', 'black'), lty=c('solid','dotted','dashed'))

The STAN model should look something like this:

mymodel <-"
data {
 int n;
 real xobs[n]; //observed x
 real dx;
}

transformed data {
 // real xcut = 6.0; //use for a fixed selection... generally selection is unknown!
}

parameters {
 real xcut; //selection value
 real mu;
 real sigma;
 real xtrue[n]; //true sample x
}

model {
 xcut ~ normal(6,1);
 for(i in 1:n){
  /* we truncate a lower bound at xcut using T[lower,upper]. This already includes normalisation. Also note with truncation we need to do sampling in a for loop! */
  xtrue[i] ~ normal(mu, sigma) T[xcut,];
 }
 xobs ~ normal(xtrue, dx);
}

"

We then run the code, with sensible initialisation. Pro-tip: make sure to initiate xtrue above the selection cut!

 nchains = 3 #number of chains

#initialisation
 init1 <- lapply(1:nchains, function(x) list(mu=rnorm(1, 7,1), sigma=rnorm(1,1.5,0.2), xtrue=rnorm(n, 8,0.1), xcut=6.0))

fit = stan(model_code = mymodel, data=list(n=n, xobs=xobs, dx=dx),
 init=init1, chains=nchains, iter=2000,
 control=list(adapt_delta=0.8))

You should get something that looks like this:

print(fit, pars=c('xcut','mu','sigma') )

Inference for Stan model: 4eebe5b93a44437492c76c544c69646a. 3 chains, each with iter=2000; warmup=1000; thin=1; post-warmup draws per chain=1000, total post-warmup draws=3000.

mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat xcut 6.01 0.00 0.04 5.93 5.98 6.01 6.04 6.08 110 1.03 mu 6.83 0.01 0.22 6.36 6.71 6.86 6.99 7.17 314 1.01 sigma 1.61 0.00 0.11 1.42 1.53 1.60 1.67 1.85 488 1.01

Samples were drawn using NUTS(diag_e) at Wed Jan 24 10:02:33 2018. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).

pairs(fit, pars=c('mu','sigma'))

We can also make some posterior predictive checks, here we show some data generated from a gaussian of the fitted mean and standard deviation (black), as you can see they agree really well with the population.

params=extract(fit)
xfit=colMeans(params$xtrue)
mufit = mean(params$mu)
sigfit = mean(params$sigma)

plot(density(rnorm(n, mufit, sigfit)), col=rgb(0,0,0,0.5), lty='dotted', main='', xlab='x', ylim=c(0,0.4))
lines(density(rnorm(n, mufit, sigfit)), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(rnorm(n, mufit, sigfit)), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(rnorm(n, mufit, sigfit)), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(x), col=rgb(1,0,0,0.5))
lines(density(xobs),col=rgb(1,0,0,0.5), lty='dashed')
legend('topleft', legend=c('true', 'observed', 'predicted'), col=c('red', 'red', 'black'), lty=c('solid','dashed','dotted') )

Case 2: Custom distribution

This is where things get a bit more tricky, but still isn’t a challenge for STAN. Before we jump into the coding, we should discuss the boring part (the math!). When we sample from a truncated pdf what we really are doing is sampling from,

$x \sim P_{[a,b]} (x) = \frac{P(x)}{\int_a^b P(u) du}$ .

A lower selection limit then looks like,

$x \sim P_{[xcut,\infty]} (x) = \frac{P(x)}{\int_{xcut}^\infty P(u) du}$ .

Stan can’t do integrals numerically but it can be done by solving the ODE. Unfortunately the denominator in this is an improper integral and STAN doesn’t deal well with improper integrals so we can re-parameterise the integral in the following way:

$\int_{xcut}^\infty P(x) dx = \int_0^1 P\left(xcut + \frac{x}{(1-x)} \right)/(1-x)^2 dx$ .

Now the next problem is that this integral is numerically unstable. At x=1, the integrand is infinite. To solve this we can just do a set the upper limit to a number that is a little bit smaller than 1, e.g. 0.9999.

Okay so now with the theory out of the way, we can move onto the code. We first generate again some toy data from a custom distribution:

n = 2000 #population size
x = c(rnorm(n/2, 4, 1.5), rnorm(n/2, 7,1)) #x is drawn from a mixture model
cut = 3
xsam = x[which(x > cut)]
ns = length(xsam) #sample size after selection

dx = 0.5 #measurement uncertainty
xobs = xsam + rnorm(ns, 0, dx) #observed sample

plot(density(x), ylim=c(0,0.3), main='', xlab='x', ylab='unnormalised density', col='red')
lines(density(xsam), col='red', lty='dotted')
lines(density(xobs), lty='dashed')
abline(v=cut, col=rgb(0,0,0,0.3))
legend('topleft', legend=c('population','sample', 'observed sample'), col=c('red', 'red', 'black'), lty=c('solid','dotted','dashed'))

We define the STAN model as follows:

custmodel<-"
 functions {
 real custom(real y, real mu1, real mu2, real sigma1, real sigma2){
  //custom pdf
  return 0.5*exp(- square(mu1- y) / (2 * sigma1^2) )/(sigma1*sqrt(2*pi())) +
 0.5*exp(- square(mu2 - y) / (2 * sigma2^2) )/(sigma2*sqrt(2*pi())) ;
 }

real[] N_integrand(real y, real[] state, real[] params, real[] x_r, int[] x_i) {
 //ode to solve
 real mu1 = params[1];
 real mu2 = params[2];
 real sigma1 = params[3];
 real sigma2 = params[4];
 real xcut = params[5];
 real dxdy[1];
 real ynew = xcut + y/(1-y);

 dxdy[1] = custom(ynew, mu1, mu2, sigma1, sigma2)/square(1-y);

 return dxdy;
 }
}

data {
 int n;
 real xobs[n];
 real dx;
}

parameters{
 real xcut;
 ordered[2] means; //ordered because mixture models have identifiability issues
 real sigs[2];
 real xtrue[n];
}

model{
 real norm;
 real theta[5]={means[1], means[2], sigs[1], sigs[2], xcut};

 xcut ~ normal(4.0,1.0);

/* We have to re-parameterise the improper integral int_xcut^inf p(x) dx to int_0^1 p(xcut + x/(1-x))/(1-x)^2 dx */
 norm = integrate_ode_rk45(N_integrand, rep_array(0.0,1), 0.0, rep_array(0.9999,1),
 theta, rep_array(0.0,0),rep_array(0,0))[1,1];

 for(i in 1:n){
  target += log(custom(xtrue[i], means[1], means[2], sigs[1],sigs[2]));
  if(xtrue[i] < xcut){
   target+=negative_infinity();
  }else{
   target+=-log(norm); //normalise pdf
  }
 }
 xobs ~ normal(xtrue, dx);

}

"

Note that the custom distribution is a gaussian mixture model. Recall in a previous post (Multivariate Gaussian Mixture Model done properly) I discussed the problems with mixture models and their identifiability issues. For this example I only run 1 very long chain for the demonstration since the Bayesian Fraction of Missing Information, I needed a lot more iterations and warm-up to achieve a good effective sample size. Here’s the results:

nchains = 1
init1 <- lapply(1:nchains, function(x) list(means=c(4.0,7.0), sigs=c(1,1), xtrue=rep(6.0,ns), xcut=3.0))

fit = stan(model_code = custmodel, data=list(n=ns, x=xobs, dx=dx), init=init1, chains=nchains, iter=5000, control=list(adapt_delta=0.8))

print(fit, pars=c('means','sigs','xcut'))

Inference for Stan model: 0ff67e36d860e76473b011bccd66c477. 1 chains, each with iter=5000; warmup=2500; thin=1; post-warmup draws per chain=2500, total post-warmup draws=2500.

mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat means[1] 3.69 0.04 0.42 2.82 3.41 3.71 4.00 4.43 112 1 means[2] 6.88 0.01 0.11 6.65 6.81 6.89 6.96 7.08 156 1 sigs[1] 1.42 0.00 0.13 1.20 1.33 1.41 1.49 1.71 684 1 sigs[2] 1.04 0.00 0.07 0.93 1.00 1.04 1.09 1.18 246 1 xcut 3.06 0.01 0.09 2.86 3.02 3.07 3.13 3.21 61 1

Samples were drawn using NUTS(diag_e) at Thu Jan 25 11:08:37 2018. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).

pairs(fit, pars=c('means','sigs', 'xcut'))

# posterior predictive checks
params=extract(fit)
means = colMeans(params$means)
sigs = colMeans(params$sigs)

plot(density(xobs), main='', xlab='x', ylab='unnormalised density', ylim=c(0,0.4), lty='dashed', col=rgb(1,0,0,0.5))
lines(density(c(rnorm(n/2, means[1], sigs[1]),rnorm(n/2, means[2], sigs[2]))), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(c(rnorm(n/2, means[1], sigs[1]),rnorm(n/2, means[2], sigs[2]))), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(c(rnorm(n/2, means[1], sigs[1]),rnorm(n/2, means[2], sigs[2]))), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(c(rnorm(n/2, means[1], sigs[1]),rnorm(n/2, means[2], sigs[2]))), col=rgb(0,0,0,0.5), lty='dotted')
lines(density(x), col=rgb(1,0,0,0.5))
abline(v=cut, lty='dashed',col='red')

legend('topleft', legend=c('true', 'observed', 'predicted'), col=c('red', 'red', 'black'), lty=c('solid','dashed','dotted') )

Overall good agreement. So yeh. Selection function in STAN? Easy peasy!. Also see https://github.com/farr/SelectionExample and https://github.com/farr/GWDataAnalysisSummerSchool/blob/master/lecture3/lecture3.ipynb for some more examples!

Problems with binning data

Blog

I learned early on in my PhD that binning is not a good idea. The loss of information from binning will lead to biased results. Here I demonstrate this with a simple linear regression toy model.

#generate toy data
x = runif(nc, 13,15)
y = 22+1.5*x+rnorm(nc, 0,0.5)

#bin in y variable
nbin = 10
ybins = seq(min(y), max(y), length.out = nbin)
ybinned = sapply(1:(nbin-1), function(z) mean(y[which(y> ybins[z] & y <= ybins[z+1])]))
ybinnedx = sapply(1:(nbin-1), function(z) mean(x[which(y> ybins[z] & y <= ybins[z+1])]))
 
 
#bin in x variable
xbins = seq(min(x), max(x), length.out = nbin)
xbinned = sapply(1:(nbin-1), function(z) mean(x[which(x> xbins[z] & x <= xbins[z+1])]))
xbinnedy = sapply(1:(nbin-1), function(z) mean(y[which(x> xbins[z] & x <= xbins[z+1])]))

plot(x,y, pch=20, cex=0.1)
abline(a=22, b=1.5)
 
points(ybinnedx, ybinned, pch=20, col='red')
points(xbinned, xbinnedy, pch=20, col='blue')

As seen in the plot, if we bin based on the y variable (red), we will bias our estimation of the slope and intercept whereas binning in the x variable (blue) does not incur this problem. This is because the intrinsic scatter is applied only to the y variable.

The problem intensifies when observational uncertainties are included in the x and y measurements. By modifying the following lines in the code:

#generate toy data
x = runif(nc, 13,15)
y = 22+1.5*x+rnorm(nc, 0,0.5)

to:

#generate toy data
xtrue = runif(nc, 13,15)
ytrue = 22+1.5*xtrue+rnorm(nc, 0,0.5)

x = xtrue + rnorm(nc, 0, 0.5)
y = ytrue + rnorm(nc, 0, 0.5)

we obtain a plot such as the following:

Now a bias is seen binning in either x or y. Whilst this may have been exaggerated to large uncertainties for clarity, it is usually more common in Astronomy to bin when data is noisy! A good project for the keen and eager then should be to investigate a way to bin such that the measurement uncertainties and scatter are taken into account so to avoid such biases, but unfortunately I already have too many projects to work on…

#DarkMatterDay

Blog

31.10.2017

The first ever dark matter day… Why the particle physicists decided to it should coincide with halloween I have no clue. Clearly the trending tags will be dominated by the later occasion right?

Anyway I made a video about dark matter. I hope you enjoy it!

My city Coventry

Blog

Okay so a few weeks ago at work a colleague asked me where I was from in the UK, and I replied
“Oh, it’s a small little town, you’ve probably never heard of it, Coventry. It’s close to Birmingham”.
This is a natural response for me given that I lived in the US for a year and the only city anyone had heard of is London.

Actually the worse conversation I ever had went something along the lines of:

Her: “You’re from England, wow you’re english is so good!”
Me *confused*: “Erm, yeh because English people tend to speak english… ”
Her: “Oh really, I thought you spoke french there”
Me: *facepalm*

Anyway. So it turned out that my colleague had indeed heard of coventry since there are at least 2 other ESA employees working here in Spain who are from Coventry.

Statistically wise this is a super amazing opportunity for me to calculate the probability of that actually happening!

So lets say there are ~400 employees at ESA, Spain.

Since employment is swayed by the contribution of ESA member states and the UK’s slice is ~8%, we expect there to be:

$0.08 \times 400 = 32$ employees at ESA from the UK.

The probability of selecting someone from coventry out of the whole of the UK is given by the ratio of their populations:

$\frac{\rm Coventry's\ population}{\rm UK\ population} = \frac{3.2\times 10^5}{6.6\times 10^7} = 0.005$

So we would expect

$0.005 \times 32 = 0.16$ employees from Coventry working at ESA spain!

Since we have at least 3 of them, the likelihood of us all being here is much smaller! Note also that we all come from all over Coventry and went to completely different schools, so its unlikely that there could be some sort of bias. None the less, good job Coventry on producing some great scientists!

Dr Maggie Lieu

Astrophysicist

Blog

Galaxy cluster classification with convolutional neural networks

Will there be a last man on the moon?

Have you got what it takes to be the next Cosmonaut?

Tips on public speaking

New youtube channel

Surrogate models

So what is a surrogate model?

Why would you want to do this?

How do you go about doing this?

Some food for thought

Selection function in STAN

Case 1: Simple Gaussian.

Case 2: Custom distribution

Problems with binning data

#DarkMatterDay

My city Coventry