Distance Ladder I - Resolved Galaxies¶
To create a distance ladder for our universe, we need to calculate the distances to nearby galaxies that have resolved stars. We can do this by utilising two equally valid and independent methods: the period-luminosity relationships we found earlier, and 'Main Sequence' fitting of HR diagrams.
Period-Luminosity Distance Estimation¶
As in the previous workbooks, I'll start by importing the names of resolved galaxies we've identified.
import os 
datapath = 'Sim Data (Clusters; 800, Seed; 2639)' # all of the data is within a folder in this .ipynb file's directory
GalaxyNames = []
for clusterFile in os.listdir(datapath + '/Star Clusters'):
    GalaxyNames.append(clusterFile[:-4]) # this gets the name of the cluster, without the `.txt' file extension
To begin with, I'll look at the period-luminosity relationship as a means of distance estimation. That is, for all of the variable stars in a galaxy, I'll estimate their intrinsic luminosity and compare that against their observed flux (via the inverse square law) to calculate distance.
The below code block is quite large, so I'll explain the rationale before we jump into the code. For a given galaxy, we need to identify which stars are variable, and then calculate their variability periods via a Lomb-Scargle periodogram. We can then put those periods into the period-luminosity relationships we found in an earlier workbook to yield the approx. intrinsic luminosity of each star. We then compare that with that stars observed flux and obtain the distance to that star as a result. Finally, we can use a statistical trick to ignore outlying data and take the average of the remaining (valid) data points to obtain an estimate for a galaxys distance.
import pandas as pd
import numpy as np
from astropy.timeseries import LombScargle
freqs = np.linspace(1/120, 0.45, 1000)
distance_data = pd.DataFrame({'Name': GalaxyNames})
PL_dists = np.zeros(len(GalaxyNames))
PL_unc = np.zeros(len(GalaxyNames))
for num, name in enumerate(GalaxyNames): # iterate over the identified galaxies
    galaxdata = pd.read_csv(datapath + f'/Star Clusters/{name}.txt', delimiter=' ')
    # now to isolate the variable stars
    variables = galaxdata['Variable?']
    variableindexes = [i for i, x in enumerate(variables) if x == 1]
    variablenames = galaxdata['Name'][variableindexes].to_numpy()
    periods = np.zeros(len(variablenames)) # initialise
    # now calculate the period of each variable star via LS periodogram
    for i, star in enumerate(variablenames):
        photometryData = pd.read_csv(datapath + f"/Variable Star Data/{star}.txt", delimiter=' ')
        
        time, flux = photometryData['Time'], photometryData['NormalisedFlux'] # just extract the columns as variables
        LS = LombScargle(time, flux) # initialize a Lomb-Scargle fitting
        power = LS.power(freqs) # calculate LS power 
        bestfreq = freqs[np.argmax(power)] # which frequency has the highest Lomb-Scargle power?
        pred = LS.model(time, bestfreq) # make a sine wave prediction at the best frequency
        periods[i] = 1 / bestfreq # add each period to the list
        
    intr_lumin = np.zeros(len(periods)) # initialise
    lumin_err = np.zeros(len(periods))
    # time to work out intrinsic luminosities via the PL relations
    for i, period in enumerate(periods): 
        if 20 <= period <= 35:
            intr_lumin[i] = 10**(-0.086 * period + 28.25)
            AB = 0.087 * period * np.sqrt((0.002 / 0.087)**2 + (0.1 / period)**2)
            AB_C = np.sqrt(AB**2 + 0.04**2)
            lumin_err[i] = intr_lumin[i] * (np.log(10) * AB_C)
        if 35 <= period <= 55:
            intr_lumin[i] = 10**(-0.087 * period + 33.72)
            AB = 0.087 * period * np.sqrt((0.002 / 0.087)**2 + (0.1 / period)**2)
            AB_C = np.sqrt(AB**2 + 0.08**2)
            lumin_err[i] = intr_lumin[i] * (np.log(10) * AB_C)
        elif 80 <= period <= 110:
            intr_lumin[i] = 10**(-0.087 * period + 34.35)
            AB = 0.087 * period * np.sqrt((0.001 / 0.087)**2 + (0.1 / period)**2)
            AB_C = np.sqrt(AB**2 + 0.06**2)
            lumin_err[i] = intr_lumin[i] * (np.log(10) * AB_C)
            
    # now to finally compare intr_lumin against the fluxes to obtain distances
    greenFluxes = galaxdata['GreenF'][variableindexes].to_numpy()
    distances = []
    distance_unc = []
    
    if len(intr_lumin) > 10: # we only want to look at galaxies with reasonably high sample size of variables
        for i, lumin in enumerate(intr_lumin):
            if lumin > 0: # this is to avoid noisy data
                distance_m = np.sqrt(lumin / (4 * np.pi * greenFluxes[i]))
                distance_pc = distance_m / (3.086 * 10**16)
                distances.append(distance_pc)
                
                lumin_unc = distance_m**2 * np.sqrt((lumin_err[i] / lumin)**2 + (0.01)**2)
                distance_pc_unc = abs(distance_m * 0.5 * lumin_unc / distance_m**2) / (3.086*10**16)
                distance_unc.append(distance_pc_unc)
        # now we need to remove any (inevitable) outliers from the data
        q1, q3 = np.percentile(distances, [25, 75], interpolation='midpoint')
        IQR = q3 - q1
        # the below checks if the data point is within 2.7 standard deviations of the mean
        upper = q3 + 1.5*IQR
        lower = q1 - 1.5*IQR
        # now we only care about data within 2.7 STDs 
        distances = [dist for dist in distances if lower <= dist <= upper]
        distance_unc = [distance_unc[i] for i, x in enumerate(distances) if lower <= x <= upper]
    PL_dists[num] = np.mean(distances)
    PL_unc[num] = np.mean(distance_unc)
Finally, we can add these distance and uncertainty data as columns in a pandas DataFrame so that we can use it and/or save it later.
distance_data['PL_distance'] = PL_dists
distance_data['PL_unc'] = PL_unc
'Main Sequence Fitting' Distance Estimation¶
Another, completely independent method of distance estimation is by adjusting the apparent colour-magnitude curve of a galaxy so that it matches a known curve. That is, if we plot the colour-magnitude diagram of a galaxy's flux, we can vertically offset it to match the luminosity of a known colour-magnitude diagram. Since flux and luminosity are related by distance (by the inverse square law), we can then use this offset to directly estimate the distance that the galaxy is to us.
From a previous notebook, we found a colour-magnitude (or Hertzprung-Russell) diagram of our local galaxy in terms of stars' luminosity in the green 500nm band. I've calculated and plotted it again below so that we can use this as a benchmark for other galaxies.
import matplotlib.pyplot as plt
allstardata = pd.read_csv(datapath + '/All Star Data.txt', delimiter=' ')    # read the data from the .txt file into a dataframe
parallax = allstardata['Parallax']    # get the parallax of the stars
localindex = [i for i, x in enumerate(parallax) if x > 0.007]
localVflux = np.array(allstardata["GreenF"])[localindex]
localBflux = np.array(allstardata["BlueF"])[localindex]
BV = np.log10(localVflux / localBflux) # B-V colour index 
localVlumin = localVflux * 4 * np.pi * (1 / np.array(allstardata["Parallax"][localindex]) * 3.086 * 10**16)**2
fig = plt.figure(figsize=(10, 8))   # make a figure to plot stuff on
ax = fig.add_subplot(1, 1, 1)    # and add an axis so we can plot stuff on the figure
ax.scatter(BV, localVlumin, s=1.5, c='k')
ax.set_yscale('log'); ax.set_xlabel('Colour (B-V)'); ax.set_ylabel('V-band Luminosity (W)');
With that out of the way, we can start fitting some galaxies against this benchmark! I'll plot essentially what's happening.
MS_dist = np.zeros(len(GalaxyNames))
MS_unc = np.zeros(len(GalaxyNames))
for num, name in enumerate(GalaxyNames):
    galaxdata = pd.read_csv(datapath + f'/Star Clusters/{name}.txt', delimiter=' ')
    galaxVflux = np.array(galaxdata["GreenF"])
    galaxBflux = np.array(galaxdata["BlueF"])
    galaxBV = np.log10(galaxVflux / galaxBflux) # B-V colour index 
    
    offset = max(np.log10(localVlumin)) - max(np.log10(galaxVflux))
    
    dist_m = np.sqrt(10**offset / (4 * np.pi))
    dist_pc = dist_m / (3.086 * 10**16)
    
    dist_pc_unc = (dist_m**2 * np.log(10) * np.sqrt(2) * 0.01 / (4 * np.pi)) * 0.5 * dist_m / (dist_m**2 * (3.086*10**16))
    
    MS_dist[num] = dist_pc
    MS_unc[num] = dist_pc_unc
    
    if num == 20:
        fig, [ax, ax2] = plt.subplots(1, 2, figsize=(15, 6))
        fig.subplots_adjust(wspace=0)
        
        ax.scatter(galaxBV, galaxVflux, s=1.5, c='r')
        ax.scatter(BV, localVlumin, s=1.5, c='k')
        ax.set_yscale('log'); ax.set_xlabel('Colour (B-V)'); ax.set_ylabel('V-band Luminosity (W)');
        
        ax2.scatter(BV, localVlumin, s=1.5, c='k', label="Local Galaxy")
        ax2.scatter(galaxBV, galaxVflux * 10**offset, s=1.5, c='r', label=f"{GalaxyNames[num]}")
        ax2.yaxis.tick_right(); ax2.yaxis.set_label_position("right")
        ax2.set_yscale('log'); ax2.set_xlabel('Colour (B-V)')
        ax2.legend(loc='lower left')
In the above plots, we can see that we shift the galaxys values upwards (shown in red) to match the benchmark (shown in black). From this, we can also see that some data is no longer available to us - presumably stars that are too dim to resolve at this large distance.
Finally, we can add these distances and their uncertainties to the existing pandas DataFrame and save it to the directory. I'll print it, too, so that we can compare some of the values between Period-Luminosity fitting and Main-Sequence fitting.
distance_data['MS_distance'] = MS_dist
distance_data['MS_unc'] = MS_unc
distance_data.to_csv(datapath + '/Galaxy Distances.txt', index=None, sep=' ')
distance_data
| Name | PL_distance | PL_unc | MS_distance | MS_unc | |
|---|---|---|---|---|---|
| 0 | X000.7-Y065.4-N610 | 23166.527455 | 19449.126986 | 27581.906457 | 35.736754 | 
| 1 | X001.3-Y063.9-N1640 | 14820.848927 | 3695.503081 | 6415.994075 | 8.312943 | 
| 2 | X002.2-Y076.6-N613 | 6800.865201 | 650.921293 | 7548.523170 | 9.780314 | 
| 3 | X002.4-Y065.4-N781 | 23709.239488 | 5975.887750 | 24263.992024 | 31.437868 | 
| 4 | X002.6-Y057.3-N842 | 7480.182488 | 790.226449 | 7680.245139 | 9.950981 | 
| ... | ... | ... | ... | ... | ... | 
| 388 | X354.8-Y077.7-N708 | 19100.066343 | 2976.781075 | 23443.469776 | 30.374750 | 
| 389 | X355.3-Y074.1-N915 | 7200.860679 | 426632.290482 | 7583.454052 | 9.825573 | 
| 390 | X355.9-Y077.4-N898 | 5584.375784 | 567.988176 | 5820.374044 | 7.541222 | 
| 391 | X356.2-Y076.6-N842 | 18124.891216 | 2092.807586 | 19255.655329 | 24.948769 | 
| 392 | X356.7-Y081.1-N656 | 17509.515881 | 1830.320686 | 17060.299002 | 22.104336 | 
393 rows × 5 columns