SEP

Python library for Source Extraction and Photometry

About

Source Extractor (Bertin & Arnouts 1996) is a widely used command-line program for segmentation and analysis of astronomical images. It reads in FITS format files, performs a configurable series of tasks, including background estimation, source detection, deblending and a wide array of source measurements, and finally outputs a FITS format catalog file.

While Source Extractor is highly useful, the fact that it can only be used as an executable can limit its applicability or lead to awkward workflows. There is often a desire to have programmatic access to perform one or more of the above tasks on in-memory images as part of a larger custom analysis.

SEP makes the core algorithms of Source Extractor available as a library of stand-alone functions and classes. These operate directly on in-memory arrays (no FITS files or configuration files). The code is derived from the Source Extractor code base (written in C) and aims to produce results compatible with Source Extractor whenever possible. SEP consists of a C library with no dependencies outside the standard library, and a Python module that wraps the C library in a Pythonic API. The Python wrapper operates on NumPy arrays with NumPy as its only dependency. See below for language-specfic build and usage instructions.

Some features:

  • spatially variable background and noise estimation

  • source extraction, with on-the-fly convolution and source deblending

  • circular and elliptical aperture photometry

  • fast: implemented in C with Python bindings via Cython

Additional features not in Source Extractor:

  • Optimized matched filter for variable noise in source extraction.

  • Circular annulus and elliptical annulus aperture photometry functions.

  • Local background subtraction in shape consistent with aperture in aperture photometry functions.

  • Exact pixel overlap mode in all aperture photometry functions.

  • Masking of elliptical regions on images.

Installation

with conda

SEP can be installed with conda from the conda-forge channel:

conda install -c conda-forge sep

with pip

SEP can also be installed with pip. After ensuring that numpy is installed, run

pip install sep

If you get an error about permissions, you are probably using your system Python. In this case, I recommend using pip’s “user install” option to install sep into your user directory

pip install --user sep

Do not install sep or other third-party Python packages using sudo unless you are fully aware of the risks.

Usage Guide

Tutorial

This tutorial shows the basic steps of using SEP to detect objects in an image and perform some basic aperture photometry.

Here, we use the fitsio package, just to read the test image, but you can also use astropy.io.fits for this purpose (or any other FITS reader).

[1]:
import numpy as np
import sep
[2]:
# additional setup for reading the test image and displaying plots
import fitsio
import matplotlib.pyplot as plt
from matplotlib import rcParams

%matplotlib inline

rcParams['figure.figsize'] = [10., 8.]

First, we’ll read an example image from a FITS file and display it, just to show what we’re dealing with. The example image is just 256 x 256 pixels.

[3]:
# read image into standard 2-d numpy array
data = fitsio.read("../data/image.fits")
[4]:
# show the image
m, s = np.mean(data), np.std(data)
plt.imshow(data, interpolation='nearest', cmap='gray', vmin=m-s, vmax=m+s, origin='lower')
plt.colorbar();
_images/tutorial_5_0.png

Background subtraction

Most optical/IR data must be background subtracted before sources can be detected. In SEP, background estimation and source detection are two separate steps.

[5]:
# measure a spatially varying background on the image
bkg = sep.Background(data)

There are various options for controlling the box size used in estimating the background. It is also possible to mask pixels. For example:

bkg = sep.Background(data, mask=mask, bw=64, bh=64, fw=3, fh=3)

See the reference section for descriptions of these parameters.

This returns an Background object that holds information on the spatially varying background and spatially varying background noise level. We can now do various things with this Background object:

[6]:
# get a "global" mean and noise of the image background:
print(bkg.globalback)
print(bkg.globalrms)
6852.04931640625
65.46165466308594
[7]:
# evaluate background as 2-d array, same size as original image
bkg_image = bkg.back()
# bkg_image = np.array(bkg) # equivalent to above
[8]:
# show the background
plt.imshow(bkg_image, interpolation='nearest', cmap='gray', origin='lower')
plt.colorbar();
_images/tutorial_11_0.png
[9]:
# evaluate the background noise as 2-d array, same size as original image
bkg_rms = bkg.rms()
[10]:
# show the background noise
plt.imshow(bkg_rms, interpolation='nearest', cmap='gray', origin='lower')
plt.colorbar();
_images/tutorial_13_0.png
[11]:
# subtract the background
data_sub = data - bkg

One can also subtract the background from the data array in-place by doing bkg.subfrom(data).

Warning:

If the data array is not background-subtracted or the threshold is too low, you will tend to get one giant object when you run object detection using sep.extract. Or, more likely, an exception will be raised due to exceeding the internal memory constraints of the sep.extract function.

Object detection

Now that we’ve subtracted the background, we can run object detection on the background-subtracted data. You can see the background noise level is pretty flat. So here we’re setting the detection threshold to be a constant value of \(1.5 \sigma\) where \(\sigma\) is the global background RMS.

[12]:
objects = sep.extract(data_sub, 1.5, err=bkg.globalrms)

sep.extract has many options for controlling detection threshold, pixel masking, filtering, and object deblending. See the reference documentation for details.

objects is a NumPy structured array with many fields.

[13]:
# how many objects were detected
len(objects)
[13]:
69

objects['x'] and objects['y'] will give the centroid coordinates of the objects. Just to check where the detected objects are, we’ll over-plot the object coordinates with some basic shape parameters on the image:

[14]:
from matplotlib.patches import Ellipse

# plot background-subtracted image
fig, ax = plt.subplots()
m, s = np.mean(data_sub), np.std(data_sub)
im = ax.imshow(data_sub, interpolation='nearest', cmap='gray',
               vmin=m-s, vmax=m+s, origin='lower')

# plot an ellipse for each object
for i in range(len(objects)):
    e = Ellipse(xy=(objects['x'][i], objects['y'][i]),
                width=6*objects['a'][i],
                height=6*objects['b'][i],
                angle=objects['theta'][i] * 180. / np.pi)
    e.set_facecolor('none')
    e.set_edgecolor('red')
    ax.add_artist(e)
_images/tutorial_21_0.png

objects has many other fields, giving information such as second moments, and peak pixel positions and values. See the reference documentation for sep.extract for descriptions of these fields. You can see the available fields:

[15]:
# available fields
objects.dtype.names
[15]:
('thresh',
 'npix',
 'tnpix',
 'xmin',
 'xmax',
 'ymin',
 'ymax',
 'x',
 'y',
 'x2',
 'y2',
 'xy',
 'errx2',
 'erry2',
 'errxy',
 'a',
 'b',
 'theta',
 'cxx',
 'cyy',
 'cxy',
 'cflux',
 'flux',
 'cpeak',
 'peak',
 'xcpeak',
 'ycpeak',
 'xpeak',
 'ypeak',
 'flag')

Aperture photometry

Finally, we’ll perform simple circular aperture photometry with a 3 pixel radius at the locations of the objects:

[16]:
flux, fluxerr, flag = sep.sum_circle(data_sub, objects['x'], objects['y'],
                                     3.0, err=bkg.globalrms, gain=1.0)

flux, fluxerr and flag are all 1-d arrays with one entry per object.

[17]:
# show the first 10 objects results:
for i in range(10):
    print("object {:d}: flux = {:f} +/- {:f}".format(i, flux[i], fluxerr[i]))
object 0: flux = 2249.173164 +/- 291.027422
object 1: flux = 3092.230000 +/- 291.591821
object 2: flux = 5949.882168 +/- 356.561539
object 3: flux = 1851.435000 +/- 295.028419
object 4: flux = 72736.400605 +/- 440.171830
object 5: flux = 3860.762324 +/- 352.162684
object 6: flux = 6418.924336 +/- 357.458504
object 7: flux = 2210.745605 +/- 350.790787
object 8: flux = 2741.598848 +/- 352.277244
object 9: flux = 20916.877324 +/- 376.965683

Finally a brief word on byte order

Note:

If you are using SEP to analyze data read from FITS files with astropy.io.fits you may see an error message such as:

ValueError: Input array with dtype '>f4' has non-native byte order.
Only native byte order arrays are supported. To change the byte
order of the array 'data', do 'data = data.byteswap().newbyteorder()'

It is usually easiest to do this byte-swap operation directly after reading the array from the FITS file. You can even perform the byte swap in-place by doing

>>> data = data.byteswap(inplace=True).newbyteorder()

If you do this in-place operation, ensure that there are no other references to data, as they will be rendered nonsensical.

For the interested reader, this byteswap operation is necessary because astropy.io.fits always returns big-endian byte order arrays, even on little-endian machines. For more on this, see

Matched Filter (Convolution)

For source detection, SEP supports using a matched filter, which can give the optimal detection signal-to-noise for objects with some known shape. This is controlled using the filter_kernel keyword in sep.extract. For example:

kernel = np.array([[1., 2., 3., 2., 1.],
                   [2., 3., 5., 3., 2.],
                   [3., 5., 8., 5., 3.],
                   [2., 3., 5., 3., 2.],
                   [1., 2., 3., 2., 1.]])
objects = sep.extract(data, thresh, filter_kernel=kernel)

If filter_kernel is not specified, a default 3-by-3 kernel is used. To disable filtering entirely, specify filter_kernel=None.

What array should be used for filter_kernel? It should be approximately the shape of the objects you are trying to detect. For example, to optimize for the detection of point sources, filter_kernel should be set to shape of the point spread function (PSF) in the data. For galaxy detection, a larger kernel could be used. In practice, anything that is roughly the shape of the desired object works well since the main goal is to negate the effects of background noise, and a reasonable estimate is good enough.

Correct treatment in the presence of variable noise

In Source Extractor, the matched filter is implemented assuming there is equal noise across all pixels in the kernel. The matched filter then simplifies to a convolution of the data with the kernel. In sep.extract, this is also the behavior when there is constant noise (when err is not specified).

In the presence of independent noise on each pixel, SEP uses a full matched filter implementation that correctly accounts for the noise in each pixel. This is not available in Source Extractor. Some benefits of this method are that detector sensitivity can be taken into account and edge effects are handled gracefully. For example, suppose we have an image with noise that is higher in one region than another. This can often occur when coadding images:

# create a small image with higher noise in the upper left
n = 16
X, Y = np.meshgrid(np.arange(n), np.arange(n))
mask = Y > X
error = np.ones((n, n))
error[mask] = 4.0
data = error * np.random.normal(size=(n, n))

# add source to middle of data
source = 3.0 * np.array([[1., 2., 1.],
                         [2., 4., 2.],
                         [1., 2., 1.]])
m = n // 2 - 1
data[m:m+3, m:m+3] += source

plt.imshow(data, interpolation='nearest', origin='lower', cmap='bone')
_images/matched_filter_example.png

Specifying filter_type='conv' will use simple convolution, matching the behavior of Source Extractor. The object is not detected:

>>> objects = sep.extract(data, 3.0, err=error, filter_type='conv')
>>> len(objects)
0

Setting filter_type='matched' (the default) correctly deweights the noisier pixels around the source and detects the object:

>>> objects = sep.extract(data, 3.0, err=error, filter_type='matched')
>>> len(objects)
1

Derivation of the matched filter formula

Assume that we have an image containing a single point source. This produces a signal with PSF \(S_i\) and noise \(N_i\) at each pixel indexed by \(i\). Then the measured image data \(D_i\) (i.e. our pixel values) is given by:

\[D_i = S_i + N_i\]

Then we want to apply a linear transformation \(T_i\) which gives an output \(Y\):

\[Y = \sum_i T_i D_i = T^T D\]

We use matrix notation from here on and drop the explicit sums. Our objective is to find the transformation \(T_i\) which maximizes the signal-to-noise ratio \(SNR\).

\[SNR^2 = \frac{(T^T S)^2}{E[(T^T N)^2]}\]

We can expand the denominator as:

\[E[(T^T N)^2] = E[(T^T N)(N^T T)] = T^T \cdot E[N N^T] \cdot T = T^T C T\]

Where \(C_{ik}\) is the covariance of the noise between pixels \(i\) and \(k\). Now using the Cauchy-Schwarz inequality on the numerator:

\[(T^T S)^2 = (T^T C^{1/2} C^{-1/2} S)^2 \le (T^T C^{1/2})^2 (C^{-1/2} S)^2 = (T^T C T) (S^T C^{-1} S)\]

since \(C^T = C\). The signal-to-noise ratio is therefore bounded by:

\[\begin{split}&SNR^2 \le \frac{(T^T C T)(S^T C^{-1} S)}{(T^T C T)} \\ &SNR^2 \le S^T C^{-1} S\end{split}\]

Choosing \(T = \alpha C^{-1} S\) where \(\alpha\) is an arbitrary normalization constant, we get equality. Hence this choise of \(T\) is the optimal linear tranformation. We normalize this linear transformation so that if there is no signal and only noise, we get an expected signal-to-noise ratio of 1. With this definition, the output \(SNR\) represents the number of standard deviations above the background. This gives:

\[\begin{split}&E[(T^T N)^2] = T^T C T = \alpha^2 S^T C^{-1} C C^{-1} S = \alpha^2 S^T C^{-1} S = 1 \\ &\alpha = \frac{1}{\sqrt{S^T C^{-1} S}}\end{split}\]

Putting everything together, our normalized linear transformation is:

\[T = \frac{C^{-1} S}{\sqrt{S^T C^{-1} S}}\]

And the optimal signal-to-noise is given in terms of the known variables as:

\[SNR = \frac{S^T C^{-1} D}{\sqrt{S^T C^{-1} S}}\]

Aperture photometry

There are four aperture functions available:

Function

Sums data within…

sep.sum_circle

circle(s)

sep.sum_circann

circular annulus/annuli

sep.sum_ellipse

ellipse(s)

sep.sum_ellipann

elliptical annulus/annuli

The follow examples demonstrate options for circular aperture photometry. The other functions behave similarly.

# sum flux in circles of radius=3.0
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0)

# x, y and r can be arrays and obey numpy broadcasting rules.
# Here, r is an array instead of a single number:
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'],
                                     3.0 * np.ones(len(objs)))

# use a different subpixel sampling (default is 5; 0 means "exact")
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     subpix=0)

Error calculation

In the default modes illustrated above, the uncertainty fluxerr is not calculated and values of 0 are simply returned. The uncertainty can be flexibly and efficiently calculated, depending on the characteristics of your data. The presence of an err or var keyword indicates a per-pixel noise, while the presense of a gain keyword indicates that the Poisson uncertainty on the total sum should be included. Some illustrative examples:

# Specify a per-pixel "background" error and a gain. This is suitable
# when the data have been background subtracted.
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     err=bkg.globalrms, gain=1.0)

# Variance can be passed instead of error, with identical results.
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     var=bkg.globalrms**2, gain=1.0)

# Error or variance can be arrays, indicating that the background error
# is variable.
bkgrms = bkg.rms()  # array, same shape as data
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     err=bkgrms, gain=1.0)

# If your uncertainty array already includes Poisson noise from the object,
# leave gain as None (default):
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     err=error_array)

# If your data represent raw counts (it is not background-subtracted),
# set only gain to get the poisson error:
flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     gain=1.0)

The error is calculated as

\[\sigma_F^2 = \sum_i \sigma_i^2 + F/g\]

where the sum is over pixels in the aperture, \(\sigma_i\) is the noise in each pixel, \(F\) is the sum in the aperture and \(g\) is the gain. The last term is not added if gain is None.

Masking

Apply a mask (same shape as data). Pixels where the mask is True are “corrected” to the average value within the aperture.

flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     mask=mask)

Local background subtraction

The sum_circle and sum_ellipse functions have options for performing local background subtraction. For example, to subtract the background calculated in an annulus between 6 and 8 pixel radius:

flux, fluxerr, flag = sep.sum_circle(data, objs['x'], objs['y'], 3.0,
                                     mask=mask, bkgann=(6., 8.))

Pixels in the background annulus are not subsampled and any masked pixels in the annulus are completely igored rather than corrected. The inner and outer radii can also be arrays. The error in the background is included in the reported error.

Equivalent of FLUX_AUTO (e.g., MAG_AUTO) in Source Extractor

This is a two-step process. First we calculate the Kron radius for each object, then we perform elliptical aperture photometry within that radius:

kronrad, krflag = sep.kron_radius(data, x, y, a, b, theta, 6.0)
flux, fluxerr, flag = sep.sum_ellipse(data, x, y, a, b, theta, 2.5*kronrad,
                                      subpix=1)
flag |= krflag  # combine flags into 'flag'

This specific example is the equilvalent of setting PHOT_AUTOPARAMS 2.5, 0.0 in Source Extractor (note the 2.5 in the argument to sep.sum_ellipse). The second Source Extractor parameter is a minimum diameter. To replicate Source Extractor behavior for non-zero values of the minimum diameter, one would put in logic to use circular aperture photometry if the Kron radius is too small. For example:

r_min = 1.75  # minimum diameter = 3.5
use_circle = kronrad * np.sqrt(a * b) < r_min
cflux, cfluxerr, cflag = sep.sum_circle(data, x[use_circle], y[use_circle],
                                        r_min, subpix=1)
flux[use_circle] = cflux
fluxerr[use_circle] = cfluxerr
flag[use_circle] = cflag

Equivalent of FLUX_RADIUS in Source Extractor

In Source Extractor, the FLUX_RADIUS parameter gives the radius of a circle enclosing a desired fraction of the total flux. For example, with the setting PHOT_FLUXFRAC 0.5, FLUX_RADIUS will give the radius of a circle containing half the “total flux” of the object. For the definition of “total flux”, Source Extractor uses its measurement of FLUX_AUTO, which is taken through an elliptical aperture (see above). Thus, with the setting PHOT_FLUXFRAC 1.0, you would find the circle containing the same flux as whatever ellipse Source Extractor used for FLUX_AUTO.

Given a previous calculation of flux as above, calculate the radius for a flux fraction of 0.5:

r, flag = sep.flux_radius(data, objs['x'], objs['y'], 6.*objs['a'], 0.5,
                          normflux=flux, subpix=5)

And for multiple flux fractions:

r, flag = sep.flux_radius(data, objs['x'], objs['y'], 6.*objs['a'],
                          [0.5, 0.6], normflux=flux, subpix=5)

Equivalent of XWIN_IMAGE, YWIN_IMAGE in Source Extractor

Source Extractor’s XWIN_IMAGE, YWIN_IMAGE parameters can be used for more accurate object centroids than the default X_IMAGE, Y_IMAGE. Here, the winpos function provides this behavior. To match Source Extractor exactly, the right sig parameter (giving a description of the effective width) must be used for each object. Source Extractor uses 2.  / 2.35 * (half-light radius) where the half-light radius is calculated using flux_radius with a fraction of 0.5 and a normalizing flux of FLUX_AUTO. The equivalent here is:

sig = 2. / 2.35 * r  # r from sep.flux_radius() above, with fluxfrac = 0.5
xwin, ywin, flag = sep.winpos(data, objs['x'], objs['y'], sig)

Segmentation-masked image measurements

SourceExtractor provides a mechanism for measuring the “AUTO” and “FLUX_RADIUS” parameters for a given object including a mask for neighboring sources. In addition to the mask, setting the SourceExtractor parameter MASK_TYPE=CORRECT further fills the masked pixels of a given source with “good” pixel values reflected opposite of the masked pixels. The SEP photometry and measurement functions provide an option for simple masking without reflection or subtracting neighbor flux.

For example, using a segmentation array provided by sep.extract, we can compute the masked flux_radius that could otherwise be artificially large due to flux from nearby sources:

# list of object id numbers that correspond to the segments
seg_id = np.arange(1, len(objs)+1, dtype=np.int32)

r, flag = sep.flux_radius(data, objs['x'], objs['y'], 6.*objs['a'], 0.5,
                          seg_id=seg_id, seg=seg,
                          normflux=flux, subpix=5)

To enforce that a given measurement only includes pixels within a segment, provide negative values in the seg_id list. Otherwise the mask for a given object will be pixels with (seg == 0) | (seg_id == id_i).

The following functions include the segmentation masking: sum_circle, sum_circann, sum_ellipse, sum_ellipann, flux_radius , and kron_radius (winpos currently does not).

Masking image regions

Create a boolean array with elliptical regions set to True:

mask = np.zeros(data.shape, dtype=np.bool)
sep.mask_ellipse(mask, objs['x'], objs['y'], obs['a'], objs['b'],
                 objs['theta'], r=3.)

Reference/API

Background estimation & source detection

sep.Background(data[, mask, maskthresh, bw, ...])

Representation of spatially variable image background and noise.

sep.extract(data, thresh[, err, mask, ...])

Extract sources from an image.

Aperture photometry

sep.sum_circle(data, x, y, r[, err, var, ...])

Sum data in circular aperture(s).

sep.sum_circann(data, x, y, rin, rout[, ...])

Sum data in circular annular aperture(s).

sep.sum_ellipse(data, x, y, a, b, theta, r)

Sum data in elliptical aperture(s).

sep.sum_ellipann(data, x, y, a, b, theta, ...)

Sum data in elliptical annular aperture(s).

Aperture utilities

sep.kron_radius(data, x, y, a, b, theta, r)

Calculate Kron "radius" within an ellipse.

sep.flux_radius(data, x, y, rmax, frac[, ...])

Return radius of a circle enclosing requested fraction of total flux.

sep.winpos(data, xinit, yinit, sig[, mask, ...])

Calculate more accurate object centroids using 'windowed' algorithm.

sep.mask_ellipse(arr, x, y, a, b, theta[, r])

Mask ellipse(s) in an array.

sep.ellipse_axes(cxx, cyy, cxy)

Convert from coefficient ellipse representation to ellipse axes and angle.

sep.ellipse_coeffs(a, b, theta)

Convert from ellipse axes and angle to coefficient representation.

Low-level utilities

sep.get_extract_pixstack()

Get the size in pixels of the internal pixel buffer used in extract().

sep.set_extract_pixstack(size)

Set the size in pixels of the internal pixel buffer used in extract().

sep.get_sub_object_limit()

Get the limit on the number of sub-objects when deblending in extract().

sep.set_sub_object_limit(limit)

Set the limit on the number of sub-objects when deblending in extract().

Flags

Flag

Description

sep.OBJ_MERGED

object is result of deblending

sep.OBJ_TRUNC

object is truncated at image boundary

sep.OBJ_SINGU

x, y fully correlated in object

sep.APER_TRUNC

aperture truncated at image boundary

sep.APER_HASMASKED

aperture contains one or more masked pixels

sep.APER_ALLMASKED

aperture contains only masked pixels

sep.APER_NONPOSITIVE

aperture sum is negative in kron_radius

To see if a given flag is set in flags:

is_merged = (flags & sep.OBJ_MERGED) != 0

Note

The coordinate convention in SEP is that (0, 0) corresponds to the center of the first element of the data array. This agrees with the 0-based indexing in Python and C. However, note that this differs from the FITS convention where the center of the first element is at coordinates (1, 1). As Source Extractor deals with FITS files, its outputs follow the FITS convention. Thus, the coordinates from SEP will be offset from Source Extractor coordinates by -1 in x and y.

For complete API documentation, see Reference/API.

Contributing

Report a bug or documentation issue: http://github.com/kbarbary/sep/issues

Development of SEP takes place on GitHub at http://github.com/kbarbary/sep. Contributions of bug fixes, documentation improvements and minor feature additions are welcome via GitHub pull requests. For major features, it is best to open an issue discussing the change first.

License and Citation

The license for SEP is the Lesser GNU Public License (LGPL), granted with the permission from the original author of Source Extractor.

If you use SEP in a publication, please cite Barbary (2016) and the original Source Extractor paper: Bertin & Arnouts 1996.

The DOI for the sep v1.0.0 code release is 10.5281/zenodo.159035.