read_cell_seg_data makes it easier to use data from Akoya Biosciences' inForm program. It reads data files written by inForm 2.0 and later and does useful cleanup on the result.

read_cell_seg_data(
  path = NA,
  pixels_per_micron = getOption("phenoptr.pixels.per.micron"),
  remove_units = TRUE,
  col_select = NULL
)

Arguments

path

Path to the file to read, or NA to use a file chooser.

pixels_per_micron

Conversion factor to microns, in pixels per micron (default 2 pixels/micron, the resolution of 20x MSI fields taken on Vectra Polaris and Vectra 3). Set to NA to skip conversion. Set to 'auto' to read from an associated component_data.tif file.

remove_units

If TRUE (default), remove the unit name from expression columns.

col_select

Optional column selection expression. May be one of:

  • NULL - retain all columns

  • "phenoptrReports" - retain only columns needed by functions in the phenoptrReports package.

  • A quoted list of one or more selection expressions, as in dplyr::select() (see the examples).

Value

A tibble containing the cleaned-up data set.

Details

read_cell_seg_data reads single-field tables, merged tables, and consolidated tables, and does useful cleanup on the data:

  • Removes columns that are all NA. These are typically unused summary columns.

  • Converts percent columns to numeric fractions.

  • Converts pixel distances to microns. The conversion factor may be specified as a parameter, by setting options(phenoptr.pixels.per.micron), or by reading an associated component_data.tif file.

  • Optionally removes units from expression column names.

  • If the file contains multiple sample names, a tag column is created containing a minimal, unique tag for each sample. This is useful when a short name is needed, for example in chart legends.
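
The first two cleanup steps above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the table `raw`, its column names, and its values are made up for the example.

```r
# Hypothetical raw table (made-up columns and values, for illustration only)
raw <- data.frame(
  `Cell ID` = 1:3,
  `Tissue Category Area (percent)` = c("12.5%", "50%", "37.5%"),
  `Total Cells` = c(NA, NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 1: drop columns that are entirely NA
all_na <- vapply(raw, function(col) all(is.na(col)), logical(1))
cleaned <- raw[, !all_na, drop = FALSE]

# Step 2: convert percent strings such as "12.5%" to numeric fractions
is_pct <- vapply(
  cleaned,
  function(col) is.character(col) && all(grepl("%$", col)),
  logical(1)
)
cleaned[is_pct] <- lapply(
  cleaned[is_pct],
  function(col) as.numeric(sub("%$", "", col)) / 100
)

cleaned
```

After these steps the all-NA column is gone and the percent column holds plain numbers between 0 and 1, which is the shape of result the list above describes.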

If pixels_per_micron='auto', read_cell_seg_data looks for a component_data.tif file in the same directory as path. If found, pixels_per_micron is read from the file and the cell coordinates are offset to the correct spatial location.
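
As a rough sketch of what the pixel-to-micron conversion amounts to, the base R snippet below sets the documented option and applies the factor by hand. The table `cells` and its values are hypothetical. Dividing areas by the square of the factor is an assumption based on ordinary unit arithmetic, not a statement of how the package handles every column.

```r
# Set the option that read_cell_seg_data uses as its default,
# then read it back the same way the function signature does.
options(phenoptr.pixels.per.micron = 2)
pixels_per_micron <- getOption("phenoptr.pixels.per.micron")

# Hypothetical cell table measured in pixels (made-up values)
cells <- data.frame(
  x_px = c(100, 250),
  y_px = c(40, 900),
  area_px = c(400, 144)
)

# Lengths scale by the factor; areas scale by the factor squared (assumption)
cells$x_um <- cells$x_px / pixels_per_micron
cells$y_um <- cells$y_px / pixels_per_micron
cells$area_um2 <- cells$area_px / pixels_per_micron^2

cells
```

With a factor of 2 pixels/micron, a position of 100 pixels becomes 50 microns and an area of 400 square pixels becomes 100 square microns.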

If col_select is "phenoptrReports", only columns normally needed by phenoptrReports are read. This can dramatically reduce the time to read a file and the memory required to store the results.

Specifically, passing col_select='phenoptrReports' will omit

  • Component stats other than mean expression

  • Shape stats other than area

  • Path, Processing Region ID, Category Region ID, Lab ID, Confidence, and columns which are normally blank.
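
The effect of this filter can be approximated on a vector of column names. The sketch below is illustrative only: the names in `cols` and the regular expressions are assumptions chosen to mirror the three rules above, not the patterns the package actually uses.

```r
# Hypothetical column names resembling an inForm cell seg table
cols <- c(
  "Path", "Sample Name", "Cell ID",
  "Cell X Position", "Cell Y Position",
  "Phenotype", "Confidence", "Lab ID",
  "Entire Cell Area (pixels)", "Entire Cell Compactness",
  "Entire Cell CD8 (Opal 520) Mean", "Entire Cell CD8 (Opal 520) Max"
)

# Columns named explicitly in the list above
drop_exact <- c("Path", "Processing Region ID", "Category Region ID",
                "Lab ID", "Confidence")

# Component stats other than mean (assumed suffixes)
is_other_stat <- grepl(" (Min|Max|Std Dev|Total)$", cols)

# Shape stats other than area (assumed names)
is_other_shape <- grepl("Compactness|Axis", cols)

kept <- cols[!(cols %in% drop_exact | is_other_stat | is_other_shape)]
kept
```

Here seven of the twelve hypothetical columns survive: the sample and cell identifiers, the positions, the phenotype, the area, and the mean expression.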

Examples

path <- sample_cell_seg_path()
csd <- read_cell_seg_data(path)

# count all the phenotypes in the data
table(csd$Phenotype)
#> 
#>  CD68+   CD8+    CK+ FoxP3+  other 
#>    417    228   2257    228   2942 

# Read only columns needed by phenoptrReports
csd <- read_cell_seg_data(path, col_select='phenoptrReports')

# Read only position and phenotype columns
csd <- read_cell_seg_data(path,
         col_select=rlang::quo(list(dplyr::contains('Position'),
                                    dplyr::contains('Phenotype'))))
#> Data is already in microns, no conversion performed
if (FALSE) {
# Use purrr::map_df to read all cell seg files in a directory
# and return a single tibble.
# list_cell_seg_files() searches a directory, so pass the folder holding the sample file
paths <- list_cell_seg_files(dirname(path))
csd <- purrr::map_df(paths, read_cell_seg_data)
}