Read and clean an inForm data file.

read_cell_seg_data makes it easier to use data from Akoya Biosciences' inForm program. It reads data files written by inForm 2.0 and later and does useful cleanup on the result.

read_cell_seg_data(
  path = NA,
  pixels_per_micron = getOption("phenoptr.pixels.per.micron"),
  remove_units = TRUE,
  col_select = NULL
)

Arguments

path

Path to the file to read, or NA to use a file chooser.

pixels_per_micron

Conversion factor to microns (default 2 pixels/micron, the resolution of 20x MSI fields taken on Vectra Polaris and Vectra 3). Set to NA to skip conversion. Set to 'auto' to read the factor from an associated component_data.tif file.

remove_units

If TRUE (default), remove the unit name from expression columns.

col_select

Optional column selection expression. May be one of:

NULL - retain all columns
"phenoptrReports" - retain only columns needed by functions in the phenoptrReports package.
A quoted list of one or more selection expressions, like in dplyr::select() (see example).

Value

A tibble containing the cleaned-up data set.

Details

read_cell_seg_data reads single-field tables, merged tables, and consolidated tables, and does useful cleanup on the data:

Removes columns that are all NA. These are typically unused summary columns.
Converts percent columns to numeric fractions.
Converts pixel distances to microns. The conversion factor may be specified as a parameter, by setting options(phenoptr.pixels.per.micron), or by reading an associated component_data.tif file.
Optionally removes units from expression names.
If the file contains multiple sample names, a tag column is created containing a minimal, unique tag for each sample. This is useful when a short name is needed, for example in chart legends.
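The first four cleanup steps above can be illustrated on a toy table in base R. This is only a sketch of the idea, not phenoptr's implementation: the data frame, its column names, the percent-detection rule, and the unit suffix being stripped are all assumptions made for illustration.

```r
# Toy stand-in for an inForm table. All column names and values are hypothetical.
toy <- data.frame(
  "Cell X Position" = c(100, 250),
  "Confidence" = c("95.5%", "80%"),
  "Nucleus DAPI Mean (Normalized Counts, Total Weighting)" = c(1.2, 3.4),
  "Unused Summary" = c(NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Remove columns that are all NA.
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))
toy <- toy[, !all_na, drop = FALSE]

# 2. Convert percent strings such as "95.5%" to numeric fractions.
#    (Assumed rule: a character column whose values all end in "%".)
is_pct <- vapply(
  toy,
  function(col) is.character(col) && all(grepl("%$", col)),
  logical(1)
)
toy[is_pct] <- lapply(toy[is_pct], function(col) as.numeric(sub("%$", "", col)) / 100)

# 3. Remove the unit name from expression column names (assumed suffix).
names(toy) <- sub(" \\(Normalized Counts, Total Weighting\\)$", "", names(toy))

# 4. Convert pixel distances to microns, assuming 2 pixels per micron.
pixels_per_micron <- 2
toy[["Cell X Position"]] <- toy[["Cell X Position"]] / pixels_per_micron

print(toy)
```

In practice read_cell_seg_data does all of this in one call; the sketch only shows what each step means for the data.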

If pixels_per_micron='auto', read_cell_seg_data looks for a component_data.tif file in the same directory as path. If found, pixels_per_micron is read from the file and the cell coordinates are offset to the correct spatial location.
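As a hedged usage sketch, the conversion factor can be supplied through the session option or through the pixels_per_micron argument. The value 2 is the documented default resolution; the variable names are illustrative, and the calls are guarded so the block does nothing beyond setting the option when phenoptr is not installed.

```r
# Session-wide default, used when pixels_per_micron is not given explicitly.
options(phenoptr.pixels.per.micron = 2)

# The calls below need the phenoptr package, so they are guarded.
if (requireNamespace("phenoptr", quietly = TRUE)) {
  path <- phenoptr::sample_cell_seg_path()

  # Uses the option set above.
  csd_option <- phenoptr::read_cell_seg_data(path)

  # Passes the factor explicitly for this call only.
  csd_explicit <- phenoptr::read_cell_seg_data(path, pixels_per_micron = 2)

  # Skips conversion entirely.
  csd_pixels <- phenoptr::read_cell_seg_data(path, pixels_per_micron = NA)
}
```

Passing pixels_per_micron = 'auto' follows the same pattern, but it depends on a component_data.tif file being present next to the data file.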

If col_select is "phenoptrReports", only columns normally needed by phenoptrReports are read. This can dramatically reduce the time to read a file and the memory required to store the results.

Specifically, passing col_select='phenoptrReports' will omit

Component stats other than mean expression
Shape stats other than area
Path, Processing Region ID, Category Region ID, Lab ID, Confidence, and columns which are normally blank.
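To make the three omitted groups concrete, the following base-R sketch filters a hypothetical set of column names. The names and the regular expressions are assumptions chosen to mirror the list above; they are not the selection rules phenoptr actually uses.

```r
# Hypothetical column names in the style of an inForm cell seg table.
cols <- c(
  "Sample Name", "Cell ID", "Cell X Position", "Cell Y Position", "Phenotype",
  "Nucleus Area (pixels)", "Nucleus Compactness",
  "Nucleus DAPI Mean (Normalized Counts, Total Weighting)",
  "Nucleus DAPI Max (Normalized Counts, Total Weighting)",
  "Path", "Lab ID", "Confidence"
)

# Assumed patterns, one per omitted group.
drop_patterns <- c(
  "(Min|Max|Std Dev|Total) \\(",  # component stats other than mean expression
  "Compactness|Axis",             # shape stats other than area
  "^(Path|Processing Region ID|Category Region ID|Lab ID|Confidence)$"
)

drop <- grepl(paste(drop_patterns, collapse = "|"), cols)
kept <- cols[!drop]
print(kept)
```

Fewer retained columns means less data to parse and store, which is where the time and memory savings come from.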

Examples

path <- sample_cell_seg_path()
csd <- read_cell_seg_data(path)

# count all the phenotypes in the data
table(csd$Phenotype)
#> 
#>  CD68+   CD8+    CK+ FoxP3+  other 
#>    417    228   2257    228   2942 

# Read only columns needed by phenoptrReports
csd <- read_cell_seg_data(path, col_select='phenoptrReports')

# Read only position and phenotype columns
csd <- read_cell_seg_data(path,
         col_select=rlang::quo(list(dplyr::contains('Position'),
                                    dplyr::contains('Phenotype'))))
#> Data is already in microns, no conversion performed
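if (FALSE) {
# Use purrr::map_df to read all cell seg files in a directory
# and return a single tibble.
# list_cell_seg_files() searches a directory, so pass the folder
# containing the sample file rather than the file itself.
paths <- list_cell_seg_files(dirname(path))
csd <- purrr::map_df(paths, read_cell_seg_data)
}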
