read_cell_seg_data makes it easier to use data from Akoya Biosciences'
inForm program. It reads data files written by inForm 2.0 and later and does
useful cleanup on the result.
read_cell_seg_data(
path = NA,
pixels_per_micron = getOption("phenoptr.pixels.per.micron"),
remove_units = TRUE,
col_select = NULL
)
path: Path to the file to read, or NA to use a file chooser.
pixels_per_micron: Conversion factor to microns
(default 2 pixels/micron, the resolution of 20x MSI fields
taken on Vectra Polaris and Vectra 3).
Set to NA to skip conversion. Set to 'auto' to read from
an associated component_data.tif file.
remove_units: If TRUE (default), remove the unit name from expression columns.
col_select: Optional column selection expression. May be one of:
NULL - retain all columns
"phenoptrReports" - retain only columns needed by functions
in the phenoptrReports package.
A quoted list of one or more selection expressions,
like in dplyr::select() (see example).
A tibble containing the cleaned-up data set.
read_cell_seg_data reads single-field tables, merged tables,
and consolidated tables
and does useful cleanup on the data:
Removes columns that are all NA. These are typically unused summary columns.
Converts percent columns to numeric fractions.
Converts pixel distances to microns. The conversion factor may be
specified as a parameter, by setting
options(phenoptr.pixels.per.micron), or by reading an associated
component_data.tif file.
Optionally removes units from expression names.
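The first three cleanup steps in the list above amount to simple column operations. The base R sketch below illustrates them on a made-up three-row table. The column names and values are invented for illustration, the session option is set by hand here, and the code is not phenoptr's actual implementation.

```r
# Toy stand-in for a raw inForm table. Column names and values are
# hypothetical; real files contain many more columns.
raw <- data.frame(
  `Cell X Position` = c(100, 250, 400),   # pixels
  `Cell Y Position` = c(50, 75, 120),     # pixels
  `Confidence` = c("98.5%", "72.0%", "100%"),
  `Unused Summary` = c(NA, NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Set the session-wide conversion factor, then read it back the same way
# the default argument of read_cell_seg_data does.
old_opts <- options(phenoptr.pixels.per.micron = 2)
pixels_per_micron <- getOption("phenoptr.pixels.per.micron")

cleaned <- raw

# Drop columns that are entirely NA.
all_na <- vapply(cleaned, function(col) all(is.na(col)), logical(1))
cleaned <- cleaned[, !all_na, drop = FALSE]

# Convert percent strings such as "98.5%" to numeric fractions.
cleaned$Confidence <-
  as.numeric(sub("%", "", cleaned$Confidence, fixed = TRUE)) / 100

# Convert pixel positions to microns: pixels / (pixels per micron).
position_cols <- grep("Position", names(cleaned), value = TRUE)
cleaned[position_cols] <- lapply(
  cleaned[position_cols],
  function(px) px / pixels_per_micron
)

# Restore the previous option value.
options(old_opts)

print(cleaned)
```

With 2 pixels per micron, a position of 100 pixels becomes 50 microns, and a confidence of "98.5%" becomes 0.985.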
If the file contains multiple sample names,
a tag column is created
containing a minimal, unique tag for each sample.
This is useful when a
short name is needed, for example in chart legends.
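One plausible way to derive such a tag is to strip whatever prefix and suffix all sample names share. The sketch below does this in base R. This is an assumption about the approach, not a description of phenoptr's code, and the helper names (common_prefix_length, reverse_string, make_tags) and sample names are hypothetical.

```r
# Number of leading characters shared by every string in x.
common_prefix_length <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)
  n <- 0
  for (i in seq_len(min(lengths(chars)))) {
    ith <- vapply(chars, function(ch) ch[i], character(1))
    if (length(unique(ith)) > 1) break
    n <- i
  }
  n
}

# Reverse each string so a shared suffix can be found as a shared prefix.
reverse_string <- function(x) {
  vapply(
    strsplit(x, "", fixed = TRUE),
    function(ch) paste(rev(ch), collapse = ""),
    character(1)
  )
}

# Strip the shared prefix, then the shared suffix, from each name.
make_tags <- function(sample_names) {
  prefix <- common_prefix_length(sample_names)
  trimmed <- substring(sample_names, prefix + 1)
  suffix <- common_prefix_length(reverse_string(trimmed))
  substring(trimmed, 1, nchar(trimmed) - suffix)
}

# Hypothetical sample names.
samples <- c("Study1_Lung_A.im3", "Study1_Lung_B.im3", "Study1_Tonsil_C.im3")
tags <- make_tags(samples)
print(tags)
```

Here the shared "Study1_" prefix and ".im3" suffix are dropped, leaving "Lung_A", "Lung_B", and "Tonsil_C", which are short enough for a chart legend and still unique.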
If pixels_per_micron='auto', read_cell_seg_data looks for
a component_data.tif file in the same directory as path.
If found, pixels_per_micron is read from the file and
the cell coordinates are offset to the correct spatial location.
If col_select is "phenoptrReports", only columns normally needed by
phenoptrReports are read. This can dramatically reduce the time to
read a file and the memory required to store the results.
Specifically, passing col_select='phenoptrReports' will omit:
Component stats other than mean expression.
Shape stats other than area.
Path, Processing Region ID, Category Region ID,
Lab ID, Confidence, and columns which are normally
blank.
Other file readers:
get_field_info(),
list_cell_seg_files(),
read_components(),
read_maps()
path <- sample_cell_seg_path()
csd <- read_cell_seg_data(path)
# count all the phenotypes in the data
table(csd$Phenotype)
#>
#> CD68+ CD8+ CK+ FoxP3+ other
#> 417 228 2257 228 2942
# Read only columns needed by phenoptrReports
csd <- read_cell_seg_data(path, col_select='phenoptrReports')
# Read only position and phenotype columns
csd <- read_cell_seg_data(path,
col_select=rlang::quo(list(dplyr::contains('Position'),
dplyr::contains('Phenotype'))))
#> Data is already in microns, no conversion performed
if (FALSE) {
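# Use purrr::map_df to read all cell seg files in a directory
# and return a single tibble. list_cell_seg_files() searches a
# directory, so pass the folder containing the sample file.
paths <- list_cell_seg_files(dirname(path))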
csd <- purrr::map_df(paths, read_cell_seg_data)
}