read_cell_seg_data makes it easier to use data from Akoya Biosciences' inForm program. It reads data files written by inForm 2.0 and later and does useful cleanup on the result.
read_cell_seg_data(
  path = NA,
  pixels_per_micron = getOption("phenoptr.pixels.per.micron"),
  remove_units = TRUE,
  col_select = NULL
)
path
Path to the file to read, or NA to use a file chooser.
pixels_per_micron
Conversion factor from pixels to microns, in pixels per micron (default 2 pixels/micron, the resolution of 20x MSI fields taken on Vectra Polaris and Vectra 3). Set to NA to skip conversion. Set to 'auto' to read the value from an associated component_data.tif file.
remove_units
If TRUE (default), remove the unit name from expression columns.
col_select
Optional column selection expression. May be one of:
NULL - retain all columns.
"phenoptrReports" - retain only columns needed by functions in the phenoptrReports package.
A quoted list of one or more selection expressions, like in dplyr::select() (see example).
A tibble containing the cleaned-up data set.
read_cell_seg_data reads single-field tables, merged tables, and consolidated tables, and does useful cleanup on the data:
Removes columns that are all NA. These are typically unused summary columns.
Converts percent columns to numeric fractions.
Converts pixel distances to microns. The conversion factor may be specified as a parameter, by setting options(phenoptr.pixels.per.micron), or by reading an associated component_data.tif file.
Optionally removes units from expression names.
If the file contains multiple sample names, a tag column is created containing a minimal, unique tag for each sample. This is useful when a short name is needed, for example in chart legends.
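The first three cleanup steps can be sketched in base R on a toy table. This is only an illustration of the idea, not the package's implementation: the column names, the percent-detection rule, and the fixed factor of 2 pixels per micron are all assumptions made for the example.

```r
# Toy table standing in for cell seg data (column names are illustrative only)
csd <- data.frame(
  `Cell ID` = 1:3,
  `Cell X Position` = c(100, 250, 400),      # pixels
  `Cell Y Position` = c(50, 120, 300),       # pixels
  `Confidence` = c("98.5%", "75%", "100%"),  # percent stored as text
  `Total Cells` = c(NA, NA, NA),             # unused summary column
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. Remove columns that are entirely NA
all_na <- vapply(csd, function(col) all(is.na(col)), logical(1))
csd <- csd[, !all_na, drop = FALSE]

# 2. Convert percent strings to numeric fractions
is_percent <- vapply(
  csd,
  function(col) is.character(col) && all(grepl("%$", col)),
  logical(1)
)
csd[is_percent] <- lapply(
  csd[is_percent],
  function(col) as.numeric(sub("%$", "", col)) / 100
)

# 3. Convert pixel positions to microns (assumed factor: 2 pixels per micron)
pixels_per_micron <- 2
is_position <- grepl("Position", names(csd))
csd[is_position] <- lapply(
  csd[is_position],
  function(col) col / pixels_per_micron
)
```

After these steps the toy table has no all-NA column, Confidence holds fractions between 0 and 1, and the position columns are in microns.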
If pixels_per_micron='auto', read_cell_seg_data looks for a component_data.tif file in the same directory as path. If found, pixels_per_micron is read from the file and the cell coordinates are offset to the correct spatial location.
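As a rough sketch of what converting and offsetting coordinates means, the arithmetic might look like the following. The resolution and the field origin are hypothetical values chosen for illustration (in practice they would come from the component_data.tif file), and the assumption that the offset is the field's top-left corner in microns is not confirmed by this page.

```r
# Hypothetical values for illustration only
pixels_per_micron <- 2
field_origin <- c(x = 10000, y = 5000)  # assumed field corner, in microns

# Cell positions relative to the field, in pixels
cells_px <- data.frame(x = c(0, 100, 1868), y = c(0, 50, 1400))

# Convert to microns, then shift by the field origin
cells_um <- data.frame(
  x = cells_px$x / pixels_per_micron + field_origin[["x"]],
  y = cells_px$y / pixels_per_micron + field_origin[["y"]]
)
```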
If col_select is "phenoptrReports", only columns normally needed by phenoptrReports are read. This can dramatically reduce the time to read a file and the memory required to store the results. Specifically, passing col_select='phenoptrReports' will omit
Component stats other than mean expression
Shape stats other than area
Path, Processing Region ID, Category Region ID, Lab ID, Confidence, and columns which are normally blank.
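To make the effect of this kind of filtering concrete, here is a small base R sketch that applies similar rules to a vector of column names. The column names and the regular expressions are invented for the example; they are not the rules used by read_cell_seg_data.

```r
# Illustrative column names, loosely modeled on a cell seg table
col_names <- c(
  "Path", "Sample Name", "Cell ID", "Cell X Position", "Cell Y Position",
  "Phenotype", "Confidence", "Lab ID",
  "Entire Cell Area (pixels)", "Entire Cell Compactness",
  "Entire Cell CD8 Mean", "Entire Cell CD8 Max", "Entire Cell CD8 Std Dev"
)

# Columns dropped by exact name
drop_exact <- c("Path", "Processing Region ID", "Category Region ID",
                "Lab ID", "Confidence")

# Component stats other than the mean (assumed suffixes)
is_other_stat <- grepl("(Min|Max|Std Dev|Total)$", col_names)

# Shape stats other than area (assumed names)
is_other_shape <- grepl("Compactness|Axis|Perimeter", col_names)

keep <- !(col_names %in% drop_exact) & !is_other_stat & !is_other_shape
kept_cols <- col_names[keep]
```

With these toy names, kept_cols retains the identifying columns, the positions, the phenotype, the area, and the mean expression column.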
Other file readers: get_field_info(), list_cell_seg_files(), read_components(), read_maps()
path <- sample_cell_seg_path()
csd <- read_cell_seg_data(path)
# count all the phenotypes in the data
table(csd$Phenotype)
#>
#>  CD68+   CD8+    CK+ FoxP3+  other
#>    417    228   2257    228   2942
# Read only columns needed by phenoptrReports
csd <- read_cell_seg_data(path, col_select='phenoptrReports')
# Read only position and phenotype columns
csd <- read_cell_seg_data(path,
         col_select=rlang::quo(list(dplyr::contains('Position'),
                                    dplyr::contains('Phenotype'))))
#> Data is already in microns, no conversion performed
if (FALSE) {
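  # Use purrr::map_df to read all cell seg files in a directory
  # and return a single tibble.
  # list_cell_seg_files() expects a directory, so pass the folder
  # containing the sample file rather than the file itself.
  paths <- list_cell_seg_files(dirname(path))
  csd <- purrr::map_df(paths, read_cell_seg_data)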
}