phenoptr includes a flexible mechanism for selecting cells (i.e. rows) from a cell seg table. The mechanism is implemented in select_rows. Row selection may be used directly via select_rows and ordinary subsetting operations. It is also used indirectly by calling functions that support it, including count_touching_cells and count_within.
The return value from select_rows is a boolean (logical) vector whose length is the number of rows of the given cell seg table. You use this returned value to select rows of the table.
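As a minimal base-R illustration of how such a vector is used, the sketch below builds a small hypothetical table (the Phenotype and cell_id columns are assumptions, not the layout of sample_cell_seg_data) and a logical vector with one element per row. sum() counts the TRUE values, and [rows, ] keeps only the matching rows.

```r
# Hypothetical toy table; real cell seg tables have many more columns
cells <- data.frame(
  cell_id   = 1:4,
  Phenotype = c('CK+', 'CD8+', 'CK+', 'other'),
  stringsAsFactors = FALSE
)

# A logical vector with one element per row, like the value select_rows returns
rows <- cells$Phenotype == 'CK+'

length(rows) == nrow(cells)  # TRUE
sum(rows)                    # 2, because each TRUE counts as 1

# Ordinary subsetting keeps the rows where rows is TRUE
ck <- cells[rows, ]
nrow(ck)                     # 2
```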
The mechanism select_rows uses to specify phenotypes is very flexible. This flexibility comes at a cost in complexity. Most common phenotype combinations can also be specified using parse_phenotypes, which supports a friendlier syntax.
This tutorial uses sample_cell_seg_data and count_within to give examples of the phenotype specifications used by select_rows.
The simplest selector is just the name of a single phenotype. This example selects the rows containing CK+ cells. The same syntax works with both parse_phenotypes and select_rows.
library(phenoptr)
csd <- sample_cell_seg_data
rows <- select_rows(csd, 'CK+')
sum(rows) # The number of selected rows
## [1] 2257
# Select just the desired rows by subsetting
ck <- csd[rows, ]
dim(ck)
## [1] 2257 199
This example counts CD8+ cells within 15 microns of CK+ cells.
dst <- distance_matrix(csd) # Compute this just once and re-use it
count_within(csd, from='CK+', to='CD8+', radius=15, dst=dst)
## # A tibble: 1 x 5
## radius from_count to_count from_with within_mean
## <dbl> <int> <int> <int> <dbl>
## 1 15 2257 228 193 0.115
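The output columns can be made concrete by reproducing them on toy data. This sketch reflects our reading of the columns (from_with: the number of from cells with at least one to cell within radius; within_mean: the average number of to cells within radius of each from cell), not phenoptr's source code. The coordinates are invented, distances are Euclidean, and a cell at exactly radius is counted as within; whether count_within treats that boundary the same way is an assumption.

```r
# Hypothetical toy coordinates in microns (x, y), one row per cell
from_xy <- matrix(c(0, 0,
                    100, 100), ncol = 2, byrow = TRUE)  # two 'from' cells
to_xy <- matrix(c(5, 0,
                  0, 10,
                  200, 200), ncol = 2, byrow = TRUE)    # three 'to' cells

# Euclidean distance from every 'from' cell (rows) to every 'to' cell (columns)
d <- sqrt(outer(from_xy[, 1], to_xy[, 1], '-')^2 +
          outer(from_xy[, 2], to_xy[, 2], '-')^2)

radius <- 15
n_within <- rowSums(d <= radius)  # 'to' cells within radius of each 'from' cell

from_count  <- nrow(from_xy)      # 2
to_count    <- nrow(to_xy)        # 3
from_with   <- sum(n_within > 0)  # 1: only the first 'from' cell has a neighbor
within_mean <- mean(n_within)     # 1: (2 + 0) / 2
```

Since d depends only on cell positions, the same matrix serves any radius or phenotype pair, which is the point of computing distance_matrix once and passing it as dst above.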
Double positive (or more) cells can be selected by including multiple names in a list. Selectors in a list are combined with AND.
Multiple phenotypes may be selected together by including each name in a character vector (not a list!). Names in a vector are combined with OR. For example, to select cells phenotyped as either CD8+ or FoxP3+, use the selector c('CD8+', 'FoxP3+').
This example selects this combination. Note that the call to select_rows has been combined with the subsetting of csd.
tcells <- csd[select_rows(csd, c('CD8+', 'FoxP3+')), ]
dim(tcells)
## [1] 456 199
count_within(csd, from='CK+', to=c('CD8+', 'FoxP3+'), radius=15, dst=dst)
## # A tibble: 1 x 5
## radius from_count to_count from_with within_mean
## <dbl> <int> <int> <int> <dbl>
## 1 15 2257 456 354 0.206
This type of grouping is an either/or selection. The count_within example above counts the number of T cells (CD8+ or FoxP3+) within 15 microns of a CK+ cell. If you want separate counts for CD8+ and FoxP3+, use count_within_batch.
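The difference between a list (AND) and a character vector (OR) can be sketched with plain logical operators. The table below is hypothetical: it assumes one phenotype column per marker (cd8 and foxp3), which may not match how phenotypes are stored in your data. A vector of names behaves like |, and a list of names behaves like &.

```r
# Hypothetical toy table with one phenotype column per marker
cells <- data.frame(
  cd8   = c('CD8+', 'CD8-', 'CD8+', 'CD8-'),
  foxp3 = c('FoxP3-', 'FoxP3+', 'FoxP3+', 'FoxP3-'),
  stringsAsFactors = FALSE
)
is_cd8   <- cells$cd8 == 'CD8+'
is_foxp3 <- cells$foxp3 == 'FoxP3+'

either <- is_cd8 | is_foxp3  # analogous to c('CD8+', 'FoxP3+'): OR
both   <- is_cd8 & is_foxp3  # analogous to list('CD8+', 'FoxP3+'): AND

sum(either)  # 3
sum(both)    # 1, only the double positive cell in row 3
```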
For more flexibility, select_rows supports selection using any valid R expression. Expressions are written using one-sided formulas. The formulas are evaluated in the context of the cell seg table, so they may reference any column of the table.
For example, to select cells with PDL1 expression greater than 3, use the expression ~`Entire Cell PDL1 (Opal 520) Mean`>3. Here the column name is Entire Cell PDL1 (Opal 520) Mean; because it contains spaces and parentheses, it must be enclosed in backticks in the expression.
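To show what evaluating a formula in the context of the table means, here is a base-R sketch. It is not phenoptr's internal code: it takes the right-hand side of a one-sided formula (f[[2]]) and evaluates it with the table's columns visible, using made-up PDL1 values. check.names = FALSE keeps the column name with its spaces and parentheses; without it, data.frame() would rewrite the name.

```r
# Hypothetical toy table with made-up PDL1 values
cells <- data.frame(
  `Entire Cell PDL1 (Opal 520) Mean` = c(1.2, 4.5, 3.0, 7.8),
  check.names = FALSE
)

f <- ~`Entire Cell PDL1 (Opal 520) Mean` > 3

# For a one-sided formula, f[[2]] is the expression after ~.
# Columns of cells are looked up first, then the formula's own environment.
selected <- eval(f[[2]], cells, environment(f))
selected                                  # FALSE TRUE FALSE TRUE
nrow(cells[selected, , drop = FALSE])     # 2
```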
Expressions and phenotype names may be combined in a list. This example selects CK+ cells with PDL1 > 3.
rows <- select_rows(csd, list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3))
ck_pdl1 <- csd[rows, ]
dim(ck_pdl1)
## [1] 531 199
count_within(csd, from=list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3),
to='CD8+', radius=15, dst=dst)
## # A tibble: 1 x 5
## radius from_count to_count from_with within_mean
## <dbl> <int> <int> <int> <dbl>
## 1 15 531 228 86 0.228
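The rules described so far (character vector = OR, list = AND, formula = expression evaluated on the table) can be combined into one small recursive function. select_rows_sketch is hypothetical and is not the phenoptr implementation; it assumes a toy table with a single Phenotype column and a numeric pdl1 column.

```r
# Hypothetical sketch of select_rows-style rules; NOT phenoptr's code.
select_rows_sketch <- function(csd, sel) {
  if (is.character(sel)) {
    # Character vector: phenotype names combined with OR
    csd$Phenotype %in% sel
  } else if (inherits(sel, 'formula')) {
    # One-sided formula: evaluate the right-hand side using the table's columns
    as.logical(eval(sel[[2]], csd, environment(sel)))
  } else if (is.list(sel)) {
    # List: each element is itself a selector; results combined with AND
    Reduce(`&`, lapply(sel, function(s) select_rows_sketch(csd, s)))
  } else {
    stop('Unsupported selector')
  }
}

toy <- data.frame(
  Phenotype = c('CK+', 'CK+', 'CD8+', 'FoxP3+', 'CK+'),
  pdl1      = c(1, 5, 4, 2, 6),
  stringsAsFactors = FALSE
)

select_rows_sketch(toy, 'CK+')                   # TRUE TRUE FALSE FALSE TRUE
select_rows_sketch(toy, c('CD8+', 'FoxP3+'))     # FALSE FALSE TRUE TRUE FALSE
select_rows_sketch(toy, list('CK+', ~pdl1 > 3))  # FALSE TRUE FALSE FALSE TRUE
```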
A few things to note about formula expressions:

- If the table was read with read_cell_seg_data(path, remove_units=TRUE) (the default), the column names in the table will be abbreviated compared to the names in the file.

Several functions in phenoptr operate on pairs of phenotypes and have arguments pairs and phenotype_rules. For example, see count_touching_cells and spatial_distribution_report. These functions build on select_rows to allow flexible selection of pairs of phenotypes.
In the simplest usage, the names in pairs are the names of phenotypes in the cell seg data. In this case, pairs just lists the desired phenotypes. For example, to pair CK+ cells with CD8+ cells, use the argument
pairs <- list(c('CK+', 'CD8+'))
For a single pair, a list is not required, so this can be simplified to
pairs <- c('CK+', 'CD8+')
For multiple pairs, list each pair separately. For example, to pair CK+ cells first with CD8+ cells and then with CD68+ cells, use the argument
pairs <- list(c('CK+', 'CD8+'), c('CK+', 'CD68+'))
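Because pairs may be either a single character vector or a list of them, one simple way for code to handle both forms is to wrap a bare vector in a list. normalize_pairs below is a hypothetical helper, not part of phenoptr; it only sketches that normalization and checks that every pair has exactly two names.

```r
# Hypothetical helper: always return a list of two-element character vectors
normalize_pairs <- function(pairs) {
  if (is.character(pairs)) pairs <- list(pairs)  # a single pair
  stopifnot(all(vapply(pairs, length, integer(1)) == 2))
  pairs
}

one  <- normalize_pairs(c('CK+', 'CD8+'))
many <- normalize_pairs(list(c('CK+', 'CD8+'), c('CK+', 'CD68+')))
length(one)   # 1
length(many)  # 2
```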
You may want to define a new phenotype using grouping or expressions as shown in the “Selecting phenotypes” sections above. To do this, use the phenotype_rules argument to associate a select_rows rule with a name; then use the new name in the pairs argument.
For example, to create a T Cell phenotype which matches either CD8+ or FoxP3+ cells, and pair it with a PDL1+ CK+ phenotype which applies a PDL1 threshold to tumor (CK+) cells, use these arguments:
pairs <- c('PDL1+ CK+', 'T Cell')
phenotype_rules <- list(
'PDL1+ CK+'=list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3),
'T Cell'=c('CD8+', 'FoxP3+'))
phenotype_rules only needs to include phenotypes which are not in the cell seg data. For example, to extend the previous example to include a pairing from PDL1+ CK+ to CD68+ cells, where CD68+ is an existing phenotype, extend the pairs argument without changing phenotype_rules:
pairs <- list(c('PDL1+ CK+', 'T Cell'), c('PDL1+ CK+', 'CD68+'))
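The rule in the previous paragraph amounts to a lookup: a name found in phenotype_rules is replaced by its rule, and any other name is used directly as a select_rows selector. resolve_selector below is a hypothetical sketch of that lookup, not a phenoptr function.

```r
# Hypothetical lookup: use the rule if one is defined, otherwise the name itself
resolve_selector <- function(name, phenotype_rules) {
  if (name %in% names(phenotype_rules)) phenotype_rules[[name]] else name
}

phenotype_rules <- list(
  'PDL1+ CK+' = list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean` > 3),
  'T Cell'    = c('CD8+', 'FoxP3+'))

resolve_selector('T Cell', phenotype_rules)  # c('CD8+', 'FoxP3+')
resolve_selector('CD68+', phenotype_rules)   # 'CD68+', an existing phenotype
```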