Selecting cells within a cell segmentation table

phenoptr includes a flexible mechanism for selecting cells (i.e. rows) from a cell seg table. The mechanism is implemented in select_rows. Row selection may be used directly via select_rows and ordinary subsetting operations. It is also used indirectly by calling functions that support it, including count_touching_cells and count_within.

The return value from select_rows is a boolean (logical) vector whose length is the number of rows of the given cell seg table. You use this returned value to select rows of the table.

The mechanism select_rows uses to specify phenotypes is very flexible. This flexibility comes at a cost in complexity. Most common phenotype combinations can also be specified using parse_phenotypes, which supports a friendlier syntax.

Selecting phenotypes

This tutorial uses sample_cell_seg_data and count_within to give examples of the phenotype specifications used by select_rows.

Select a single phenotype

The simplest selector is just the name of a single phenotype. This example selects the rows containing CK+ cells. The same syntax works with both parse_phenotypes and select_rows.

library(phenoptr)

csd <- sample_cell_seg_data
rows <- select_rows(csd, 'CK+')
sum(rows) # The number of selected rows

## [1] 2257

# Select just the desired rows by subsetting
ck <- csd[rows, ]
dim(ck)

## [1] 2257  199

This example counts CD8+ cells with 15 microns of CK+ cells.

dst <- distance_matrix(csd) # Compute this just once and re-use it
count_within(csd, from='CK+', to='CD8+', radius=15, dst=dst)

## # A tibble: 1 x 5
##   radius from_count to_count from_with within_mean
##    <dbl>      <int>    <int>     <int>       <dbl>
## 1     15       2257      228       193       0.115

Select multiple positivity

Double positive (or more) cells can be selected by including multiple names in a list. Selectors in a list are combined with AND.

Group multiple phenotypes

Multiple phenotypes may selected together by including each name in a character vector (not a list!). Names in a vector are combined with OR.

For example, to select cells phenotyped as either CD8+ or FoxP3+, use the selector c('CD8+', 'FoxP3+').

This example selects this combination. Note the call to select_rows has been combined with the subsetting of csd.

tcells <- csd[select_rows(csd, c('CD8+', 'FoxP3+')), ]
dim(tcells)

## [1] 456 199

count_within(csd, from='CK+', to=c('CD8+', 'FoxP3+'), radius=15, dst=dst)

## # A tibble: 1 x 5
##   radius from_count to_count from_with within_mean
##    <dbl>      <int>    <int>     <int>       <dbl>
## 1     15       2257      456       354       0.206

This type of grouping is an either / or selection. The count_within example above counts the number of T cells (CD8+ or FoxP3+) within 15 microns of a CK+ cell. If you want separate counts for CD8+ and FoxP3+, use count_within_batch.

Flexible selection using expressions

For more flexibility, select_rows supports selection using any valid R expression. Expressions are written using one-sided formulas. The formulas are evaluated in the context of the cell seg table so they may reference any column of the table.

For example, to select cells with PDL1 expression greater than 3, use the expression ~`Entire Cell PDL1 (Opal 520) Mean`>3. In this example, the column name is Entire Cell PDL1 (Opal 520) Mean.

Expressions and phenotype names may be combined in a list. This example selects CK+ cells with PDL1 > 3.

rows <- select_rows(csd, list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3))
ck_pdl1 <- csd[rows, ]
dim(ck_pdl1)

## [1] 531 199

count_within(csd, from=list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3), 
             to='CD8+', radius=15, dst=dst)

## # A tibble: 1 x 5
##   radius from_count to_count from_with within_mean
##    <dbl>      <int>    <int>     <int>       <dbl>
## 1     15        531      228        86       0.228

A few things to note about formula expressions:

Expressions are evaluated in the context of the cell seg table. Names used in the expression must match column names in the table. If the table was read using read_cell_seg_data(path, remove_units=TRUE) (the default), the table names will be abbreviated compared to the names in the file.
Names which are not valid R symbol names—this includes most of the column names in a cell seg table—must be enclosed in backticks (`) as in the example above.

Selecting pairs of phenotypes

Several functions in phenoptr operate on pairs of phenotypes and have arguments pairs and phenotype_rules. For example, see count_touching_cells and spatial_distribution_report. These functions build on select_rows to allow allow flexible selection of pairs of phenotypes.

Pairs of existing phenotypes

In the simplest usage, the names in pairs are the names of phenotypes in the cell seg data. In this case, pairs just lists the desired phenotypes. For example, to pair CK+ cells with CD8+ cells, use the argument

pairs <- list(c('CK+', 'CD8+'))

For a single pair, a list is not required so this can be simplified to

pairs <- c('CK+', 'CD8+')

For multiple pairs, list each pair separately. For example, to pair CK+ cells first with CD8+ cells and then with CD68+ cells, use the argument

pairs <- list(c('CK+', 'CD8+'),
             c('CK+', 'CD68+'))

Defining new phenotypes

You may want to define a new phenotype using grouping or expressions as shown in the “Selecting phenotypes” sections above. To do this, use the phenotype_rules argument to associate a select_rows rule with a name; then use the new name in the pairs argument.

For example, to create a T Cell phenotype which matches CD8+ and FoxP3+ phenotypes, and pair it with a PDL1+ CK+ phenotype which applies a threshold to tumor cells, use these arguments:

pairs <- c('PDL1+ CK+', 'T Cell')
phenotype_rules <- list(
  'PDL1+ CK+'=list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3),
  'T Cell'=c('CD8+', 'FoxP3+'))

phenotype_rules only needs to include phenotypes which are not in the cell seg data. For example, to extend the previous example to include a pairing from PDL1+ CK+ to CD68+ cells, where CD68+ is an existing phenotype, extend the pairs argument without changing phenotype_rules:

pairs <- list(
  c('PDL1+ CK+', 'T Cell'),
  c('PDL1+ CK+', 'CD68+'))
phenotype_rules <- list(
  'PDL1+ CK+'=list('CK+', ~`Entire Cell PDL1 (Opal 520) Mean`>3),
  'T Cell'=c('CD8+', 'FoxP3+')
)

Kent Johnson

2022-01-03