Creates a list that stores your settings, the raw counts and the normalised counts, starting from the result of a call to load_rcc() or of a previous call to normalise().

Usage

normalise(
  nacho_object,
  housekeeping_genes = nacho_object[["housekeeping_genes"]],
  housekeeping_predict = nacho_object[["housekeeping_predict"]],
  housekeeping_norm = nacho_object[["housekeeping_norm"]],
  normalisation_method = nacho_object[["normalisation_method"]],
  n_comp = nacho_object[["n_comp"]],
  remove_outliers = nacho_object[["remove_outliers"]],
  outliers_thresholds = nacho_object[["outliers_thresholds"]]
)

Arguments

nacho_object

[list] A list object of class "nacho" obtained from load_rcc() or normalise().

housekeeping_genes

[character] A vector of names of the miRNAs/mRNAs that should be used as housekeeping genes. Default is NULL.

housekeeping_predict

[logical] Boolean to indicate whether the housekeeping genes should be predicted (TRUE) or not (FALSE). Default is FALSE.

housekeeping_norm

[logical] Boolean to indicate whether the housekeeping normalisation should be performed. Default is TRUE.

normalisation_method

[character] Either "GEO" or "GLM": normalise using the geometric mean ("GEO") or a generalised linear model ("GLM"). Default is "GEO".

n_comp

[numeric] Number of principal components to compute. Cannot exceed n - 1, where n is the number of samples. Default is 10.

remove_outliers

[logical] A boolean to indicate if outliers should be excluded.

outliers_thresholds

[list] List of thresholds to exclude outliers.
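To make the "GEO" option of normalisation_method more concrete, the sketch below computes a geometric-mean scaling factor on a toy matrix of control-probe counts. It illustrates the general technique only and is not NACHO's internal code: the matrix pos_counts, the helper geometric_mean() and the convention of dividing the average geometric mean by each sample's own are assumptions made for this example.

```r
# Toy counts for control probes: rows are probes, columns are samples.
# All values are invented for illustration.
pos_counts <- matrix(
  c(100, 200, 400,
    200, 400, 800),
  nrow = 3,
  dimnames = list(c("POS_A", "POS_B", "POS_C"), c("sample_1", "sample_2"))
)

# Hypothetical helper: geometric mean of a strictly positive numeric vector.
geometric_mean <- function(x) exp(mean(log(x)))

# Per-sample geometric mean of the control probes.
sample_geo <- apply(pos_counts, 2, geometric_mean)

# Scaling factor (assumed convention): average of the per-sample geometric
# means divided by each sample's own geometric mean.
scaling_factor <- mean(sample_geo) / sample_geo

# Multiply every probe of a sample by that sample's factor.
scaled_counts <- sweep(pos_counts, 2, scaling_factor, "*")
```

In this toy case sample_2 has exactly twice the counts of sample_1, so both columns of scaled_counts end up identical after scaling.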

Value

[list] A list containing parameters and data.

access

[character] Value passed to load_rcc() in id_colname.

housekeeping_genes

[character] Value passed to load_rcc() or normalise().

housekeeping_predict

[logical] Value passed to load_rcc().

housekeeping_norm

[logical] Value passed to load_rcc() or normalise().

normalisation_method

[character] Value passed to load_rcc() or normalise().

remove_outliers

[logical] Value passed to normalise().

n_comp

[numeric] Value passed to load_rcc().

data_directory

[character] Value passed to load_rcc().

pc_sum

[data.frame] A data.frame with n_comp rows and four columns: "Standard deviation", "Proportion of Variance", "Cumulative Proportion" and "PC".

nacho

[data.frame] A data.frame with all columns from the sample sheet ssheet_csv and all computed columns, i.e., quality-control metrics and counts, with one sample per row.

outliers_thresholds

[list] A list of the quality-control thresholds used.

raw_counts

[data.frame] Raw counts with probes as rows and samples as columns. The first column, "CodeClass", gives the type of each probe and the second column, "Name", gives its name.

normalised_counts

[data.frame] Normalised counts with probes as rows and samples as columns. The first column, "CodeClass", gives the type of each probe and the second column, "Name", gives its name.
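As an illustration of the shape described for pc_sum, the sketch below builds a comparable table with base R's stats::prcomp() on random data. It is a stand-in under stated assumptions, not the way NACHO computes its principal components: the matrix toy and its dimensions are invented, and the package may transform or select the counts differently before the decomposition.

```r
set.seed(1)

# Toy matrix: 6 samples (rows) by 5 probes (columns), random values.
toy <- matrix(rnorm(30), nrow = 6, ncol = 5)

# Number of components to keep; must not exceed nrow(toy) - 1.
n_comp <- 3

# Principal component analysis with base R.
pca <- stats::prcomp(toy)

# summary() exposes a matrix with one column per component and the rows
# "Standard deviation", "Proportion of Variance" and "Cumulative Proportion".
importance <- summary(pca)$importance[, seq_len(n_comp), drop = FALSE]

# Transpose to one row per component and add the component label as "PC".
pc_sum <- as.data.frame(t(importance))
pc_sum$PC <- rownames(pc_sum)

pc_sum
```

The result has n_comp rows and the four columns named in the description of pc_sum.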

Details

Outlier definition (remove_outliers = TRUE): a sample is treated as an outlier if it meets any of the following criteria:

  • Binding Density (BD) < 0.1

  • Binding Density (BD) > 2.25

  • Field of View (FoV) < 75

  • Positive Control Linearity (PCL) < 0.95

  • Limit of Detection (LoD) < 2

  • Positive normalisation factor (Positive_factor) < 0.25

  • Positive normalisation factor (Positive_factor) > 4

  • Housekeeping normalisation factor (house_factor) < 1/11

  • Housekeeping normalisation factor (house_factor) > 11
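The criteria above can be sketched as a flagging step on a per-sample quality-control table. This is a minimal illustration, not NACHO's implementation: the data frame qc, its column names, its values and the element names of thresholds are hypothetical, and the rule that a single failing metric is enough to flag a sample is an assumption of this sketch.

```r
# Hypothetical per-sample quality-control table; names and values are invented.
qc <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  BD = c(0.5, 2.50, 1.0, 0.05),
  FoV = c(98, 90, 60, 99),
  PCL = c(0.99, 0.98, 0.97, 0.90),
  LoD = c(5, 4, 3, 1),
  Positive_factor = c(1.0, 0.8, 1.2, 5.0),
  house_factor = c(1.0, 2.0, 0.5, 12)
)

# Threshold values taken from the list above; element names are assumed.
thresholds <- list(
  BD = c(0.1, 2.25),
  FoV = 75,
  PCL = 0.95,
  LoD = 2,
  Positive_factor = c(0.25, 4),
  house_factor = c(1 / 11, 11)
)

# Flag a sample when at least one metric falls outside its threshold.
qc$is_outlier <- with(
  qc,
  BD < thresholds$BD[1] | BD > thresholds$BD[2] |
    FoV < thresholds$FoV |
    PCL < thresholds$PCL |
    LoD < thresholds$LoD |
    Positive_factor < thresholds$Positive_factor[1] |
    Positive_factor > thresholds$Positive_factor[2] |
    house_factor < thresholds$house_factor[1] |
    house_factor > thresholds$house_factor[2]
)

# Keep only the samples that pass every check.
kept <- qc[!qc$is_outlier, ]
```

Here s2 fails on binding density, s3 on field of view and s4 on several metrics, so only s1 remains in kept.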

Examples


data(GSE74821)
GSE74821_norm <- normalise(
  nacho_object = GSE74821,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO",
  remove_outliers = TRUE
)
#> [NACHO] Normalising "GSE74821" with new value for parameters:
#>   - normalisation_method = TRUE
#>   - remove_outliers = TRUE
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list

if (interactive()) {
  library(GEOquery)
  library(NACHO)

  # Import data from GEO
  gse <- GEOquery::getGEO(GEO = "GSE74821")
  targets <- Biobase::pData(Biobase::phenoData(gse[[1]]))
  GEOquery::getGEOSuppFiles(GEO = "GSE74821", baseDir = tempdir())
  utils::untar(
    tarfile = file.path(tempdir(), "GSE74821", "GSE74821_RAW.tar"),
    exdir = file.path(tempdir(), "GSE74821")
  )
  targets$IDFILE <- list.files(
    path = file.path(tempdir(), "GSE74821"),
    pattern = "\\.RCC\\.gz$"
  )
  targets[] <- lapply(X = targets, FUN = iconv, from = "latin1", to = "ASCII")
  utils::write.csv(
    x = targets,
    file = file.path(tempdir(), "GSE74821", "Samplesheet.csv")
  )

  # Read RCC files and format
  nacho <- load_rcc(
    data_directory = file.path(tempdir(), "GSE74821"),
    ssheet_csv = file.path(tempdir(), "GSE74821", "Samplesheet.csv"),
    id_colname = "IDFILE"
  )

  # (re)Normalise data by removing outliers
  nacho_norm <- normalise(
    nacho_object = nacho,
    remove_outliers = TRUE
  )

  # (re)Normalise data with "GLM" method and removing outliers
  nacho_norm <- normalise(
    nacho_object = nacho,
    normalisation_method = "GLM",
    remove_outliers = TRUE
  )
}