Run quality control on genotyping array data (PLINK format) using an rmarkdown template.

qc_plink(input_directory = NULL, output_directory = NULL,
  cohort_name = "CARoT", output_file = paste0(cohort_name, "_QC.html"),
  array = NULL, callrate_samples = 0.95, callrate_snps = 0.95,
  heterozygosity_treshold = 4, maf_threshold = 0.01,
  hwe_pvalue = 1e-04, includes_relatives = FALSE,
  mendelian_samples = 0.05, mendelian_snp = 0.1, IBD_threshold = 0.2,
  population = NULL, pca_components = 10, pca_threshold = 3,
  check_bim_script = system.file("perl", "HRC-1000G-check-bim.pl",
  package = "CARoT"), ref1kg_panel = NULL, ref1kg_population = NULL,
  ref1kg_genotypes = NULL, ref1kg_legend = NULL, ref1kg_fasta = NULL,
  bin_path = list(bcftools = "/usr/bin/bcftools", bgzip =
  "/usr/bin/bgzip", plink = "/usr/bin/plink1.9", gcta = "/usr/bin/gcta64"),
  title = paste(array, "Array Quality-Control"), author_name = "CARoT",
  author_affiliation = NULL, author_email = NULL, cache = FALSE,
  show_code = FALSE, n_cores = 1, dpi = 120, gg_fontsize = 12,
  encoding = "UTF-8", ...)

Arguments

input_directory

A character. The path to the PLINK files. The path should contain the file name without the extension, i.e., without *.bed, *.bim or *.fam.
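As a minimal sketch of the expected form, assuming a hypothetical fileset stored as /data/cohort/mydata.bed, mydata.bim and mydata.fam, the prefix can be built and checked as follows. The helper plink_files_exist() is illustrative and not part of CARoT.

```r
# Hypothetical helper (not part of CARoT): check that the three PLINK
# files sharing a common prefix are all present.
plink_files_exist <- function(prefix) {
  files <- paste0(prefix, c(".bed", ".bim", ".fam"))
  all(file.exists(files))
}

# The prefix is the file path without any extension (hypothetical location).
prefix <- file.path("/data/cohort", "mydata")
plink_files_exist(prefix)
```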

output_directory

A character. The path to the output directory.

cohort_name

A character. The name of the studied cohort / population.

output_file

A character. The name of the html file produced.

array

A character. The array name, e.g., "Illumina Omni2.5".

callrate_samples

A numeric. The call rate threshold for samples, below which samples are excluded. Default is 0.95.

callrate_snps

A numeric. The call rate threshold for probes, below which probes are excluded. Default is 0.95.

heterozygosity_treshold

A numeric. The heterozygosity threshold for samples, expressed as the number of standard deviations from the mean; samples further than this from the mean, in either direction, are excluded. Default is 4.
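The rule described here can be sketched in base R as below. This is an illustration on simulated values, not the code used in the report template; the function name flag_het_outliers() and the simulated rates are assumptions.

```r
# Illustrative only: flag samples whose heterozygosity rate lies more than
# `threshold` standard deviations from the cohort mean.
flag_het_outliers <- function(het_rate, threshold = 4) {
  z <- (het_rate - mean(het_rate)) / sd(het_rate)
  abs(z) > threshold
}

set.seed(1)
# Simulated heterozygosity rates with one extreme sample appended.
het_rate <- c(rnorm(200, mean = 0.32, sd = 0.005), 0.45)
which(flag_het_outliers(het_rate))
```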

maf_threshold

A numeric. The minor allele frequency under which variants are considered "rare". Default is 0.01.

hwe_pvalue

A numeric. The p-value threshold for Hardy-Weinberg equilibrium test. Default is 0.0001.

includes_relatives

A logical. Does the data contain related samples? Default is FALSE.

mendelian_samples

A numeric. The Mendel error rate threshold above which samples are excluded. Default is 0.05.

mendelian_snp

A numeric. The Mendel error rate threshold above which variants are excluded. Default is 0.1.

IBD_threshold

A numeric. The threshold for IBD (identical by descent) above which samples are characterised as relatives. Default is 0.2.

population

A character. The ethnicity of the studied population if known, e.g., "EUR". Default is NULL.

pca_components

A numeric. The number of principal components to be computed. Default is 10.

pca_threshold

A numeric. The threshold used to define outliers in the principal component analysis, as the number of standard deviations from the cohort centroid. Default is 3.
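The exact outlier rule is not spelled out here. One possible reading, sketched below on simulated components, flags a sample when it lies more than the threshold number of standard deviations from the centroid on any component. Both the rule and the name flag_pca_outliers() are assumptions, not CARoT's implementation.

```r
# One possible interpretation (an assumption, not CARoT's actual code):
# a sample is an outlier if, on any component, it lies more than
# `threshold` standard deviations from the cohort centroid.
flag_pca_outliers <- function(pcs, threshold = 3) {
  z <- scale(pcs)  # centre each component on its mean, divide by its SD
  apply(abs(z), 1, max) > threshold
}

set.seed(42)
# Simulated scores on two components, plus one distant sample.
pcs <- rbind(matrix(rnorm(200), ncol = 2), c(10, 10))
which(flag_pca_outliers(pcs))
```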

check_bim_script

A character. The path to the Perl script used to check PLINK files ahead of imputation. Default is system.file("perl", "HRC-1000G-check-bim.pl", package = "CARoT").

ref1kg_panel

A character. The *.panel file from the 1,000 Genomes Project. Default is NULL.

ref1kg_population

A character. The *.tsv file from the 1,000 Genomes Project describing samples and ethnicity. Default is NULL.

ref1kg_genotypes

A character. The PLINK files from the 1,000 Genomes Project. Default is NULL.

ref1kg_legend

A character. The *.legend file from the 1,000 Genomes Project. Default is NULL.

ref1kg_fasta

A character. The *.fasta file from the 1,000 Genomes Project. Default is NULL.

bin_path

A list(character). A named list giving the paths to the binaries, with elements bcftools, bgzip, plink (PLINK 1.9) and gcta.
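When the binaries are not installed under /usr/bin, the list can be built by looking them up on the PATH, as in this sketch. The executable names passed to Sys.which() mirror the defaults in the usage above and are assumptions about the local installation.

```r
# Look up each binary on the PATH; Sys.which() returns "" when not found.
# The executable names below are assumptions and may differ on your system.
bin_path <- list(
  bcftools = unname(Sys.which("bcftools")),
  bgzip = unname(Sys.which("bgzip")),
  plink = unname(Sys.which("plink1.9")),
  gcta = unname(Sys.which("gcta64"))
)

# Names of the tools that could not be located.
missing_tools <- names(bin_path)[!nzchar(unlist(bin_path))]
missing_tools
```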

title

A character. The report's title. Default is paste(array, "Array Quality-Control").

author_name

A character. The author's name to be printed in the report. Default is "CARoT".

author_affiliation

A character. The affiliation to be printed in the report. Default is NULL.

author_email

A character. The email to be printed in the report. Default is NULL.

cache

A logical. Should the R code be cached? Default is FALSE.

show_code

A logical. Should the R code be printed in the report? Default is FALSE.

n_cores

A numeric. The number of CPUs to use to estimate the ethnicity. Default is 1.

dpi

A numeric. The value for dpi when plotting the data. Default is 120.

gg_fontsize

A numeric. Value for the font size. Default is 12.

encoding

A character. The encoding to be used for the html report. Default is "UTF-8".

...

Parameters to pass to rmarkdown::render().
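A call might be assembled as in the sketch below. All paths and the cohort name are hypothetical placeholders, and the reference panel arguments are left at their defaults, which may not be enough for a complete run. The call itself is commented out because it needs CARoT and the external binaries to be installed.

```r
# All paths and names below are hypothetical placeholders.
qc_args <- list(
  input_directory = "/data/cohort/mydata",  # PLINK prefix, no extension
  output_directory = "/data/cohort/qc",
  cohort_name = "MyCohort",
  array = "Illumina Omni2.5",
  callrate_samples = 0.95,
  callrate_snps = 0.95,
  maf_threshold = 0.01,
  n_cores = 2
)

# Run only where CARoT and the external binaries are installed:
# do.call(CARoT::qc_plink, qc_args)

# Default report name derived from the cohort name, as in the usage above.
paste0(qc_args$cohort_name, "_QC.html")
```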