This function selectively performs various steps to process RNA-seq data.
See also the vignettes: browseVignettes('seeker').
Arguments
- params
Named list of parameters with components:
  - study: String used to name the output directory within parentDir.
  - metadata: Named list with components:
    - run: Logical indicating whether to fetch metadata. See fetchMetadata(). If TRUE, saves a file parentDir/study/metadata.csv. If FALSE, expects that file to already exist. The unmodified fetched or found metadata is saved to a file parentDir/study/metadata_original.csv. The following components are only checked if run is TRUE.
    - bioproject: String indicating the study's bioproject accession.
    - include: Optional named list for specifying which rows of metadata to include for further processing, with components:
      - colname: String indicating column in metadata.
      - values: Vector indicating values within colname.
    - exclude: Optional named list for specifying which rows of metadata to exclude from further processing (superseding include), with components:
      - colname: String indicating column in metadata.
      - values: Vector indicating values within colname.
  - fetch: Named list with components:
    - run: Logical indicating whether to fetch files from SRA. See fetch(). If TRUE, saves files to parentDir/study/fetch_output. Whether TRUE or FALSE, expects metadata to have a column "run_accession", and updates metadata with column "fastq_fetched" containing paths to files in parentDir/study/fetch_output. The following components are only checked if run is TRUE.
    - keep: Logical indicating whether to keep fastq.gz files when all processing steps have completed. NULL indicates TRUE.
    - overwrite: Logical indicating whether to overwrite files that already exist. NULL indicates to use the default in fetch().
    - keepSra: Logical indicating whether to keep the ".sra" files. NULL indicates to use the default in fetch().
    - prefetchCmd: String indicating command for prefetch, which downloads ".sra" files. NULL indicates to use the default in fetch().
    - prefetchArgs: Character vector indicating arguments to pass to prefetch. NULL indicates to use the default in fetch().
    - fasterqdumpCmd: String indicating command for fasterq-dump, which uses ".sra" files to create ".fastq" files. NULL indicates to use the default in fetch().
    - fasterqdumpArgs: Character vector indicating arguments to pass to fasterq-dump. NULL indicates to use the default in fetch().
    - pigzCmd: String indicating command for pigz, which converts ".fastq" files to ".fastq.gz" files. NULL indicates to use the default in fetch().
    - pigzArgs: Character vector indicating arguments to pass to pigz. NULL indicates to use the default in fetch().
  - trimgalore: Named list with components:
    - run: Logical indicating whether to perform quality/adapter trimming of reads. See trimgalore(). If TRUE, expects metadata to have a column "fastq_fetched" containing paths to fastq files in parentDir/study/fetch_output, saves trimmed files to parentDir/study/trimgalore_output, and updates metadata with column "fastq_trimmed". If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - keep: Logical indicating whether to keep trimmed fastq files when all processing steps have completed. NULL indicates TRUE.
    - cmd: Name or path of the command-line interface. NULL indicates to use the default in trimgalore().
    - args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in trimgalore().
    - pigzCmd: String indicating command for pigz, which converts ".fastq" files to ".fastq.gz" files. NULL indicates to use the default in trimgalore().
  - fastqc: Named list with components:
    - run: Logical indicating whether to perform QC on reads. See fastqc(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column "fastq_trimmed" containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column "fastq_fetched" containing paths to fastq files in parentDir/study/fetch_output. If TRUE, saves results to parentDir/study/fastqc_output. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - keep: Logical indicating whether to keep fastqc files when all processing steps have completed. NULL indicates TRUE.
    - cmd: Name or path of the command-line interface. NULL indicates to use the default in fastqc().
    - args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in fastqc().
  - salmon: Named list with components:
    - run: Logical indicating whether to quantify transcript abundances. See salmon(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column "fastq_trimmed" containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column "fastq_fetched" containing paths to fastq files in parentDir/study/fetch_output. If TRUE, saves results to parentDir/study/salmon_output and parentDir/study/salmon_meta_info.csv. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - indexDir: Directory that contains salmon index.
    - sampleColname: String indicating column in metadata containing sample ids. NULL indicates "sample_accession", which should work for data from SRA and ENA.
    - keep: Logical indicating whether to keep quantification results when all processing steps have completed. NULL indicates TRUE.
    - cmd: Name or path of the command-line interface. NULL indicates to use the default in salmon().
    - args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in salmon().
  - multiqc: Named list with components:
    - run: Logical indicating whether to aggregate results of various processing steps. See multiqc(). If TRUE, saves results to parentDir/study/multiqc_output. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - cmd: Name or path of the command-line interface. NULL indicates to use the default in multiqc().
    - args: Additional arguments to pass to the command-line interface. NULL indicates to use the default in multiqc().
  - tximport: Named list with components:
    - run: Logical indicating whether to summarize transcript- or gene-level estimates for downstream analysis. See tximport(). If TRUE, expects metadata to have a column sampleColname of sample ids, expects a directory parentDir/study/salmon_output containing directories of quantification results, and saves results to parentDir/study/tximport_output.qs. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - tx2gene: Optional named list with components listed below. If not NULL, saves a file parentDir/study/tx2gene.csv.gz.
      - organism: String indicating organism and thereby ensembl gene dataset. See getTx2gene().
      - version: Optional number indicating ensembl version. NULL indicates the latest version. See getTx2gene().
      - filename: Optional string indicating name of pre-existing text file in parentDir/params$study containing mapping between transcripts (first column) and genes (second column), with column names in the first row. If filename is specified, organism and version must not be specified.
    - countsFromAbundance: String indicating whether or how to estimate counts using estimated abundances. See tximport::tximport().
    - ignoreTxVersion: Logical indicating whether to ignore the version suffix on transcript ids. NULL indicates to use TRUE. See tximport::tximport().
params can be derived from a yaml file; see vignette("introduction", package = "seeker"). The yaml representation of params will be saved to parentDir/params$study/params.yml.
- parentDir
Directory in which to store the output, which will be a directory named according to params$study.
- dryRun
Logical indicating whether to check the validity of inputs without actually fetching or processing any data.
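The nested structure of params described above can also be written directly as an R list. The following is a minimal sketch in base R, not taken from the package: the study name, bioproject accession, column names, excluded run, organism string, and index directory are all placeholder assumptions, and selectRows() is a hypothetical helper that only illustrates the documented rule that exclude supersedes include. It is not seeker's own implementation.

```r
# All values below are placeholders chosen for illustration only.
params = list(
  study = 'my_study',
  metadata = list(
    run = TRUE,
    bioproject = 'PRJNA000000', # hypothetical accession
    include = list(colname = 'library_strategy', values = 'RNA-Seq'),
    exclude = list(colname = 'run_accession', values = 'SRR0000001')),
  fetch = list(run = TRUE, keep = FALSE),
  trimgalore = list(run = TRUE, keep = FALSE),
  fastqc = list(run = TRUE),
  salmon = list(run = TRUE, indexDir = 'path/to/salmon_index'), # hypothetical path
  multiqc = list(run = TRUE),
  tximport = list(
    run = TRUE,
    tx2gene = list(organism = 'mmusculus'), # assumed organism string
    countsFromAbundance = 'lengthScaledTPM'))

# Toy metadata standing in for the contents of metadata.csv.
metadata = data.frame(
  run_accession = c('SRR0000001', 'SRR0000002', 'SRR0000003'),
  library_strategy = c('RNA-Seq', 'RNA-Seq', 'ChIP-Seq'),
  stringsAsFactors = FALSE)

# Hypothetical helper: keep rows matching include, then drop rows matching
# exclude, so that exclude takes precedence when both match a row.
selectRows = function(metadata, include = NULL, exclude = NULL) {
  keepRow = rep(TRUE, nrow(metadata))
  if (!is.null(include)) {
    keepRow = keepRow & metadata[[include$colname]] %in% include$values
  }
  if (!is.null(exclude)) {
    keepRow = keepRow & !(metadata[[exclude$colname]] %in% exclude$values)
  }
  metadata[keepRow, , drop = FALSE]
}

selected = selectRows(
  metadata, params$metadata$include, params$metadata$exclude)
```

Here the first row matches both include and exclude and is dropped, and the third row fails include, so only the second row remains. A list like this would then be passed as the params argument, optionally with dryRun set to TRUE to check the inputs before any data are fetched.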
Examples
if (FALSE) { # \dontrun{
doParallel::registerDoParallel()
params = yaml::read_yaml('my_params.yaml')
seeker(params)
} # }