This function runs a selected subset of steps to process RNA-seq data.
See also the vignettes: browseVignettes('seeker').
Arguments
- params
Named list of parameters with components:
  - study: String used to name the output directory within parentDir.
  - metadata: Named list with components:
    - run: Logical indicating whether to fetch metadata. See fetchMetadata(). If TRUE, saves a file parentDir/study/metadata.csv. If FALSE, expects that file to already exist. The unmodified fetched or found metadata is saved to a file parentDir/study/metadata_original.csv. The following components are only checked if run is TRUE.
    - bioproject: String indicating the study's BioProject accession.
    - include: Optional named list for specifying which rows of metadata to include for further processing, with components:
      - colname: String indicating a column in metadata.
      - values: Vector indicating values within colname.
    - exclude: Optional named list for specifying which rows of metadata to exclude from further processing (superseding include), with components:
      - colname: String indicating a column in metadata.
      - values: Vector indicating values within colname.
  - fetch: Named list with components:
    - run: Logical indicating whether to fetch files from SRA. See fetch(). If TRUE, saves files to parentDir/study/fetch_output. Whether TRUE or FALSE, expects metadata to have a column "run_accession", and updates metadata with a column "fastq_fetched" containing paths to files in parentDir/study/fetch_output. The following components are only checked if run is TRUE.
    - keep: Logical indicating whether to keep fastq.gz files when all processing steps have completed. NULL indicates TRUE.
    - overwrite: Logical indicating whether to overwrite files that already exist. If NULL, the default in fetch() is used.
    - keepSra: Logical indicating whether to keep the ".sra" files. If NULL, the default in fetch() is used.
    - prefetchCmd: String indicating the command for prefetch, which downloads ".sra" files. If NULL, the default in fetch() is used.
    - prefetchArgs: Character vector indicating arguments to pass to prefetch. If NULL, the default in fetch() is used.
    - fasterqdumpCmd: String indicating the command for fasterq-dump, which uses ".sra" files to create ".fastq" files. If NULL, the default in fetch() is used.
    - fasterqdumpArgs: Character vector indicating arguments to pass to fasterq-dump. If NULL, the default in fetch() is used.
    - pigzCmd: String indicating the command for pigz, which converts ".fastq" files to ".fastq.gz" files. If NULL, the default in fetch() is used.
    - pigzArgs: Character vector indicating arguments to pass to pigz. If NULL, the default in fetch() is used.
  - trimgalore: Named list with components:
    - run: Logical indicating whether to perform quality/adapter trimming of reads. See trimgalore(). If TRUE, expects metadata to have a column "fastq_fetched" containing paths to fastq files in parentDir/study/fetch_output, saves trimmed files to parentDir/study/trimgalore_output, and updates metadata with a column "fastq_trimmed". If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - keep: Logical indicating whether to keep trimmed fastq files when all processing steps have completed. NULL indicates TRUE.
    - cmd: Name or path of the command-line interface. If NULL, the default in trimgalore() is used.
    - args: Additional arguments to pass to the command-line interface. If NULL, the default in trimgalore() is used.
    - pigzCmd: String indicating the command for pigz, which converts ".fastq" files to ".fastq.gz" files. If NULL, the default in trimgalore() is used.
  - fastqc: Named list with components:
    - run: Logical indicating whether to perform QC on reads. See fastqc(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column "fastq_trimmed" containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column "fastq_fetched" containing paths to fastq files in parentDir/study/fetch_output. If TRUE, saves results to parentDir/study/fastqc_output. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - keep: Logical indicating whether to keep fastqc files when all processing steps have completed. NULL indicates TRUE.
    - cmd: Name or path of the command-line interface. If NULL, the default in fastqc() is used.
    - args: Additional arguments to pass to the command-line interface. If NULL, the default in fastqc() is used.
  - salmon: Named list with components:
    - run: Logical indicating whether to quantify transcript abundances. See salmon(). If TRUE and trimgalore$run is TRUE, expects metadata to have a column "fastq_trimmed" containing paths to fastq files in parentDir/study/trimgalore_output. If TRUE and trimgalore$run is FALSE, expects metadata to have a column "fastq_fetched" containing paths to fastq files in parentDir/study/fetch_output. If TRUE, saves results to parentDir/study/salmon_output and parentDir/study/salmon_meta_info.csv. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - indexDir: Directory that contains the salmon index.
    - sampleColname: String indicating the column in metadata containing sample ids. NULL indicates "sample_accession", which should work for data from SRA and ENA.
    - keep: Logical indicating whether to keep quantification results when all processing steps have completed. NULL indicates TRUE.
    - cmd: Name or path of the command-line interface. If NULL, the default in salmon() is used.
    - args: Additional arguments to pass to the command-line interface. If NULL, the default in salmon() is used.
  - multiqc: Named list with components:
    - run: Logical indicating whether to aggregate results of the various processing steps. See multiqc(). If TRUE, saves results to parentDir/study/multiqc_output. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - cmd: Name or path of the command-line interface. If NULL, the default in multiqc() is used.
    - args: Additional arguments to pass to the command-line interface. If NULL, the default in multiqc() is used.
  - tximport: Named list with components:
    - run: Logical indicating whether to summarize transcript- or gene-level estimates for downstream analysis. See tximport(). If TRUE, expects metadata to have a column sampleColname containing sample ids, expects a directory parentDir/study/salmon_output containing directories of quantification results, and saves results to parentDir/study/tximport_output.qs. If FALSE, expects nothing and does nothing. The following components are only checked if run is TRUE.
    - tx2gene: Optional named list. If not NULL, saves a file parentDir/study/tx2gene.csv.gz. Components:
      - organism: String indicating the organism and thereby the Ensembl gene dataset. See getTx2gene().
      - version: Optional number indicating the Ensembl version. NULL indicates the latest version. See getTx2gene().
      - filename: Optional string indicating the name of a pre-existing text file in parentDir/study containing the mapping between transcripts (first column) and genes (second column), with column names in the first row. If filename is specified, organism and version must not be specified.
    - countsFromAbundance: String indicating whether or how to estimate counts using estimated abundances. See tximport::tximport().
    - ignoreTxVersion: Logical indicating whether to ignore the version suffix on transcript ids. NULL indicates TRUE. See tximport::tximport().

  params can be derived from a yaml file; see vignette("introduction", package = "seeker"). The yaml representation of params will be saved to parentDir/params$study/params.yml.
- parentDir
Directory in which to store the output, which will be a directory named according to params$study.
- dryRun
Logical indicating whether to check the validity of inputs without actually fetching or processing any data.
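The row-selection rule described under metadata (keep rows whose colname value is among include$values, then drop rows matching exclude, which takes precedence) can be sketched in base R as below. This is a minimal illustration of the documented semantics, not seeker's own code: selectRows() is a hypothetical helper, exact value matching is assumed, and the toy data frame uses made-up run accessions and a hypothetical tissue column.

```r
# Hypothetical helper (not part of seeker): apply include/exclude filters to a
# metadata data.frame, following the documented rule that exclude supersedes include.
selectRows = function(metadata, include = NULL, exclude = NULL) {
  keep = rep(TRUE, nrow(metadata))
  if (!is.null(include)) {
    # Keep only rows whose value in include$colname is among include$values
    keep = keep & metadata[[include$colname]] %in% include$values
  }
  if (!is.null(exclude)) {
    # Drop rows matching exclude, even if they also matched include
    keep = keep & !(metadata[[exclude$colname]] %in% exclude$values)
  }
  metadata[keep, , drop = FALSE]
}

# Toy metadata with made-up accessions, for illustration only
metadata = data.frame(
  run_accession = c('run1', 'run2', 'run3', 'run4'),
  tissue = c('liver', 'liver', 'lung', 'liver'),
  stringsAsFactors = FALSE)

kept = selectRows(
  metadata,
  include = list(colname = 'tissue', values = 'liver'),
  exclude = list(colname = 'run_accession', values = 'run2'))
kept$run_accession
```

Here include keeps run1, run2, and run4, and exclude then removes run2 even though it also matched include.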
Examples
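# Not run: may download data and calls external command-line tools.
# Register a parallel backend so that processing can run in parallel.
doParallel::registerDoParallel()
# Read params from a yaml file (see vignette("introduction", package = "seeker")).
params = yaml::read_yaml('my_params.yaml')
seeker(params)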
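Instead of reading a yaml file, params can also be written directly as a nested R list whose names follow the components documented under Arguments. In the sketch below, the study name, the placeholder BioProject accession, the salmon index path, and the organism value 'mmusculus' are all assumptions chosen for illustration, not defaults. Steps with run = FALSE are skipped. The sketch only builds and inspects the list. tx2geneIsValid() is a hypothetical helper expressing the documented rule for tx2gene, and the seeker() call is left commented out.

```r
# Hypothetical example values throughout: study name, accession placeholder,
# index path, and organism are assumptions, not defaults.
params = list(
  study = 'my_study',
  metadata = list(
    run = TRUE,
    bioproject = 'PRJNA000000',  # placeholder accession
    include = list(colname = 'tissue', values = c('liver', 'lung'))),
  fetch = list(run = TRUE, keep = FALSE),
  trimgalore = list(run = TRUE),
  fastqc = list(run = TRUE),
  salmon = list(run = TRUE, indexDir = 'salmon_index'),  # hypothetical path
  multiqc = list(run = FALSE),  # this step is skipped
  tximport = list(
    run = TRUE,
    tx2gene = list(organism = 'mmusculus')))  # assumed organism value

# Hypothetical check (not part of seeker) of the documented rule that
# tx2gene$filename must not be combined with tx2gene$organism or tx2gene$version.
tx2geneIsValid = function(tx2gene) {
  is.null(tx2gene$filename) ||
    (is.null(tx2gene$organism) && is.null(tx2gene$version))
}

# Names of the steps whose run component is TRUE
names(Filter(function(step) is.list(step) && isTRUE(step$run), params))

# With seeker installed, a dry run checks inputs without fetching or
# processing any data:
# seeker::seeker(params, parentDir = '.', dryRun = TRUE)
```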