This function uses the NCBI SRA Toolkit via system2() to download files
from SRA and convert them to fastq.gz. To process files in parallel, register
a parallel backend, e.g., using doParallel::registerDoParallel(). Beware
that intermediate files created by fasterq-dump are uncompressed and could
require hundreds of gigabytes if files are processed in parallel.
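The parallel setup described above might look like the following minimal sketch. It assumes that fetch() is already attached from its package, that the doParallel package is installed, and that the SRA Toolkit and pigz are on the PATH. The accessions and the output directory name are placeholders, not real runs.

```r
library(doParallel)

# Register a parallel backend with a small number of workers. Each worker
# may write large uncompressed intermediate files, so keep this number
# modest unless plenty of disk space is available.
registerDoParallel(cores = 2)

# Placeholder accessions; replace with real SRA run accessions.
accessions = c("SRR0000001", "SRR0000002")

result = fetch(accessions, outputDir = "fastq_output")

# Release the implicit cluster created by registerDoParallel().
stopImplicitCluster()
```

Without a registered backend, the same call should still run, but presumably processes one file at a time.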
Usage
fetch(
accessions,
outputDir,
overwrite = FALSE,
keepSra = FALSE,
prefetchCmd = "prefetch",
prefetchArgs = NULL,
fasterqdumpCmd = "fasterq-dump",
fasterqdumpArgs = NULL,
pigzCmd = "pigz",
pigzArgs = NULL
)
Arguments
- accessions
Character vector of SRA run accessions.
- outputDir
String indicating the local directory in which to save the files. Will be created if it doesn't exist.
- overwrite
Logical indicating whether to overwrite files that already exist in outputDir.
- keepSra
Logical indicating whether to keep the ".sra" files.
- prefetchCmd
String indicating command for prefetch, which downloads ".sra" files.
- prefetchArgs
Character vector indicating arguments to pass to prefetch.
- fasterqdumpCmd
String indicating command for fasterq-dump, which uses ".sra" files to create ".fastq" files.
- fasterqdumpArgs
Character vector indicating arguments to pass to fasterq-dump.
- pigzCmd
String indicating command for pigz, which converts ".fastq" files to ".fastq.gz" files.
- pigzArgs
Character vector indicating arguments to pass to pigz.
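As a rough illustration of the arguments above, the sketch below passes extra options to each external tool. It assumes fetch() is already attached; the accession, directory name, and tool path are placeholders. The flags shown are options of the tools themselves (--max-size for prefetch, --threads for fasterq-dump, -p for pigz). How fetch() combines them with any arguments it sets internally is not stated here, so treat this as an untested example.

```r
# Placeholder accession; replace with a real SRA run accession.
accessions = c("SRR0000001")

result = fetch(
  accessions,
  outputDir = "fastq_output",
  overwrite = FALSE,
  keepSra = TRUE,                           # keep the downloaded ".sra" files
  prefetchCmd = "/opt/sratoolkit/bin/prefetch",  # hypothetical install path
  prefetchArgs = c("--max-size", "50G"),    # raise prefetch's download size limit
  fasterqdumpArgs = c("--threads", "4"),    # threads used by fasterq-dump
  pigzArgs = c("-p", "4"))                  # compression threads used by pigz
```

Each element of an argument vector is passed as a separate command-line token, which is why a flag and its value are given as two strings.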
Value
A list. As the function runs, it updates a tab-delimited log file in
outputDir called "progress.tsv".
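Because the log is tab-delimited, it can be inspected from another R session while a long download is running. The helper below is a hypothetical sketch, not part of the package. It assumes the first line of "progress.tsv" is a header row, and the directory name is a placeholder.

```r
# Hypothetical helper: read the progress log from an output directory.
# Returns NULL if the log file does not exist yet.
readProgressLog = function(outputDir) {
  logPath = file.path(outputDir, "progress.tsv")
  if (!file.exists(logPath)) {
    return(NULL)
  }
  # Assumes the first line of the log is a header row.
  read.delim(logPath, stringsAsFactors = FALSE)
}

progress = readProgressLog("fastq_output")
if (!is.null(progress)) {
  str(progress)  # show the columns and the first few values
}
```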