RNA-seq data
The seeker package is a wrapper around various command-line and R-based tools. Its main function, seeker(), processes bulk RNA-seq data. The main argument of seeker() is a list of parameters specifying which steps of RNA-seq data processing to perform and how to perform them. The list of parameters can come from a yaml file, an example of which is shown below.
study: 'PRJNA600892' # [string]
metadata:
  run: TRUE # [logical]
  bioproject: 'PRJNA600892' # [string]
  include: # [named list or NULL]
    colname: 'run_accession' # [string]
    values: ['SRR10876945', 'SRR10876946'] # [vector]
  # exclude # [named list or NULL]
    # colname # [string]
    # values # [vector]
fetch:
  run: TRUE # [logical]
  # keep # [logical or NULL]
  # overwrite # [logical or NULL]
  # keepSra # [logical or NULL]
  # prefetchCmd # [string or NULL]
  # prefetchArgs # [character vector or NULL]
  # fasterqdumpCmd # [string or NULL]
  # fasterqdumpArgs # [character vector or NULL]
  # pigzCmd # [string or NULL]
  # pigzArgs # [character vector or NULL]
trimgalore:
  run: TRUE # [logical]
  # keep # [logical or NULL]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
  # pigzCmd # [string or NULL]
fastqc:
  run: TRUE # [logical]
  # keep # [logical or NULL]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
salmon:
  run: TRUE # [logical]
  indexDir: '~/refgenie_genomes/alias/mm10/salmon_partial_sa_index/default' # [string]
  # sampleColname # [string or NULL]
  # keep # [logical or NULL]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
multiqc:
  run: TRUE # [logical]
  # cmd # [string or NULL]
  # args # [character vector or NULL]
tximport:
  run: TRUE # [logical]
  tx2gene: # [named list or NULL]
    organism: 'mmusculus' # [string]
    # version # [number or NULL]
    # filename # [string or NULL]
  countsFromAbundance: 'lengthScaledTPM' # [string]
  # ignoreTxVersion # [logical or NULL]
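Because the parameters are just a list, the same settings could in principle be written directly in R. The following is a minimal sketch, assuming that seeker() accepts a hand-built list with the same nesting that yaml::read_yaml() would produce from the file above. Only the uncommented entries are included, and the call to seeker() is left commented out.

```r
# Sketch: build the parameter list in R instead of reading it from yaml.
# Assumption: the nesting mirrors what yaml::read_yaml() returns for the
# example file; commented-out (optional) entries are simply omitted.
params = list(
  study = 'PRJNA600892',
  metadata = list(
    run = TRUE,
    bioproject = 'PRJNA600892',
    include = list(
      colname = 'run_accession',
      values = c('SRR10876945', 'SRR10876946'))),
  fetch = list(run = TRUE),
  trimgalore = list(run = TRUE),
  fastqc = list(run = TRUE),
  salmon = list(
    run = TRUE,
    indexDir = '~/refgenie_genomes/alias/mm10/salmon_partial_sa_index/default'),
  multiqc = list(run = TRUE),
  tximport = list(
    run = TRUE,
    tx2gene = list(organism = 'mmusculus'),
    countsFromAbundance = 'lengthScaledTPM'))

# Nested entries are reached with `$`, following the yaml hierarchy.
params$metadata$include$values
str(params$tximport)

# seeker(params)  # not run here
```

Keeping the parameters in a yaml file is likely the more convenient choice for record keeping, since the file documents exactly how a study was processed.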
An empty template yaml file is available at system.file('extdata', 'params_template.yml', package = 'seeker').
You can copy these yaml files to your working directory like so:
for (filename in c('PRJNA600892.yml', 'params_template.yml')) {
  file.copy(system.file('extdata', filename, package = 'seeker'), '.')
}
If you’ve already installed the system dependencies, for example with installSysDeps(), a basic way to run seeker() is:
library('seeker')
doParallel::registerDoParallel()
yamlPath = 'PRJNA600892.yml'
params = yaml::read_yaml(yamlPath)
seeker(params)
Be aware that even this minimal example can take some time to run.
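Since a run can be long, it may help to confirm which steps are switched on before starting. The sketch below uses a hypothetical helper, enabledSteps(), which is not part of seeker; it only inspects the run flags of a parameter list. The small params list is a stand-in for the result of yaml::read_yaml(), and 'path/to/index' is a placeholder.

```r
# Hypothetical helper (not part of seeker): return the names of the steps
# whose `run` flag is TRUE.
enabledSteps = function(params) {
  # A step is any list-valued entry that has a `run` element.
  isStep = vapply(
    params, function(p) is.list(p) && !is.null(p$run), logical(1))
  steps = params[isStep]
  names(steps)[vapply(steps, function(p) isTRUE(p$run), logical(1))]
}

# Small stand-in for the list returned by yaml::read_yaml().
params = list(
  study = 'PRJNA600892',
  fetch = list(run = TRUE),
  fastqc = list(run = FALSE),
  salmon = list(run = TRUE, indexDir = 'path/to/index'))

enabledSteps(params)
```

For this stand-in list the helper returns fetch and salmon, since study is not a step and fastqc is switched off.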
Microarray data
For microarray data, use the seekerArray() function, which can process data from NCBI GEO and ArrayExpress, as well as raw Affymetrix data stored locally. The main arguments are study and geneIdType. For example:
library('seeker')
study = 'GSE25585'
geneIdType = 'entrez'
seekerArray(study, geneIdType)
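When working with several accessions, it can be handy to see at a glance which repository each one points to. The sketch below uses a hypothetical helper, guessRepo(), that relies only on the usual accession prefixes (GSE for NCBI GEO series, E- for ArrayExpress). It is an illustration, not a description of how seekerArray() resolves a study internally, and 'E-MTAB-1234' is a made-up accession.

```r
# Hypothetical helper (not part of seeker): guess the repository from the
# accession prefix. Returns NA for anything it does not recognize.
guessRepo = function(study) {
  if (startsWith(study, 'GSE')) {
    'GEO'
  } else if (startsWith(study, 'E-')) {
    'ArrayExpress'
  } else {
    NA_character_
  }
}

# 'GSE25585' is the study used above; 'E-MTAB-1234' is a placeholder.
studies = c('GSE25585', 'E-MTAB-1234')
repos = vapply(studies, guessRepo, character(1))
repos
```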