eCLIP Workflow

The eCLIP workflow implemented here is designed to run on a high-performance computing (HPC) cluster such as Biowulf or FRCE. The core of the analysis uses the Yeo Lab's eCLIP pipeline.

Detailed information on required software can be found using the following links:

  1. Slurm Workload Manager
  2. Environment Modules
  3. SingularityCE
  4. git
  5. eCLIP from Yeo Lab
  6. MultiQC from Seqera

The first four items are typically provided by a high-performance computing cluster such as Biowulf or FRCE.

Setting Up eCLIP Workflow

Create an Environment to Run eCLIP Workflow on FRCE

module load mamba

mamba create -n cwl_env conda-forge::tabulate conda-forge::cwltool matplotlib

mamba activate cwl_env
pip install multiqc

multiqc --version
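
Before going further, it can help to confirm that the required tools are actually on your PATH. The following is a minimal sketch in POSIX shell; the function name check_tools is hypothetical and is not part of eCLIP or of the custom scripts.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
# NOTE: check_tools is an illustrative helper, not part of eCLIP.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools git sbatch singularity cwltool multiqc` would report any of these commands that are not available in the current session. On some clusters a tool only appears after its module is loaded, so a "missing" result may simply mean that a `module load` step is still needed.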

Download Yeo Lab's GitHub Repository

SCRIPT_DIR=/home/$USER/eCLIP_WF
mkdir -p "$SCRIPT_DIR"

cd $SCRIPT_DIR
git clone git@github.com:YeoLab/eCLIP.git

ECLIP_DIR=$SCRIPT_DIR/eCLIP
ls -l $ECLIP_DIR

Download the custom scripts to run the pipeline on Biowulf / FRCE:

  1. env.yml
  2. post_process_eCLIP.py
  3. run.sh
  4. summarize_merge_peaks_wf.py

Setting Up Required Files and Directories

Directory locations are up to the user. As an example, the layout below is the one used on FRCE. It is recommended that you keep the results (run) directory separate from the data directory. A full description of the required files is available through the links below:

DATA_DIR=/home/$USER/Data
RUN_DIR=/scratch/cluster_scratch/$USER/eCLIP_run

mkdir -p "$DATA_DIR"
mkdir -p "$RUN_DIR"

Create a manifest directory and manifest file as required by eCLIP.

MANIFEST_DIR=$DATA_DIR/manifests
mkdir -p "$MANIFEST_DIR"

# For single-end reads: copy the example manifest, then rename and edit the copy
cp $ECLIP_DIR/example/single_end_clip.yaml $MANIFEST_DIR/
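
The directory setup above can be wrapped in a small helper so that it is safe to run more than once. This is a minimal sketch in POSIX shell; the function name setup_eclip_dirs is hypothetical, and the manifests subdirectory name simply follows the example layout shown here.

```shell
# Create the data, manifest, and run directories if they do not exist yet.
# $1 = data directory, $2 = run directory (both chosen by the user).
# NOTE: setup_eclip_dirs is an illustrative helper, not part of eCLIP.
setup_eclip_dirs() {
    data_dir=$1
    run_dir=$2
    # -p creates parent directories and does not fail if they already exist.
    mkdir -p "$data_dir/manifests" "$run_dir" || return 1
    # Confirm that each directory is writable by the current user.
    for d in "$data_dir" "$data_dir/manifests" "$run_dir"; do
        [ -w "$d" ] || { echo "not writable: $d" >&2; return 1; }
    done
}
```

With the variables defined earlier, this would be called as `setup_eclip_dirs "$DATA_DIR" "$RUN_DIR"`.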

Modify the Run Script

Modify the run.sh script file.

Running Workflow

The workflow has two major parts:

  1. Run YeoLab's eCLIP.
  2. Run YeoLab's merge_peaks.

Submit the run script to Slurm:

sbatch run.sh
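
If you want to keep track of the submitted job, the job ID can be captured at submission time. The sketch below relies on the `--parsable` option of sbatch, which prints only the job ID (optionally followed by `;` and the cluster name); the function name submit_run is hypothetical.

```shell
# Submit a batch script and print the numeric Slurm job ID.
# NOTE: submit_run is an illustrative helper, not part of the custom scripts.
submit_run() {
    script=$1
    # --parsable makes sbatch print "jobid" or "jobid;cluster" and nothing else.
    out=$(sbatch --parsable "$script") || return 1
    # Strip the optional ";cluster" suffix.
    echo "${out%%;*}"
}
```

A typical use would be `jobid=$(submit_run run.sh)`, followed by `squeue -j "$jobid"` to check the job while it is queued or running, and `sacct -j "$jobid"` to review it after it finishes.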

Results

A full description of the results files is available in eCLIP Outputs and merge_peaks Outputs.

Two summary HTML files are generated:

  1. eCLIP_ReadMeSummary.html, a summary of the eCLIP results intended as a starting point for exploring them, and
  2. eCLIP_MergePeaks_ReadMeSummary.html, the corresponding summary for merge_peaks.
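
After a run finishes, a quick check that both summary files exist and are non-empty can catch a failed run early. This is a minimal sketch in POSIX shell; the function name check_summaries is hypothetical, and it assumes that both HTML files are written directly into the directory you pass in, which may not match where your run places them.

```shell
# Check that both summary HTML files exist and are non-empty.
# $1 = directory expected to contain the summaries (an assumption;
#      adjust to wherever your run writes them).
# NOTE: check_summaries is an illustrative helper, not part of the workflow.
check_summaries() {
    results_dir=$1
    status=0
    for f in eCLIP_ReadMeSummary.html eCLIP_MergePeaks_ReadMeSummary.html; do
        if [ -s "$results_dir/$f" ]; then
            echo "ok:      $results_dir/$f"
        else
            echo "missing: $results_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_summaries "$RUN_DIR"` returns a non-zero status if either summary is absent or empty.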