The SeqWare Pipeline


Overview

The SeqWare Pipeline sub-project is really the heart of the overall SeqWare project. This provides the core functionality of SeqWare; it is workflow developer environment and a series of tools for installing, running, and monitoring workflows.
We currently support two workflow languages (FTL markup and Java) and two workflow engines (Oozie and Pegasus). Our current recommended combination is Java workflows with the Pegasus engine.
We highly recommend you go through the UserDeveloper, and Admin tutorials since the documentation below assumes you already have.

Features

SeqWare Pipeline has several key features that distinguish it from other open source and private workflow solutions. These include:
  • tool-agnostic
  • developer framework focused
  • focused on automated analysis
  • includes cluster abstraction
  • supports detailed provenance tracking
  • supports user-created workflows
  • implements a self-contained workflow packaging standard
  • includes fault tolerance
  • focuses on meeting workflow needs of big projects (thousands of samples)
  • is open source

Building and Installing
  • Installation
    This is our installation guide based on VMs that we recommend for most users. You will be left with a functioning SeqWare install including SeqWare Pipeline.
  • Installation From Scratch
    This guide walks you through how we built the VMs and will be of interest to anyone that needs to see the details of SeqWare setup starting with an empty Linux server. It is complicated so we highly recommend using a VM (which can be connected to a real cluster).
  • Building from Source
    These directions show you how to build the whole project, including SeqWare Pipeline, using Maven.

Setup

  • User Settings
    Information about configuring user settings files.
  • Monitor Configuration
    Setting up the SeqWare-associated tools that need to run so workflow triggering and monitoring workflows.
  • Connecting to a Real Cluster
    Once you are happy with writing, installing, and running workflows on a stand-alone VM you will want to connect to a “real” cluster. This guide walks you through the process of connecting a VM to a cluster (HPC & Hadoop, depending on your workflow engine of choice).
    Read more