ksw2s aligner (ksw2_score)


Copyright (c) 2021, Huxelerate S.r.l. All rights reserved.

Provides a set of APIs to interact with the HUGenomic ksw2_score core that performs accelerated sequence alignment on FPGA.

The core functionality is based on the ksw2_extd method from the ksw2 library: https://github.com/lh3/ksw2

The core computes the optimal scores for global and extension alignment of two sequences using dual gap affine cost function and a custom scoring matrix. The core also supports banded alignment and the z-drop heuristics: https://doi.org/10.1093/bioinformatics/bty191

Before running the mothods of this library, you must ensure that the target FPGA is properly configured with the ksw2_score core through the fpga_configure function.

namespace hugenomic


std::vector<ksw2s_response> ksw2s_batch(uint32_t fpga_id, const std::vector<std::pair<std::vector<uint8_t>, std::vector<uint8_t>>> &pairs_of_sequences, int8_t start_gap_cost, int8_t extend_gap_cost, int8_t start_gap_cost_2, int8_t extend_gap_cost_2, const std::vector<int8_t> scoring_matrix, int32_t bandwidth, int32_t zdrop)

Performs a sequence alignment of one or more pairs of sequences on a target FPGA.

The ksw2_score core computes at once the score of 4 different types of alignments. These alignments are: global alignment, global query alignment, global target alignment, extension alignment. For each alignment type, each pair of query and target sequences are aligned from their beginning up to an end position (query, target) that depends on the alignment type as follows:

  • global alignment (query_length - 1, target_length - 1)

  • global query alignment (query_length - 1, FREE)

  • global target alignment (FREE, target_length - 1)

  • extension alignment (FREE, FREE)

Where FREE means that the algorithm selects the position that maximizes the alignment score.

The cost function for gaps is computed as the minimum of two affine gap functions using the following formula, where l is the length of the gap: min{start_gap_cost + l * extend_gap_cost, start_gap_cost_2 + l * extend_gap_cost_2}

The cost function associated to a match/mismatch is provided by a scoring matrix. The scoring matrix is an M*M matrix encoded as a linear array that determines the score to assign when comparing an element q of the query to an element t of the target sequence. In details, the score is assigned as:

score = scoring_matrix[t * M + q]

The value M must be in the range [2, KSW2S_MAX_SCORING_MATRIX_SIZE].

Notice that the values stored in the query and target sequences must be between 0 and M-1 otherwise an std::invalid_argument will be thrown.

Furthermore each sequence cannot be longer than KSW2S_MAX_SEQUENCE_SIZE.

PERFORMANCE TIP: you should always try to maximize the number of pairs of sequences in order to obtain maximum performance.


Vector of ksw2s_response containing the alignment result for all pairs of sequences

  • fpga_id: The identifier of the target FPGA to use

  • pairs_of_sequences: The pairs of query and target sequences to align where the first element of each pair is a query sequence and the second element of each pair is a target sequence

  • start_gap_cost: The cost for starting a gap

  • extend_gap_cost: The cost for extending a gap

  • start_gap_cost_2: The cost for starting a gap (second gap function)

  • extend_gap_cost_2: The cost for extending a gap (second gap function)

  • scoring_matrix: Scoring matrix of size M*M encoded as a linear vector

  • bandwidth: The width of the band for banded alignment, use a negative number for full alignment

  • zdrop: The z value for the z-drop heuristic, use a negative number to avoid z-drop



const uint64_t KSW2S_MAX_SEQUENCE_SIZE = 65536

The maximum supported sequence size

const uint64_t KSW2S_MAX_SCORING_MATRIX_SIZE = 64

The maximum supported size for the custom scoring matrix

const int32_t KSW2S_MAX_NOT_FOUND = -0x40000000

A special value assigned to a max score returned by the aligner when the max score is not found (such as when the alignment is z-dropped)

struct ksw2s_response
#include <ksw2_score.hpp>

A structure that defines the result of the execution of a ksw2 score alignment on two pair of sequences

Public Members

int32_t extension_max_score

Max score of extension alignment:

  • query from 0 to [X]

  • target from 0 to [Y]

can be equal to KSW2S_MAX_NOT_FOUND if not found

int32_t extension_query_end_position

Query end position ([X]) for extension alignment

int32_t extension_target_end_position

Target end position ([Y]) for extension alignment

int32_t global_query_max_score

Max score of global query alignment:

  • query from 0 to query_len - 1

  • target from 0 to [Y]

can be equal to KSW2S_MAX_NOT_FOUND if not found

int32_t global_query_target_end_position

Target end position ([Y]) for global query alignment

int32_t global_target_max_score

Max score of global target alignment:

  • query from 0 to [X]

  • target from 0 to target_len - 1

can be equal to KSW2S_MAX_NOT_FOUND if not found

int32_t global_target_query_end_position

Query end position ([X]) for global target alignment

int32_t global_max_score

Max score of global alignment:

  • query from 0 to query_len - 1

  • target from 0 to target_len - 1

can be equal to KSW2S_MAX_NOT_FOUND if not found

bool zdropped

Whether the alignment has been z-dropped

Usage examples


 * Copyright (c) 2021, Huxelerate S.r.l.
 * All rights reserved.
 * Sample code for computing the average score and length of an extension alignment 
 * between random DNA reads.

#include <iostream>
#include <iomanip>
#include <string>
#include <sys/time.h>
#include <vector>
#include <utility>
#include "hugenomic.hpp"    // HUGenomic library

using namespace hugenomic;

// number of pairs of sequences to generate and their lengths
const unsigned int NUM_PAIRS = 48;
const size_t QUERY_SEQUENCES_LEN = 4000;
const size_t TARGET_SEQUENCES_LEN = 4000;

// scoring system used by the sequence alignment algorithm
const int START_GAP_COST = 4;
const int EXTEND_GAP_COST = 2;
const int START_GAP_COST_2 = 13;
const int EXTEND_GAP_COST_2 = 1;

// in this example we use 5x5 scoring matrix, where we assume the following 
// encoding for positions (rows/columns) within the matrix:
// 0 -> A,   1 -> C,   2 -> G,   3 -> T,   4 -> * 
std::vector<int8_t> SCORING_MATRIX = {
     3, -4, -4, -4,  0,
    -4,  3, -4, -4,  0,
    -4, -4,  3, -4,  0,
    -4, -4, -4,  3,  0,
     0,  0,  0,  0,  0

const int ZDROP = -1;       // we do not use the z-drop heuristics
const int BANDWIDTH = -1;   // we do full alignment (not banded) 

// utility function to generate a random dna sequence with a given length
std::vector<uint8_t> random_dna_sequence(size_t length);

// utility function to retrieve the current time in microseconds
unsigned long get_time();

int main()
    // initialize one FPGA with the accelerated alignment core
    const int fpga_id = 0;
    fpga_configure(ksw2_score, fpga_id);

    // generate a random set of pairs of DNA reads 
    std::cout << "Generating " << NUM_PAIRS 
              << " random pairs of DNA reads, each pair with read lengths ("
              << QUERY_SEQUENCES_LEN << ", " << TARGET_SEQUENCES_LEN << ")" 
              << std::endl;

    std::vector< std::pair< std::vector<uint8_t>, std::vector<uint8_t> > > sequences_pairs;

    for(int s = 0; s < NUM_PAIRS; s++) {
        std::vector<uint8_t> query_sequence = random_dna_sequence(QUERY_SEQUENCES_LEN);
        std::vector<uint8_t> target_sequence = random_dna_sequence(TARGET_SEQUENCES_LEN);
        sequences_pairs.push_back({query_sequence, target_sequence});
    std::cout << "Generation complete!" << std::endl << std::endl;

    // run the alignment on each pair of random reads
    std::cout << "Running alignment..." << std::endl;

    int total_score = 0;
    int total_length = 0;

    unsigned long start_compute_time = get_time();

    std::vector<ksw2s_response> alignment_results = ksw2s_batch(fpga_id, sequences_pairs,

    for(int s = 0; s < NUM_PAIRS; s++) {
        total_score += 
        total_length +=
            alignment_results[s].extension_query_end_position + 1 + 
            alignment_results[s].extension_target_end_position + 1;

    unsigned long end_compute_time = get_time();

    // print the average score for a random overlap alignment
    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Alignment complete!" << std::endl << std::endl;
    std::cout << "Average extension alignment score = " 
              << total_score / (double) NUM_PAIRS << std::endl << std::endl;
    std::cout << "Average alignment length = " 
              << total_length / (double) NUM_PAIRS << std::endl << std::endl;

    // measure execution time and performance
    double time_ms = (end_compute_time - start_compute_time) / 1000.0;
    std::cout << "Alignment time: " << time_ms << " ms" << std::endl
              << "GCUPS (Giga Cell Update Per Second) =  " 
              << (cell_updates / time_ms / 1000000.0) << std::endl;

    // release the FPGA 
    return 0;

std::vector<uint8_t> random_dna_sequence(size_t length)
    std::vector<uint8_t> sequence;
    for(size_t i = 0; i < length; i++) {
        sequence.push_back( rand() % 5 );

    return sequence;

unsigned long get_time()
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000 * 1000 + tv.tv_usec;