Signal aligner (dynamic_time_warping)¶
Documentation¶
Copyright (c) 2019, Huxelerate S.r.l. All rights reserved.
Provides a set of APIs to interact with the HUGenomic dynamic_time_warping core that performs accelerated signal alignment on FPGA.
The core performs an accelerated Dynamic Time Warping algorithm and supports both euclidean and squared euclidean distance functions.
Before running the mothods of this library, you must ensure that the target FPGA is properly configured with the dynamic_time_warping core through the fpga_configure function.
-
namespace
hugenomic
¶ Enums
Functions
-
std::vector<dtw_response>
dtw_batch
(uint32_t fpga_id, const std::vector<std::pair<std::vector<float>, std::vector<float>>> &pairs_of_sequences, dtw_distance_function distance_function, bool free_top, bool free_left, bool free_right, bool free_down)¶ Performs an optimal sequence alignment using the Dynamic Time Warping algorithm of one or more pairs of sequences on a target FPGA.
Sequences cannot be longer than DTW_MAX_SEQUENCE_SIZE and the smaller sequence of a pair cannot exceed DTW_MAX_SIZE_OF_SMALLER_SEQUENCE.
PERFORMANCE TIP: you should always try to maximize the number of pairs of sequences in order to obtain maximum performance.
- Return
Vector of dtw_response containing the alignment result for all pairs of sequences
- Parameters
fpga_id
: The identifier of the target FPGA to usepairs_of_sequences
: The pairs of horizontal and vertical sequences to align where the first element of each pair is a horizontal sequence and the second element of each pair is a vertical sequencedistance_function
: The distance function to usefree_top
: Whether there is no penalty for skipping elements at the beginning of the vertical sequence (only for global alignment)free_left
: Whether there is no penalty for skipping elements at the beginning of the horizontal sequence (only for global alignment)free_right
: Whether there is no penalty for skipping elements at the end of the horizontal sequence (only for global alignment)free_down
: Whether there is no penalty for skipping elements at the end of the vertical sequence (only for global alignment)
- Exceptions
std::invalid_argument
: If a sequence is empty or a pair of sequences exceeds the size constraintsinvalid_fpga_configuration
: If the target FPGA is not configured with the dynamic_time_warping coreinvalid_fpga_identifier
: If an invalid FPGA identifier is specified
-
std::vector<dtw_response>
dtw_all_to_all
(uint32_t fpga_id, const std::vector<std::vector<float>> &horizontal_sequences, const std::vector<std::vector<float>> &vertical_sequences, dtw_distance_function distance_function, bool free_top, bool free_left, bool free_right, bool free_down)¶ Runs an all-to-all sequence alignment using the Dynamic Time Warping algorithm on a given FPGA.
The method receives as input two vectors of std::vector<float> containing the pairs of horizontal and vertical sequences to align against each other and returns a vector of dtw_response of size: (number of horizontal sequences) X (number of vertical_sequences) containing the alignment results for all the pairs of sequences.
Sequences cannot be longer than DTW_MAX_SEQUENCE_SIZE and the smaller sequence of a pair of horizontal and vertical sequences cannot exceed DTW_MAX_SIZE_OF_SMALLER_SEQUENCE.
PERFORMANCE TIP: Whenever possible try to maximize the number of alignments to perform in a single function call.
- Return
Vector of dtw_response containing all the alignment results stored in row-wise format. row 0: from result[0] to result[1 * horizontal_sequences.size() - 1] row 1: from result[1 * horizontal_sequences.size()] to result[2 * horizontal_sequences.size() - 1] row 2: from result[2 * horizontal_sequences.size()] to result[3 * horizontal_sequences.size() - 1] …
- Parameters
fpga_id
: The identifier of the target FPGAhorizontal_sequences
: Vector containing all the horizontal sequencesvertical_sequences
: Vector containing all the vertical sequencesdistance_function
: Whether to use euclidean or squared euclidean as distance functionfree_top
: Whether there is no penalty for skipping characters at the beginning of the vertical sequencefree_left
: Whether there is no penalty for skipping characters at the beginning of the horizontal sequencefree_right
: Whether there is no penalty for skipping characters at the end of the horizontal sequencefree_down
: Whether there is no penalty for skipping characters at the end of the vertical sequence
- Exceptions
std::invalid_argument
: If there are no horizontal or vertical sequences or if there is a pair of vertical and horizontal sequences which are either empty or exceed size constraints (see above)invalid_fpga_configuration
: If the target FPGA is not configured with the dynamic_time_warping coreinvalid_fpga_identifier
: If an invalid FPGA identifier is specifiedstd::logic_error
: If there are pending requests enqueued to any of the worker of the selected FPGA
Variables
-
const uint64_t
DTW_MAX_SIZE_OF_SMALLER_SEQUENCE
= 65535¶ The maximum supported size of the smaller among a pair of horizontal and vertical sequences
-
const uint64_t
DTW_MAX_SEQUENCE_SIZE
= 178891323¶ The maximum supported size of a sequence
-
struct
dtw_response
¶ - #include <dynamic_time_warping.hpp>
A structure that defines the result of the execution of a Dynamic Time Warping on two pair of signals.
Public Members
-
float
alignment_distance
¶ the optimal distance score between the two signals
-
uint32_t
vertical_sequence_start_position
¶ The start position of the vertical sequence in the optimal alignment
-
uint32_t
vertical_sequence_end_position
¶ The end position of the vertical sequence in the optimal alignment
-
uint32_t
horizontal_sequence_start_position
¶ The start position of the horizontal sequence in the optimal alignment
-
uint32_t
horizontal_sequence_end_position
¶ The end position of the horizontal sequence in the optimal alignment
-
float
-
std::vector<dtw_response>
Usage examples¶
examples/dynamic_time_warping/1_random_time_series_alignment¶
/**
* Copyright (c) 2019, Huxelerate S.r.l.
* All rights reserved.
*
* Sample code for computing the average alignment distance of two random series
* using the dynamic time warping algorithm.
*/
#include <iostream>
#include <iomanip>
#include <sstream>
#include <sys/time.h>
#include <vector>
#include "hugenomic.hpp" // HUGenomic library
using namespace hugenomic;
// number of pairs of horizontal/vertical series to generate and their lengths (number of samples)
const unsigned int NUM_PAIRS = 128;
const unsigned int H_SERIES_NUM_SAMPLES = 2000;
const unsigned int V_SERIES_NUM_SAMPLES = 2000;
// the distance function to use for the dynamic time warping algorithm
const dtw_distance_function DISTANCE_FUNCTION = dtw_euclidean;
// specifies a global alignment: the algorithm should align all the samples from the "horizontal"
// series to all the samples of the "vertical" series, meaning that we cannot skip the alignment
// of samples at the beginning or at the end of the series.
const bool FREE_TOP = false;
const bool FREE_LEFT = false;
const bool FREE_RIGHT = false;
const bool FREE_DOWN = false;
// utility function to retrieve the current time in microseconds
unsigned long get_time();
// utility function to generate a random series of 'num_samples' between 0.0 and 1.0
std::vector<float> generate_random_series(unsigned int num_samples);
int main()
{
// initializes one FPGA with the accelerated dynamic time warping core
const int fpga_id = 0;
fpga_configure(dynamic_time_warping, fpga_id);
std::vector<std::vector<float> > h_series;
std::vector<std::vector<float> > v_series;
std::cout << "Generating " << NUM_PAIRS
<< " random pairs of time series, each pair with number of samples ("
<< H_SERIES_NUM_SAMPLES << ", " << V_SERIES_NUM_SAMPLES << ")" << std::endl;
srand(0);
std::vector< std::pair< std::vector<float>, std::vector<float> > > sequences_pairs;
for(int s = 0; s < NUM_PAIRS; s++) {
std::vector<float> h_series = generate_random_series(H_SERIES_NUM_SAMPLES);
std::vector<float> v_series = generate_random_series(V_SERIES_NUM_SAMPLES);
sequences_pairs.push_back({h_series, v_series});
}
std::cout << "Generation complete!" << std::endl << std::endl;
// run the overlap alignment on each pair of random reads
std::cout << "Running dynamic time warping alignment on all pairs..." << std::endl;
float total_distance = 0.0f;
unsigned long start_compute_time = get_time();
std::vector<dtw_response> alignment_results = dtw_batch(fpga_id, sequences_pairs,
DISTANCE_FUNCTION, FREE_TOP, FREE_LEFT, FREE_RIGHT, FREE_DOWN);
for(int s = 0; s < NUM_PAIRS; s++) {
total_distance += alignment_results[s].alignment_distance;
}
unsigned long end_compute_time = get_time();
// shows the average score for a random overlap alignment
std::cout << std::fixed << std::setprecision(2);
std::cout << "Alignment complete!" << std::endl << std::endl;
std::cout << "Average alignment distance = "
<< total_distance / (float)NUM_PAIRS << std::endl << std::endl;
// measures execution time and performance
double time = end_compute_time - start_compute_time;
std::cout << "Alignment time: " << time / 1000.0 << " ms" << std::endl
<< "GCUPS (Giga Cell Update Per Second) = "
<< ((H_SERIES_NUM_SAMPLES * V_SERIES_NUM_SAMPLES * NUM_PAIRS) / time / 1000.0) << std::endl;
// release the FPGA
fpga_release(fpga_id);
return 0;
}
unsigned long get_time()
{
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec * 1000 * 1000 + tv.tv_usec;
}
std::vector<float> generate_random_series(unsigned int num_samples)
{
std::vector<float> sequence;
for(int i = 0; i < num_samples; i++) {
sequence.push_back(((float) rand()) / RAND_MAX);
}
return sequence;
}
examples/dynamic_time_warping/2_signal_classifier¶
/**
* Copyright (c) 2019, Huxelerate S.r.l.
* All rights reserved.
*
* In this example application we implement a simple signal classifier.
* Given a set of reference signals (REF_SIGNALS) and a set of query signals (QUERY_SIGNALS)
* we want to assign each query signal to the reference signal that resembles it most.
*
* The query signals might contain a different number of samples than the reference signals
* (i.e. sampled at different sample rates) and might be noisy as well.
*
* In this code, we leverage on the dynamic time warping algorithm to compute the distance
* of a given query signal against all the reference signals to identify the proper reference
* signal.
*
* NOTE: this is an example application, the classifier can be improved in many ways, such as:
* - computing a threshold distance above which the signal should not be classified
* - taking into account cases in which the distance against different references are very similar
* - ...
*/
#include <iomanip>
#include <iostream>
#include <random>
#include <sstream>
#include <sys/time.h>
#include <vector>
#include "hugenomic.hpp" // HUGenomic library
using namespace hugenomic;
const unsigned int NUM_REF_SIGNALS = 22; // the number of reference signals
const unsigned int REF_SIGNAL_NUM_SAMPLES = 256; // the number of samples for each reference signal
const unsigned int NUM_QUERY_SIGNALS = 1000; // the number of signals to query
const float DELETION_PROBABILITY = 0.1f; // probability that a sample is deleted from the query
const float REPETITION_PROBABILITY = 0.3f; // probability that a sample is repeated in the query
// parameters of the gaussian noise to add to the query signals
const float GAUSSIAN_NOISE_MEAN = 0.0f;
const float GAUSSIAN_NOISE_STDDEV = 0.2f;
// dynamic time warping parameters
// NOTE: all free_* flags set to false to align sequences end-to-end
const dtw_distance_function DISTANCE_FUNCTION = dtw_euclidean;
const bool FREE_TOP = false;
const bool FREE_LEFT = false;
const bool FREE_RIGHT = false;
const bool FREE_DOWN = false;
// utility functions:
// retrieves the current time in microseconds
unsigned long get_time();
// generates a random signal of 'num_samples' between 0.0 and 1.0 given a random number generator 'rng'
std::vector<float> generate_random_signal(unsigned int num_samples, std::mt19937 &rng);
// applies a gaussian noise to a signal
std::vector<float> add_gaussian_noise(std::vector<float> signal, float mean, float stddev, std::mt19937 &rng);
// resamples a signal by deleting or repeating some of its samples with given probabilities
std::vector<float> resample_signal(std::vector<float> signal, float p_del, float p_rep, std::mt19937 &rng);
int main()
{
std::vector<std::vector<float> > reference_signals; // the set of reference signals
std::vector<std::vector<float> > query_signals; // the set of signals to query
std::vector<int> expected_classification; // the expected classification for each query
std::vector<int> final_classification; // the final classification for each query
std::vector<float> final_classification_distance; // the query-to-reference distance of the final classification
std::mt19937 rng; // random number generator
int correct_classifications; // number of correct classifications
float average_final_classification_distance; // the average query-to-reference final classification distance
long int total_query_size = 0; // total size of all queries
// initializes one FPGA with the accelerated dynamic time warping core
const int fpga_id = 0;
fpga_configure(dynamic_time_warping, fpga_id);
// 1) generate reference signals
std::cout << std::endl << "> Generating " << NUM_REF_SIGNALS << " reference signals each with "
<< REF_SIGNAL_NUM_SAMPLES << " samples" << std::endl;
for(int s = 0; s < NUM_REF_SIGNALS; s++) {
reference_signals.push_back(generate_random_signal(REF_SIGNAL_NUM_SAMPLES, rng));
}
// 2) generate query signals and populate the expected classification
std::cout << "> Generating " << NUM_QUERY_SIGNALS << " query signals" << std::endl;
for(int s = 0; s < NUM_QUERY_SIGNALS; s++) {
// pick a random reference
std::uniform_int_distribution<int> uniform(0, NUM_REF_SIGNALS - 1);
int reference_id = uniform(rng);
// apply gaussian noise and resampling
std::vector<float> query = reference_signals[reference_id];
query = add_gaussian_noise(query, GAUSSIAN_NOISE_MEAN, GAUSSIAN_NOISE_STDDEV, rng);
query = resample_signal(query, DELETION_PROBABILITY, REPETITION_PROBABILITY, rng);
total_query_size += query.size();
// add it as a query and store what is the expected classification
query_signals.push_back(query);
expected_classification.push_back(reference_id);
}
// 3) align each query signal against each reference signal to compute signal distances
std::cout << "> Running dynamic time warping alignment on all query-reference pairs" << std::endl;
unsigned long start_compute_time = get_time();
std::vector<dtw_response> alignments =
dtw_all_to_all(fpga_id, reference_signals, query_signals,
DISTANCE_FUNCTION, FREE_TOP, FREE_LEFT, FREE_RIGHT, FREE_DOWN);
unsigned long end_alignment_time = get_time();
// 4) for each query, search for the reference with smallest distance and classify accordingly
std::cout << "> Performing query classification" << std::endl;
for(int query_id = 0; query_id < NUM_QUERY_SIGNALS; query_id++) {
// start assuming that the first reference is the correct one
int classification_ref_id = 0;
float classification_distance = alignments[query_id * NUM_REF_SIGNALS].alignment_distance;
// search over all other references
for(int ref_id = 1; ref_id < NUM_REF_SIGNALS; ref_id++) {
float current_distance = alignments[query_id * NUM_REF_SIGNALS + ref_id].alignment_distance;
if(current_distance < classification_distance) {
// update classification for current query
classification_distance = current_distance;
classification_ref_id = ref_id;
}
}
// store classification result for the query and the classification distance
final_classification.push_back(classification_ref_id);
final_classification_distance.push_back(classification_distance);
}
// 5) compute how many query have correctly classified and the average distance
std::cout << "> Computing classification results" << std::endl;
correct_classifications = 0;
average_final_classification_distance = 0.0f;
for(int query_id = 0; query_id < NUM_QUERY_SIGNALS; query_id++) {
if(final_classification[query_id] == expected_classification[query_id]) {
correct_classifications++;
}
average_final_classification_distance += final_classification_distance[query_id];
}
average_final_classification_distance = average_final_classification_distance / NUM_QUERY_SIGNALS;
unsigned long end_classification_time = get_time();
// print result
std::cout << "> Classification complete!" << std::endl << std::endl
<< "correctly classified: " << correct_classifications << " / "
<< NUM_QUERY_SIGNALS << std::endl
<< "average classification distance: " << average_final_classification_distance
<< std::endl << std::endl;
// measures execution time and performance
double alignment_time_ms = (end_alignment_time - start_compute_time) / 1000.0;
double classification_time_ms = (end_classification_time - end_alignment_time) / 1000.0;
double total_compute_time_ms = alignment_time_ms + classification_time_ms;
std::cout << "Total time: " << total_compute_time_ms << " ms of which: " << std::endl
<< " - Alignment time: " << alignment_time_ms << " ms" << std::endl
<< " - Classification time: " << classification_time_ms << " ms" << std::endl
<< std::endl
<< "Alignment performance in GCUPS (Giga Cell Update Per Second) = "
<< ((total_query_size * NUM_REF_SIGNALS * REF_SIGNAL_NUM_SAMPLES) / alignment_time_ms / 1000000.0)
<< std::endl << std::endl;
// release the FPGA
fpga_release(fpga_id);
return 0;
}
unsigned long get_time()
{
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec * 1000 * 1000 + tv.tv_usec;
}
std::vector<float> generate_random_signal(unsigned int num_samples, std::mt19937 &rng)
{
std::vector<float> signal;
std::uniform_real_distribution<float> real_uniform(0.0f, 1.0f);
for(int i = 0; i < num_samples; i++) {
signal.push_back(real_uniform(rng));
}
return signal;
}
std::vector<float> add_gaussian_noise(std::vector<float> signal, float mean, float stddev, std::mt19937 &rng)
{
std::vector<float> noisy_signal;
std::normal_distribution<float> gaussian_noise(mean, stddev);
for(int i = 0; i < signal.size(); i++) {
noisy_signal.push_back(signal[i] + gaussian_noise(rng));
}
return noisy_signal;
}
std::vector<float> resample_signal(std::vector<float> signal, float p_del, float p_rep, std::mt19937 &rng)
{
std::uniform_real_distribution<float> real_uniform(0.0f, 1.0f);
std::vector<float> sampled_signal;
for(int i = 0; i < signal.size(); i++) {
bool do_delete = real_uniform(rng) < p_del;
bool do_repeat = real_uniform(rng) < p_rep;
// if both deletion and repetition are true, they cancel each other
if(do_delete && do_repeat) {
do_delete = false;
do_repeat = false;
}
if(!do_delete) {
// no deletion: add the sample
sampled_signal.push_back(signal[i]);
}
if(do_repeat) {
// do repetition: redo the same iteration over the same sample
i--;
}
}
return sampled_signal;
}