About me

I'm Chris, currently pursuing my PhD at the Freie Universität Berlin and work at the Max-Planck-Institute of molecular genetics in Berlin. I'm a computer scientist by training who is now working in the fields of bioinformatics. My research is focused on enumerating and finding lncRNA (long non-coding RNA) in genomes. Therefore I currently work on string indices, especially bidirectional indices such as the 2FM index. I'm working on SeqAn, an open source C++ library for sequence analysis. You can also find there my latest work, a constant-time bidirectional FM index that outperforms other open source implementations of bidirectional indices in speed. For my professional website, visit www.christopher-pockrandt.de.

Publications

VARSCOT: Variant-aware detection and site-scoring enables sensitive and personalized off-target detection

(under review)

Optimum Search Schemes for Approximate String Matching Using Bidirectional FM-Index

Kiavash Kianfar, Christopher Pockrandt, Bahman Torkamandi, Haochen Luo, Knut Reinert, 2017

Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. Bidirectional indices have opened new possibilities in this regard allowing the search to start from anywhere within the pattern and extend in both directions. In particular, use of search schemes (partitioning the pattern and searching the pieces in certain orders with given bounds on errors) can yield significant speed-ups. However, finding optimal search schemes is a difficult combinatorial optimization problem.
Here for the first time, we propose a mixed integer program (MIP) capable to solve this optimization problem for Hamming distance with given number of pieces. Our experiments show that the optimal search schemes found by our MIP significantly improve the performance of search in bidirectional FM-index upon previous ad-hoc solutions. For example, approximate matching of 101-bp Illumina reads (with two errors) becomes 35 times faster than standard backtracking. Moreover, despite being performed purely in the index, the running time of search using our optimal schemes (for up to two errors) is comparable to the best state-of-the-art aligners, which benefit from combining search in index with in-text verification using dynamic programming. As a result, we anticipate a full-fledged aligner that employs an intelligent combination of search in the bidirectional FM-index using our optimal search schemes and in-text verification using dynamic programming outperforms today's best aligners. The development of such an aligner, called FAMOUS (Fast Approximate string Matching using OptimUm search Schemes), is ongoing as our future work.

The SeqAn C++ template library for efficient sequence analysis: A resource for programmers

Knut Reinert et al., 2017

Background:
The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome (Venter et al., 2001) would not have been possible without advanced assembly algorithms and the development of practical BWT based read mappers have been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there was a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al., 2008).
Results:
The SeqAn library has matured considerably since its first publication 9 years ago. In this article we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools.
Conclusions:
We anticipate that SeqAn will continue to be a valuable resource, especially since it started to actively support various hardware acceleration techniques in a systematic manner.

EPR-dictionaries: A practical and fast data structure for constant time searches in unidirectional and bidirectional FM-indices

Christopher Pockrandt, Marcel Ehrhardt, Knut Reinert, 2016

We introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in \(\mathcal{O}(1)\) time per step while using \(\mathcal{O}(\log \sigma \cdot n) + o(\log \sigma \cdot \sigma \cdot n)\) bits of space. This is done by replacing the wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between \(\approx 2.7-4.6\) times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.

Master Thesis: Generic Implementation of a Bidirectional FM-Index in SeqAn and Applications

Christopher Pockrandt, 2015

Many alignment-tools are using unidirectional string indices, that only allow searching sequences into one direction. In this master thesis we are implementing a bidirectional FM index in SeqAn, a C++ library providing efficient algorithms and data structures for sequence analysis. This index can search a pattern in both directions, i.e. extend it character by character to the left as well as to the right in an arbitrary order. It allows to design more efficient search strategies for exact and approximate string matching, that we will implement and analyze by comparing it to common search strategies used by Lambda, a local alignment tool, optimized for protein alphabets.

Bachelor Thesis: Planarity Testing via PQ-Trees: Then and Now

Christopher Pockrandt, 2013

Graphs that can be drawn on the plane without its edges intersecting, are called planar graphs. They play an important part in computer science: several algorithmic problems can be solved more efficiently on planar graphs than on arbitrary ones, such as finding shortest paths or calculating maximum flows. There are numerous algorithms that decide in linear time, whether a graph is planar or not and if so, return an embedding in the plane. A fundamental milestone was the algorithm by Booth and Lueker based on a new data structure called PQ-trees. Many simpler algorithms were presented, which had different approaches or used a modification of the data structure. This thesis presents PQ-trees and outlines two algorithms: the original algorithm by Booth and Lueker (1976) and a recently developed, simplified one by Haeupler and Tarjan (2008) based on a reinterpretation of PQ-trees.

Extended abstracts / Oral presentations

Computing Genome Mappability Scores for Faster Read Mapping using Optimum Search Schemes

Kiavash Kianfar, Christopher Pockrandt, Bahman Torkamandi, Haochen Luo, Knut Reinert, 2018

Speeding Up Approximate String Searching in Indices: EPR-Dictionaries and Optimal Search Schemes

Christopher Pockrandt, Marcel Ehrhardt, Knut Reinert, 2017

Teaching

Instructor/lecturer

  • summer 17 - ProInformatik III "Object Oriented Programming"
  • summer 17 - Software project "Usable Security for Smartphones"

Teaching assistant

  • winter 15/16 - Algorithms
  • summer 15 - ProInf: Functional Programming (with lectureship)
  • summer 14 - Software Engineering
  • winter 13/14 - Algorithms and Data Structures
  • summer 13 - Software Engineering
  • summer 12 - Software Engineering
  • winter 11/12 - Functional Programming

Honors

  • 2011 - 2015 Deutschlandstipendium (National Scholarship Program)
  • 2014 Grant for International Students from the National Taiwan University and the Ministry of Education of the Republic of China
  • 2014 Promos Study Abroad Scholarship at National Taiwan University
  • 2012 Promos Study Abroad Scholarship at National University of Singapore