
SCS Special Seminar

Speaker
WILLIAM YU
Assistant Professor
Department of Computer and Mathematical Sciences
University of Toronto

When
-

Where
Virtual Presentation - ET

Description

Selecting subsampled features from sets is one of the primitive tasks that enable efficient biomedical algorithms. A classical approach is to apply a hash function to the elements of the set and keep only the minimum hashed values; with slight variations in context, this gives rise to both MinHash, a probabilistic sketch for estimating the Jaccard index between sets, and minimizers, a k-mer selection scheme for finding sparse anchors along genomic sequences. More recently, open sync-mers were introduced in the literature as an alternative to minimizers, and they turn out to have some nice theoretical properties.
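As a rough illustration of the "keep only the minimum hashed values" idea, the following Python sketch shows one common way to build a MinHash sketch and to pick minimizers along a sequence. It is not taken from the talk: the hash function (`blake2b` with a seed prefix), the helper names, the number of hash functions, and the leftmost tie-breaking rule are all assumptions made for the example.

```python
import hashlib

def h(item: str, seed: int = 0) -> int:
    """Deterministic 64-bit hash of a string (an illustrative choice)."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_sketch(items, num_hashes=256):
    """Keep one minimum hashed value per seeded hash function."""
    return [min(h(x, seed) for x in items) for seed in range(num_hashes)]

def jaccard_estimate(sketch_a, sketch_b):
    """Estimate the Jaccard index as the fraction of matching minima."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)

def minimizers(seq, k, w):
    """Select, in every window of w consecutive k-mers, the k-mer with the
    smallest hash (ties broken by leftmost position, an assumption here)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    hashes = [h(km) for km in kmers]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (hashes[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)

if __name__ == "__main__":
    a = {str(i) for i in range(100)}
    b = {str(i) for i in range(50, 150)}  # true Jaccard index is 1/3
    print(jaccard_estimate(minhash_sketch(a), minhash_sketch(b)))
    print(minimizers("ACGTACGTTGCA", k=3, w=4))
```

The two routines share the same primitive: MinHash takes the minimum over a whole set under several hash functions, while minimizers take the minimum over each sliding window of k-mers under a single hash function, which guarantees at least one selected k-mer per window.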

In this talk, we cover several related topics. First, we show that lossily compressing MinHash buckets using a floating-point encoding reduces space complexity from O(log n) to O(log log n). Second, we carefully analyze open sync-mers, prove an optimal choice of parameters for them under a point-mutation k-mer conservation model, and show that these choices can improve read-mapping chaining scores. Finally, if time permits, we will discuss some additional theoretical connections between minimum-hashing-based methods and convolutional filters with max-pooling.

Joint work with Jim Shaw and Griffin Weber.

William Yu is an assistant professor of mathematics at the University of Toronto. He trained under Bonnie Berger at MIT for his PhD, and was a postdoc at Harvard Medical School with Griffin Weber.

Zoom Participation. See announcement.