Research

The focus of my research is to develop new tools for analyzing data, to understand their underlying mathematical theory, and to use these tools in applications. This work lies in the field of applied topology, and more specifically, topological data analysis.

Why is topological data analysis of interest to data scientists?

Topological data analysis (TDA) approaches data science with a toolbox based on ideas from algebraic topology which are very different from the ideas underlying standard approaches to data analysis. It is particularly useful in situations where the data has a complex structure that is crucial to understanding the data but is sufficiently complicated that we are unable to construct a satisfactory model using standard methods.

Why is topological data analysis of interest to mathematicians?

TDA has important and beautiful connections with a wide variety of mathematics. My current research uses results from probability, combinatorics, functional analysis, representation theory, commutative algebra, geometric topology, general topology and category theory. It is exciting to explore new connections between pure mathematics and applications.

What is applied topology and topological data analysis?

Algebraic topology is particularly useful for connecting local and global properties of mathematical objects. Often the local description is easy to understand, but one is particularly interested in global qualities that depend on how the local information fits together. Many powerful tools in algebraic topology have been developed for this purpose. Applied topology adapts these tools so that they may be used for the same purpose in an applied setting.

In topological data analysis, data is encoded in an increasing sequence of complexes. This choice of encoding requires a good understanding of both the data and the topological machinery. Persistent homology provides an efficient algorithm for calculating and describing how the topology of these complexes changes as one moves along the sequence. One then uses this summary to make inferences about the data.
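As a minimal sketch of this pipeline, the following computes only the degree-0 part of persistent homology (connected components) for a small point cloud. It assumes the increasing sequence of complexes is the Vietoris-Rips filtration, which is one common choice and not one prescribed above; the function name `zero_dim_persistence` is hypothetical.

```python
from itertools import combinations
import math


def zero_dim_persistence(points):
    """Finite degree-0 persistence pairs (birth, death) of a point cloud.

    Assumes a Vietoris-Rips filtration: every point is born at scale 0,
    and an edge appears at the distance between its endpoints. One
    component never dies; its infinite bar is not returned.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges sorted by the scale at which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies.
            parent[root_i] = root_j
            pairs.append((0.0, length))
    return pairs


bars = zero_dim_persistence([(0, 0), (1, 0), (5, 0)])
print(bars)
```

For these three collinear points, the two nearby points merge at scale 1 and the distant point joins at scale 4, so the long bar records the gap between the two clusters.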

Statistical topological data analysis

An impediment to the broader use of applied topology has been its incompatibility with statistical methods. For example, one would like to be able to calculate the averages of the standard topological summaries (barcodes and persistence diagrams) and understand their variances. Furthermore, one would like to incorporate them into machine learning algorithms.

I have developed a topological summary, the persistence landscape [7], which is more amenable to statistical analysis and to machine learning. The persistence landscape may be viewed as a mapping of persistence diagrams to either a function space or a finite dimensional Euclidean space. In either case, we have a dot product (actually, a Hilbert space structure) which allows us to apply most of the standard tools of statistics and machine learning.
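The mapping to Euclidean space can be sketched as follows. The k-th landscape function at t is taken to be the k-th largest value of max(0, min(t - b, d - t)) over the points (b, d) of the persistence diagram, and sampling these functions on a grid gives a vector. This is an illustrative sketch with hypothetical function names; the sampling grid and the plain dot product (which approximates the function-space inner product only up to the grid spacing) are assumptions of the example.

```python
def landscape_value(diagram, k, t):
    """k-th persistence landscape function (k = 1, 2, ...) evaluated at t."""
    tents = sorted(
        (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
        reverse=True,
    )
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_vector(diagram, max_k, grid):
    """Sample the first max_k landscape functions on grid, concatenated."""
    return [
        landscape_value(diagram, k, t)
        for k in range(1, max_k + 1)
        for t in grid
    ]


def mean_vector(vectors):
    """Coordinate-wise average of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(u, v):
    """Euclidean dot product of two sampled landscapes."""
    return sum(a * b for a, b in zip(u, v))


grid = [0.0, 1.0, 2.0, 3.0, 4.0]
v1 = landscape_vector([(0.0, 4.0), (1.0, 3.0)], 2, grid)
v2 = landscape_vector([(0.0, 2.0)], 2, grid)
print(mean_vector([v1, v2]))
print(dot(v1, v2))
```

Because the sampled landscapes are ordinary vectors, averages and inner products are computed coordinate-wise, which is not possible directly with barcodes or persistence diagrams.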

With Pawel Dlotko, I have constructed and implemented efficient algorithms for calculating persistence landscapes [11]. These tools are publicly available as the Persistence Landscapes Toolbox.

With Peter Kim, Zhi-Ming Luo and Gunnar Carlsson, I have previously combined topology and statistics, both in parametric situations [13] and for (nonparametric) functions on Riemannian manifolds [8].

Foundations of applied topology

Algebraic topology has frequently undergone a process in which previous results were redeveloped from a more abstract point of view. This has both clarified key ideas and proofs and allowed previous results to be vastly generalized and applied in new settings. I have worked to apply this process to some parts of applied topology.

With Jonathan A. Scott, I redeveloped persistent homology from a categorical point of view [14]. The main objects of study are diagrams indexed by the poset of real numbers. Such diagrams have an interleaving distance, which we’ve shown generalizes the previously studied bottleneck distance. As a consequence we are able to greatly generalize previous stability results for persistence, extended persistence, and kernel, image and cokernel persistence. We describe a category of interleavings of such diagrams, and show that if the target category is abelian, so is this category.
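For reference, the interleaving condition can be written out componentwise. The notation below (F, G for diagrams indexed by the real numbers, C for the target category, and the symbols for the maps) is chosen for this sketch and is not taken from the text above.

```latex
% F, G : (\mathbb{R}, \le) \to \mathcal{C} are \varepsilon-interleaved if there
% are maps, natural in a,
\[
  \varphi_a : F(a) \to G(a + \varepsilon), \qquad
  \psi_a : G(a) \to F(a + \varepsilon),
\]
% whose composites agree with the structure maps of F and G:
\[
  \psi_{a + \varepsilon} \circ \varphi_a = F(a \le a + 2\varepsilon), \qquad
  \varphi_{a + \varepsilon} \circ \psi_a = G(a \le a + 2\varepsilon).
\]
% The interleaving distance is the infimum of such \varepsilon:
\[
  d(F, G) = \inf \{ \varepsilon \ge 0 : F \text{ and } G \text{ are }
  \varepsilon\text{-interleaved} \}.
\]
```

A 0-interleaving is exactly a natural isomorphism, so the distance measures how far two diagrams are from being isomorphic.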

Together with Vin de Silva, we have two further papers [10, 9] in which we consider more general indexing categories. Our general theorems specialize to useful results in previously studied settings such as multidimensional persistence and dynamical systems, and also suggest the right constructions in new settings, such as recent work on categorical Reeb graphs. Abstractly, we can apply persistence whenever we have a monoidal adjunction that allows us to measure homotopies, or when the morphisms in our indexing categories have weights. We also hope that these results will enable advances in applied topology to find applications in pure mathematics.

With Vin de Silva and Vidit Nanda, I have written a paper [18] on persistent homology and Lipschitz extensions that combines ideas from applied topology, metric geometry and category theory.

Topology and biology

With Giseon Heo, Violeta Kovacev-Nikolic and Dragan Nikolic, I used the persistence landscape to analyze crystallographic protein structures [17]. We were able to distinguish between open and closed conformations of the maltose binding protein, an important protein in E. coli that transports sugar molecules across the cell membrane.

With Moo Chung and Peter Kim, I applied topological data analysis to distinguish autistic and control subjects using their brain images [16].

Topology and physics

Hard spheres are among the most well studied models of matter. A number of papers in statistical mechanics have explored the hypothesis that phase transitions are due to changes in the topology of the underlying configuration space. Surprisingly, the answers to a number of basic topological questions on the configuration space of hard spheres are unknown.

In work with Yuliy Baryshnikov and Matthew Kahle [1], we developed a Morse theory for studying the homotopy theory of this configuration space. The critical points and critical submanifolds in this theory correspond to mechanically balanced configurations of spheres. As an application, we find the precise threshold radius for such a configuration space to be homotopy equivalent to the configuration space of points. We are working on a followup paper analyzing the non-degeneracy and index of critical points and investigating the asymptotic properties of this configuration space.

Topology and computer science

After many decades of exponential growth in the speed of single-thread execution, current advances in computation are more driven by increases in parallelism. However, the analysis of truly concurrent programs is very difficult. In a concurrent environment, the processes can access shared resources in various orders, which can result in very different executions.

Mathematically, concurrent programs can be described by a state space together with non-reversible execution paths. One approach to concurrency is a directed version of homotopy theory. In [5], I generalized the van Kampen theorem to these directed spaces. In [4], I studied pushouts in an undercategory of partially ordered spaces. More recently [6], I used work of Jacob Lurie on higher category theory to give a model for concurrency whose state space and execution spaces are simplicial sets.

This topic is also of interest in pure mathematics. For example, my work with Kris Worytkiewicz on a model category for local partially ordered spaces [15] has been used in studying paths in stratified spaces.

Algebra and algebraic topology

In an algebra paper with Leah Gold [12], given a finite simple vertex-weighted graph, we construct a graded associative (non-commutative) algebra, whose generators correspond to vertices and whose ideal of relations has generators that are graded commutators corresponding to edges. We show that the Hilbert series of this algebra is the inverse of the clique polynomial of the graph. Using this result it is easy to recognize if the ideal is inert, from which strong results on the algebra follow. Noncommutative Gröbner bases play an important role in our proof. Our result has an interesting application to toric topology. Indeed, this algebra arises naturally from a partial product of spheres, which is a special case of a generalized moment angle complex, and we apply our result to the loop space homology of this space.
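The Hilbert series statement can be checked numerically on small graphs. The sketch below makes two simplifying assumptions that are not taken from the paper: every vertex has weight 1, and the clique polynomial is written with alternating signs, with coefficient (-1)^k times the number of k-cliques in degree k. The paper's conventions for weights and signs may differ. The function names are hypothetical.

```python
from itertools import combinations


def clique_polynomial(n, edges):
    """Coefficients c[k] = (-1)**k * (number of k-cliques), for k = 0..n.

    Vertices are 0..n-1. Assumes unit vertex weights and an
    alternating-sign convention (both are assumptions of this sketch).
    """
    edge_set = {frozenset(edge) for edge in edges}
    coeffs = [0] * (n + 1)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # A subset is a clique when every pair in it is an edge.
            if all(frozenset(pair) in edge_set
                   for pair in combinations(subset, 2)):
                coeffs[k] += (-1) ** k
    return coeffs


def inverse_series(coeffs, terms):
    """First `terms` coefficients of 1 / sum(coeffs[k] * t**k).

    Assumes coeffs[0] == 1, which holds for the empty clique.
    """
    series = [1]
    for m in range(1, terms):
        top = min(m, len(coeffs) - 1)
        series.append(-sum(coeffs[k] * series[m - k]
                           for k in range(1, top + 1)))
    return series


# Two vertices, no edge: free algebra on two generators.
print(inverse_series(clique_polynomial(2, []), 4))
# Two vertices joined by an edge: the two generators commute.
print(inverse_series(clique_polynomial(2, [(0, 1)]), 4))
```

In the two extreme cases the coefficients are the expected dimensions: powers of 2 for the free algebra on two generators, and m + 1 in degree m for the polynomial ring in two commuting variables.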

In earlier work [2, 3] I give new results on the cell attachment problem, which was perhaps first studied by J.H.C. Whitehead around 1940: If one attaches one or more cells to a topological space, what is the effect on the homology of the loop space, and on the homotopy-type? Much of this work involves a related question for differential graded Lie algebras.

References

  1. Yuliy Baryshnikov, Peter Bubenik, and Matthew Kahle. Min-type Morse theory for configuration spaces of hard spheres. International Mathematics Research Notices, 2014 (2014), no. 9, 2577–2592.
  2. Peter Bubenik. Free and semi-inert cell attachments. Trans. Amer. Math. Soc., 357(11):4533–4553, 2005.
  3. Peter Bubenik. Separated Lie models and the homotopy Lie algebra. J. Pure Appl. Algebra, 212(2):401–410, 2008.
  4. Peter Bubenik. Context for models of concurrency. Electron. Notes Theor. Comput. Sci., 230:3–21, 2009.
  5. Peter Bubenik. Models and van Kampen theorems for directed homotopy theory. Homology, Homotopy Appl., 11(1):185–202, 2009.
  6. Peter Bubenik. Simplicial models for concurrency. Electronic Notes in Theoretical Computer Science, 283(0):3–12, 2012. Proceedings of the workshop on Geometric and Topological Methods in Computer Science (GETCO).
  7. Peter Bubenik. Statistical topological data analysis using persistence landscapes. Journal of Machine Learning Research, 16 (2015), 77–102.
  8. Peter Bubenik, Gunnar Carlsson, Peter T. Kim, and Zhi-Ming Luo. Statistical topology via Morse theory persistence and nonparametric estimation. In Algebraic methods in statistics and probability II, volume 516 of Contemp. Math., pages 75–92. Amer. Math. Soc., Providence, RI, 2010.
  9. Peter Bubenik, Vin de Silva, and Jonathan A. Scott. Categorification of Gromov-Hausdorff distance and interleaving of functors. arXiv:1707.06288 [math.CT].
  10. Peter Bubenik, Vin de Silva, and Jonathan A. Scott. Metrics for generalized persistence modules. Foundations of Computational Mathematics, 15 (2015), no. 6, 1501–1531.
  11. Peter Bubenik and Pawel Dlotko. A persistence landscapes toolbox for topological statistics. Journal of Symbolic Computation, 78:91–114, 2017.
  12. Peter Bubenik and Leah H. Gold. Graph products of spheres, associative graded algebras and Hilbert series. Math. Z., 268(3-4):821–836, 2011.
  13. Peter Bubenik and Peter T. Kim. A statistical approach to persistent homology. Homology, Homotopy Appl., 9(2):337–362, 2007.
  14. Peter Bubenik and Jonathan A. Scott. Categorification of persistent homology. Discrete & Computational Geometry, pages 1–28, 2014.
  15. Peter Bubenik and Krzysztof Worytkiewicz. A model category for local po-spaces. Homology, Homotopy Appl., 8(1):263–292, 2006.
  16. Moo K. Chung, Peter Bubenik, and Peter T. Kim. Persistence diagrams of cortical surface data. In Information Processing in Medical Imaging 2009, number 5636 in Lecture Notes in Computer Science, pages 386–397, 2009.
  17. Violeta Kovacev-Nikolic, Peter Bubenik, Dragan Nikolic, and Giseon Heo. Using persistent homology and dynamical distances to analyze protein binding. Statistical Applications in Genetics and Molecular Biology, 15(1):19–38, 2016.
  18. Peter Bubenik, Vin de Silva, and Vidit Nanda. Higher interpolation and extension for persistence modules. SIAM J. Appl. Algebra Geometry, 1:272–284, 2017.