HDSC Seminar, Spring 2024

Organizer: Michael Lindsey
Meeting details: Thursday 1-2, Evans 1015

Description

Welcome to an informal seminar on high-dimensional scientific computing (HDSC). We will investigate paradigms for HDSC including tensor networks, Monte Carlo methods, semidefinite programming relaxations, graphical models, neural networks, and more, as well as tools from numerical linear algebra and optimization.

Past semesters: [Fall 2023]

Schedule


February 1
Speaker: Yuhang Cai [home page]
Topic: Clustering in self-attention dynamics

Abstract. Viewing a transformer with fixed weights as an interacting particle system, with particles representing tokens, one finds that the particles tend to cluster toward particular limiting objects as time tends to infinity. Using techniques from dynamical systems and PDEs, it can be shown that the type of limiting object depends on the spectrum of the value matrix.
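
As a rough illustration of this viewpoint (not part of the talk), here is a minimal numerical sketch of continuous-time self-attention dynamics with tokens treated as particles on the sphere; identity query/key matrices, forward-Euler time stepping, and renormalization onto the sphere are simplifying assumptions, and all parameter choices are illustrative.

  import numpy as np

  def attention_velocity(X, V, beta=1.0):
      """Velocity of each token under self-attention with fixed weights.
      X: (n, d) token positions; V: (d, d) value matrix; Q = K = I assumed."""
      logits = beta * X @ X.T
      W = np.exp(logits - logits.max(axis=1, keepdims=True))
      W /= W.sum(axis=1, keepdims=True)        # row-stochastic attention weights
      return W @ (X @ V.T)                     # softmax-weighted average of V x_j

  def simulate(X0, V, T=50.0, dt=0.01):
      """Forward-Euler integration of dx_i/dt = sum_j A_ij(X) V x_j,
      renormalizing the tokens to the unit sphere after each step."""
      X = X0.copy()
      for _ in range(int(T / dt)):
          X = X + dt * attention_velocity(X, V)
          X /= np.linalg.norm(X, axis=1, keepdims=True)
      return X

  rng = np.random.default_rng(0)
  n, d = 32, 3
  X0 = rng.standard_normal((n, d))
  X0 /= np.linalg.norm(X0, axis=1, keepdims=True)
  X_T = simulate(X0, V=np.eye(d))
  # With V = I the tokens typically collapse toward a single cluster on the sphere.
  print(np.round(X_T[:5], 3))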


February 8
Speaker: Michael Lindsey [home page]
Topic: What I learned about QTTs last semester

Abstract. Last semester there was a fair amount of coverage of quantized tensor trains (QTTs), both in this seminar and in the Applied Math Seminar. I will tell you what I've learned about QTTs since then and share some thoughts and questions about their future in numerical analysis.


February 15
Speaker: Michael Kielstra [home page]
Topic: Approximation by exponential sums

Abstract. Review of this paper and this follow-up.


February 22
Speaker: Kevin Stubbs [home page]
Topic: Gauging tensor networks with belief propagation

Abstract. Review of this paper.


February 29
Speaker: Mark Fornace [home page]
Topic: Determinantal point processes in low-rank approximation

Abstract. Low-rank approximation of symmetric positive semidefinite matrices based on column subset selection can enable efficient algorithms for matrix sketching, experimental design, and reduced-order modeling. Determinantal point processes (DPPs) yield theoretically rigorous bounds for the worst-case optimal performance of such approximations. Proofs of relevant bounds have a rich (and long) history, relating to such topics as elementary symmetric polynomials, real algebraic geometry, Schur convexity, and random matrix theory. Besides the theory of its worst-case performance, DPP sampling has also been the subject of numerous algorithmic implementations in recent years. In this seminar, I will give a brief introduction to some applications, underlying theory, and algorithms related to DPPs in low-rank approximation.

Selected references:

Dereziński, Michał, and Michael W. Mahoney. "Determinantal point processes in randomized numerical linear algebra." Notices of the American Mathematical Society 68.1 (2021): 34-45. [link]

Guruswami, Venkatesan, and Ali Kemal Sinop. "Optimal column-based low-rank matrix reconstruction." Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2012. [link]
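
To make the column-selection setting concrete, here is a small numerical sketch (not from the talk): it uses greedy pivoted partial Cholesky as a cheap deterministic surrogate for k-DPP sampling (a true k-DPP sampler would draw the index set S with probability proportional to det(A[S, S])) and reports the error of the resulting column-based (Nyström) approximation.

  import numpy as np

  def greedy_columns(A, k):
      """Pivoted partial Cholesky: at each step pick the column with the largest
      residual diagonal. A deterministic surrogate for k-DPP column sampling."""
      n = A.shape[0]
      d = np.diag(A).astype(float)
      F = np.zeros((n, k))
      S = []
      for t in range(k):
          i = int(np.argmax(d))
          S.append(i)
          col = A[:, i] - F[:, :t] @ F[i, :t]
          F[:, t] = col / np.sqrt(d[i])
          d = np.maximum(d - F[:, t] ** 2, 0.0)
      return S

  def nystrom_error(A, S):
      """Frobenius error of the column-based approximation A ~ C W^+ C^T."""
      C, W = A[:, S], A[np.ix_(S, S)]
      return np.linalg.norm(A - C @ np.linalg.pinv(W) @ C.T)

  rng = np.random.default_rng(1)
  G = rng.standard_normal((200, 20))
  A = G @ G.T                                      # rank-20 PSD test matrix
  print(nystrom_error(A, greedy_columns(A, 20)))   # near zero: 20 columns suffice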


March 7
Speaker: Mark Fornace [home page]
Topic: Determinantal point processes in low-rank approximation (continued)

Abstract. Continuation of the February 29 talk; see the abstract and selected references above.


March 14
Speaker: Yuhang Cai [home page]
Topic: Review of mirror descent

Abstract. Review of the theory of mirror descent following Yuhang's notes.
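
For concreteness, here is a minimal sketch of the best-known instance of mirror descent, exponentiated gradient with the entropic mirror map on the probability simplex; the toy objective and parameters are illustrative and not taken from the notes.

  import numpy as np

  def mirror_descent_simplex(grad, x0, eta=0.1, iters=500):
      """Mirror descent with the entropic mirror map (exponentiated gradient).
      The multiplicative update keeps the iterate on the probability simplex."""
      x = x0.copy()
      for _ in range(iters):
          x = x * np.exp(-eta * grad(x))
          x /= x.sum()
      return x

  # Toy problem: minimize f(x) = <c, x> + 0.5 ||x||^2 over the simplex.
  c = np.array([0.9, 0.1, 0.5, 0.3])
  x_star = mirror_descent_simplex(lambda x: c + x, np.full(4, 0.25))
  print(np.round(x_star, 3))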


April 4
Speaker: Yizhi Shen
Topic: Classical shadow for predicting quantum state properties

Abstract. Classical shadows are an efficient method for constructing an approximate classical description of an unknown quantum state from very few measurements. The method learns a minimal classical sketch of the state, its classical shadow, which can be used to predict arbitrary state properties via a simple median-of-means protocol. Importantly, it achieves optimal sample complexity, ensuring accurate predictions of many different functions of the state with high success probability.
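
As a toy illustration of the protocol (single qubit only, with uniformly random Pauli-basis measurements), the sketch below builds the standard inverse-channel snapshot 3|b><b| - I and estimates the expectation of Z by median of means; the sample counts and test state are arbitrary.

  import numpy as np

  rng = np.random.default_rng(0)
  I2 = np.eye(2)
  X = np.array([[0, 1], [1, 0]], dtype=complex)
  Y = np.array([[0, -1j], [1j, 0]])
  Z = np.array([[1, 0], [0, -1]], dtype=complex)

  def snapshot(rho):
      """One classical-shadow snapshot: measure rho in a uniformly random Pauli
      basis and apply the inverse of the measurement channel, 3|b><b| - I."""
      P = (X, Y, Z)[rng.integers(3)]
      evals, evecs = np.linalg.eigh(P)
      probs = np.clip([np.real(v.conj() @ rho @ v) for v in evecs.T], 0.0, None)
      b = rng.choice(2, p=probs / probs.sum())
      v = evecs[:, b]
      return 3 * np.outer(v, v.conj()) - I2

  def estimate(rho, O, n_samples=30000, n_batches=10):
      """Median-of-means estimate of Tr(O rho) from classical shadows."""
      vals = np.array([np.real(np.trace(O @ snapshot(rho))) for _ in range(n_samples)])
      return np.median(vals.reshape(n_batches, -1).mean(axis=1))

  theta = 0.7
  psi = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
  rho = np.outer(psi, psi.conj())
  print(estimate(rho, Z), np.cos(theta))   # estimate vs. exact expectation of Z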


April 11
Speaker: Yizhi Shen
Topic: Classical shadow for predicting quantum state properties (continued)

Abstract. Continuation of the April 4 talk; see the abstract above.


April 25
Speaker: Max Zubkov [home page]
Topic: Geometry of Polynomial Neural Networks, Tensors, and Beyond

Abstract. Neural Networks (NNs) have been used successfully in various applications despite the lack of a complete understanding of the learning process. Polynomial Neural Networks (PNNs) are a minimal model architecture that, with the help of Algebraic Geometry, allows us to shine some light into this black box. In this talk, we will look at NNs from a mathematical perspective and see how PNNs are connected to different symmetric tensor decompositions. We will show how the inherent structure and weights of PNNs remember the geometry and symmetries of the learned models. Finally, we will touch on possible applications to other NN architectures (CNNs, PINNs, Binary NNs, LLMs, etc.).
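
As a toy illustration of the tensor connection mentioned in the abstract (my own example, not the speaker's): a depth-two network with squared activation computes a quadratic form, so its weights are exactly a symmetric, rank-equal-to-width decomposition of a symmetric tensor (here just a matrix).

  import numpy as np

  rng = np.random.default_rng(0)
  d, width = 4, 3
  W = rng.standard_normal((width, d))      # first-layer weights w_1, ..., w_width
  c = rng.standard_normal(width)           # output-layer weights

  def pnn(x):
      """Shallow polynomial network with squared activation: f(x) = sum_i c_i (w_i . x)^2."""
      return c @ (W @ x) ** 2

  # The same function is the quadratic form of the symmetric matrix (2-tensor)
  #   T = sum_i c_i w_i w_i^T,
  # so the network weights form a symmetric rank-`width` decomposition of T.
  T = (W.T * c) @ W
  x = rng.standard_normal(d)
  print(pnn(x), x @ T @ x)                 # the two values agree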


May 2
Speaker: Yuhang Cai [home page]
Topic: Optimal batch size and step size for SGD in the interpolation regime

Abstract. For least-squares regression in the interpolation regime, the loss converges exponentially, and there exists a critical batch size n such that an SGD iteration with mini-batch size m < n is nearly equivalent to m iterations with mini-batch size 1, while for m > n an SGD iteration is nearly equivalent to a full gradient descent iteration.
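
A small sketch of the setting (illustrative parameters only): noiseless overparameterized least squares, where mini-batch SGD drives the training loss to zero at an exponential rate with no noise floor. The batch-size threshold described in the abstract concerns how the best step size and the per-iteration progress scale with m, which is not tuned here.

  import numpy as np

  rng = np.random.default_rng(0)
  n, d = 50, 200                           # d > n: overparameterized, interpolation possible
  A = rng.standard_normal((n, d)) / np.sqrt(d)
  b = A @ rng.standard_normal(d)           # noiseless labels: a zero-loss solution exists

  def sgd(batch_size, eta=1.0, iters=5000):
      """Mini-batch SGD on the least-squares loss 0.5 * mean((A x - b)^2)."""
      x = np.zeros(d)
      for _ in range(iters):
          idx = rng.integers(n, size=batch_size)
          x -= eta * A[idx].T @ (A[idx] @ x - b[idx]) / batch_size
      return 0.5 * np.mean((A @ x - b) ** 2)

  # The final loss is numerically zero for every batch size: exponential
  # convergence to an interpolating solution.
  for m in (1, 4, 16):
      print(m, sgd(batch_size=m))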


Sample topics to present

In no particular order.

Machine learning

  1. Theory of benign overfitting

  2. Theory of SGD training in the interpolation regime

  3. Does this paper explain feature learning?

  4. Theory of convex neural networks

  5. What's new in the mathematical study of transformers: [link 1] [link 2]
    (See this crash course for background.)

  6. What norm do neural network parameters induce in function space?
    [1 dimension] [general case]

  7. Neural networks and Gaussian processes
    1. Deep Neural Networks as Gaussian Processes
    2. Neural tangent kernel
    3. Wide Bayesian neural networks have a simple weight posterior

Matrix sketching

  1. Review of matrix sketching (leverage scores and DPPs); see also this review of DPPs

  2. Ridge leverage scores for matrix sketching: paper, slides, and extended slides

  3. Kernel low-rank approximation in input-sparsity time?

  4. Streaming matrix sketching

  5. Classic: Fast Johnson-Lindenstrauss transform

  6. A DEIM Induced CUR Factorization, see also the references and discussion in these slides

Group synchronization

  1. Robust synchronization
    1. Cycle-edge message passing
    2. Message passing least squares
    3. Quadratic programming
    4. Representation theory

Tensor networks

  1. Belief propagation for tensor networks
    1. Background materials: [Wainwright and Jordan] and [Mézard and Montanari]
    2. Duality of Graphical Models and Tensor Networks
    3. Block Belief Propagation Algorithm for 2D Tensor Networks
    4. Gauging tensor networks with belief propagation

  2. General tensor network contraction
    1. Hyper-optimized compressed contraction of tensor networks with arbitrary geometry
    2. Contracting Arbitrary Tensor Networks

Sampling with sign problems

  1. Sampling for lattice gauge theories, cf. Gattringer and Lang and talk to me

  2. Complex Langevin
    1. Math references here and here
    2. See also this review and Sec. 8 of this review

Potpourri

  1. What's new in Vlasov

  2. What's new in sequence modeling

  3. Stochastic Optimal Control for CV Free Sampling of Molecular Transition Paths