HDSC Seminar, Spring 2024

Past semesters: [Fall 2023]

**Abstract.** When transformers with fixed weights are viewed as interacting particle systems, the particles, which represent tokens,
tend to cluster toward particular limiting objects as time tends to infinity. Using techniques from dynamical systems and PDEs,
one can show that the type of limiting object depends on the spectrum of the value matrix.
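
The clustering described in this abstract can be illustrated with a small numerical experiment. The sketch below is only an assumption about one simplified model, not the speaker's setup: tokens live on the unit sphere, the query, key, and value matrices are all the identity, and `beta`, `dt`, the step count, and the function names are hypothetical choices. It does not illustrate the dependence on the spectrum of the value matrix.

```python
import numpy as np

def attention_step(X, beta, dt):
    """One explicit Euler step of the assumed dynamics, then renormalize rows."""
    logits = beta * (X @ X.T)                    # pairwise inner products
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)            # row-wise softmax weights
    V = W @ X                                    # attention-weighted averages
    V -= np.sum(V * X, axis=1, keepdims=True) * X  # project onto tangent space
    X_new = X + dt * V
    return X_new / np.linalg.norm(X_new, axis=1, keepdims=True)

def min_pairwise_inner_product(X):
    """Smallest inner product between two distinct tokens (1.0 means one cluster)."""
    G = X @ X.T
    return float(G[~np.eye(len(X), dtype=bool)].min())

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))                  # 8 tokens in dimension 3
X /= np.linalg.norm(X, axis=1, keepdims=True)

initial_alignment = min_pairwise_inner_product(X)
for _ in range(2000):
    X = attention_step(X, beta=1.0, dt=0.1)
final_alignment = min_pairwise_inner_product(X)

print(initial_alignment, final_alignment)
```

Comparing `initial_alignment` with `final_alignment` gives a rough measure of how much the tokens have drawn together over the simulated time.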

**Abstract.** Last semester there was a fair amount of coverage of quantized tensor trains (QTTs) both in this
seminar and the Applied Math Seminar. I will tell you what I've
learned since then about QTTs and share some thoughts and questions regarding their future in numerical analysis.

**Abstract.** Review of this paper and this follow-up.

**Abstract.** Review of this paper.

**Abstract.** Low-rank approximation of symmetric positive semidefinite matrices based on column subset
selection can enable efficient algorithms for matrix sketching, experimental design, and reduced-order modeling.
Determinantal point processes (DPPs) yield theoretically rigorous bounds for the worst-case optimal performance of
such approximations. Proofs of relevant bounds have a rich (and long) history, relating to such topics as elementary
symmetric polynomials, real algebraic geometry, Schur convexity, and random matrix theory. Besides the theory of
its worst-case performance, DPP sampling has also been the subject of numerous algorithmic implementations in recent
years. In this seminar, I will give a brief introduction to some applications, underlying theory, and algorithms
related to DPPs in low-rank approximation.

Derezinski, Michał, and Michael W. Mahoney. "Determinantal point processes in randomized numerical linear algebra."

Guruswami, Venkatesan, and Ali Kemal Sinop. "Optimal column-based low-rank matrix reconstruction."
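
As a rough illustration of the column-subset approximation discussed in the abstract above, the sketch below forms the approximation `A[:, S] A[S, S]^+ A[S, :]` and computes its expected trace error when the subset `S` of size `k` is drawn with probability proportional to `det(A[S, S])`. The sampling distribution is evaluated by brute-force enumeration, which is feasible only for tiny matrices and is not one of the efficient samplers from the literature. The function names, the test matrix, and the choice of trace error are assumptions made for illustration.

```python
import itertools
import numpy as np

def nystrom(A, S):
    """Column-subset (Nystrom) approximation A[:, S] A[S, S]^+ A[S, :]."""
    S = list(S)
    C = A[:, S]
    W = A[np.ix_(S, S)]
    return C @ np.linalg.pinv(W) @ C.T

def kdpp_expected_error(A, k):
    """Expected trace error when P(S) is proportional to det(A[S, S]), |S| = k.

    Enumerates every size-k subset, so this is only usable for small n.
    """
    n = A.shape[0]
    subsets = list(itertools.combinations(range(n), k))
    weights = np.array([np.linalg.det(A[np.ix_(S, S)]) for S in subsets])
    probs = weights / weights.sum()
    errors = np.array([np.trace(A - nystrom(A, S)) for S in subsets])
    return float(probs @ errors)

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B.T @ B                      # symmetric positive semidefinite test matrix
k = 2

expected_error = kdpp_expected_error(A, k)
eigvals = np.linalg.eigvalsh(A)  # ascending order
best_error = float(eigvals[:-k].sum())  # trace error of the best rank-k approximation

print(expected_error / best_error)
```

The printed ratio compares the expected error of the sampled subsets with that of the best rank-`k` approximation; the classical volume-sampling bound says this ratio is at most `k + 1`.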

- Theory of benign overfitting
- Theory of SGD training in the interpolation regime
- Does this paper explain feature learning?
- Theory of convex neural networks
- What's new in the mathematical study of transformers: [link 1] [link 2] (See this crash course for background.)
- What norm do neural network parameters induce in function space? [1 dimension] [general case]
- Neural networks and Gaussian processes
- Deep Neural Networks as Gaussian Processes
- Neural tangent kernel
- Wide Bayesian neural networks have a simple weight posterior

- Review of matrix sketching (leverage and DPP), and see this review of DPP
- Ridge leverage scores for matrix sketching: paper, slides, and extended slides
- Kernel low-rank approximation in input-sparsity time?
- Streaming matrix sketching
- Classic: Fast Johnson-Lindenstrauss transform
- A DEIM Induced CUR Factorization, see also the references and discussion in these slides

- Robust synchronization
- Cycle-edge message passing
- Message passing least squares
- Quadratic programming
- Representation theory

- Belief propagation for tensor networks
- Background materials: [Wainwright and Jordan] and [Mézard and Montanari]
- Duality of Graphical Models and Tensor Networks
- Block Belief Propagation Algorithm for 2D Tensor Networks
- Gauging tensor networks with belief propagation
- General tensor network contraction
- Hyper-optimized compressed contraction of tensor networks with arbitrary geometry
- Contracting Arbitrary Tensor Networks

- Sampling for lattice gauge theories, cf. Gattringer and Lang and talk to me
- Complex Langevin
- Math references here and here
- See also this review and Sec. 8 of this review

- What's new in Vlasov
- What's new in sequence modeling
- Stochastic Optimal Control for CV Free Sampling of Molecular Transition Paths