CS 598: Expansion, Codes, and Optimization

Main Topics

Expansion: Expander graphs combine two opposing properties of being well-connected yet sparse. This powerful combination leads to a variety of applications in CS (and mathematics) such as error correction, hardness of approximation, fast algorithms, sampling, etc. Recently, several notions of high-dimensional expansion appeared, leading to exciting discoveries.

Coding theory: Codes are "robust" collections of strings that not only have implications for protection against errors in communication and storage, but they also have connections to diverse fields such as complexity theory, expansion, etc. Recently, many breakthrough constructions of codes were discovered, such as explicit binary codes close to the GV bound, good LTCs, and good qLDPC.

Optimization: Convex programming, in particular, linear programming (LP) and semi-definite programming (SDP), are at the heart of many efficient algorithms. The use of powerful SDP hierarchies such as the Sum-of-Squares hierarchy has recently found many algorithmic applications.

Course Phases

Phase 1

In this initial phase, we will cover some foundational topics of these fields which may include some of the following:

Expanders: Notions of expansion, Random Walks, Cheeger's inequality, Cayley graphs of Abelian groups, Zig-zag construction, Ramanujan Graphs
Coding Theory: Reed Solomon, Reed Muller, Hadamard, LDPC, Tanner, and Quantum CSS Codes, Operations: code concatenation, AEL, tensor, Some code bounds: GV bound, MRRW, Singleton, etc, List Decoding
Fourier Analysis: Basics (Fourier decomposition, Parseval, etc), BLR, biased distributions and codes, Hypercontractivity
Optimization: Linear programming (Delsarte's LP combining codes, Fourier analysis and LPs), Semi-definite programming, and Sum-of-Squares (CSPs on Expanders and a local-to-global phenomenon)
Spectral graph theory: Sensitivity Conjecture (signing, interlacing, etc), Johnson scheme (a complete analog of simplicial HDXs)

Phase 2

In this second phase, we will cover some recent topics at the research frontiers which may include some of the following:

Dinur's PCP by gap amplification
Explicit balanced codes close to the GV bound by Ta-Shma
Locally Testable Codes, Left-Right Cayley Complexes, c^3-LTC of Dinur et al
Good quantum LDPC Codes, Quantum Tanner Codes
Locally Decodable Codes, Matching Vector Codes
Bipartite Ramanujan graphs (2-lifts MSS'14)
High-dimensional expanders: Random walks (Up and Down walks), Trickle-down, Expansion of bases exchange of Matroids

Phase 3

In this last phase, we will have project presentations. Topics for projects will be selected as we transition from phase 1 to phase 2, with several suggestions available and students encouraged to consult with the instructor.

Topics for Projects

Possible topics include, but are not limited to:

Classical: Agreement Expansion, HDX constructions, Derandomizing Space Bounded Computation, Tree codes, Global Hypercontractivity, Lower bounds on LCCs, Strong bounds for 3-APs, Insertions and Deletions, Reed Muller against BEC and BSC.
Quantum: Quantum PCP, Locally Testable Quantum Codes, SYK and Sum-of-Squares, Pseudorandom Unitaries, Quantum Expanders.

Lectures

here

Lecture 1: Overview of the course, some notions of expansion (e.g., edge and vertex expansion), recalling the spectral theorem from linear algebra, and pseudorandom properties of expanders (e.g., expander mixing lemma).

Lecture 2: We recalled the previous lecture, introduced the random walk operator, proved mixing bounds of random walks, introduced Cheeger's inequality, and very briefly mentioned the Laplacian matrix.

Lecture 3: Quadratic form and its connection to the spectral decomposition. Optimization/variational characterization of eigenvalues, including the Courant-Firscher-Weyl min-max theorem. Relating the spectrum of principal submatrices (in particular, induced subgraphs) via Cauchy interlacing theorem. Recursive definition of the adjacency matrix of the Boolean hypercube. Introduced Boolean functions and two complexity measures: sensitivity and block sensitivity. Stated the (now resolved) sensitivity conjecture.

Lecture 4: Introduced Fourier analysis of Boolean functions with characters, their orthogonality, and Fourier decomposition. Introduced the complexity measures of degree and approximate degree. Discussed the classical and quantum query model with the decision trees and quantum oracle model. Introduced associated complexity measures D(f) and Q(f), and their connection to degree and approximate degree. Polynomial relation among complexity measures. Implications to Grover's algorithm and quantum speed-up for total functions. Introduced edge signing of graphs. Presented Huang's proof of the sensitivity conjecture using the spectral graph theory toolkit developed so far.

Lecture 5: Various properties of the Fourier transform (Parseval, variance, etc). Convolution and its Fourier transform. Characters as homomorphisms. Characters as the eigenfunctions of the Boolean hypercube. Property testing with the case of linearity testing. The BLR linearity test, and its Fourier analytic proof.

Lecture 6: A case of study of pseudorandomness through the fundamental notion of epsilon-biased distributions and their power in fooling some classes of functions. Cayley graphs, an important class of graphs defined from a group and a set of generators. A reminder of characters as eigenfunctions of Cayley graphs with the case of Z_2^n. The equivalence of an epsilon-biased distribution and a two-sided Cayley expander over Z_2^n. Started introducing coding theory with the notions of code, Hamming distance, and rate.

Lecture 7: More on the basics of coding theory. Linear codes as an important class of codes, and some of their properties. The Hadamard code with its basic properties and its connection to characters and the Fourier transform. More on the local testability of the Hadamard code.

Lecture 8: Generator matrix of linear codes. Extending the equivalence of epsilon-biased distributions and expanders to also include epsilon-balanced codes. Code as an independent set problem. Gilbert-Varshamov (GV) bound for general codes via graph theory. GV bound for linear codes and the existence of small support epsilon-biased distributions. Entropy and its relation to the volume of Hamming balls. Rate upper bounds via the Hamming bound.

Lecture 9: Proved another impossibility result known as the Singleton bound. Discussed the mantra that low-degree polynomials have few roots (a simple and yet central theme in coding theory and TCS). Introduced the Reed-Solomon (RS) codes and showed that they achieve the Singleton bound (an example of MDS codes). Discussed the generating matrix of RS codes. Introduced the notion of the dual of a linear code and its connection to the parity check matrix of a code. Introduced the notion of LDPC codes and started discussing how one might try to construct good families of LDPC codes based on expanders.

Lecture 10: Flipped classroom: you constructed a family of good LDPC Tanner codes based on bipartite spectral expander graphs. You leveraged a local constant-sized code C_0 into a family of good codes of growing blocklength (this is an example of the local-to-global phenomenon of expanders). You proved its rate and established its minimum distance (try to iron the details out now).

Lecture 11: We recalled the Tanner code construction from last class. Discussed tensor product codes and their parameters, and how the expander based Tanner codes can be seen as "derandomization" of tensor codes. Next, we moved to explicit constructions of bounded degree families of expander graphs based on the Zig-Zag graph product. Introduced the replacement product of graphs, the permutation matrix associated with the edges of the outer graphs, and the edges of the Zig-Zag (short-long-short). Finally, we gave a matrix description of the (normalized) adjacency matrix of the Zig-Zag product.

Lecture 12: Flipped classroom: after we recalled the Zig-Zag product, we proceeded to the analysis of its expansion. You computed a bound on its two-sided spectral expansion by splitting the analysis into a few cases (I guided you from a distance towards the original analysis of the Zig-Zag which has beautiful interpretations though it is not the shortest). Then, we discussed how to use this product recursively to build an entire explicit family of bounded degree expanders. We also briefly mentioned that by tensoring the graphs, one can also get strongly explicit constructions.

Lecture 13: We recalled some of the various explicit pseudorandom constructions from class so far. Then, we proceeded to find an explicit construction of epsilon-balanced codes starting from our explicit construction toolkit. We discussed how XOR can be used to reduce bias, which, in particular, amplifies the distance. The first naive dense construction led to a vanishing rate. We briefly discussed that a randomized construction leads to (near) optimal rate, but it is non-explicit. Then, we discussed how to use expander walk-based XOR amplification as an attempt to derandomize the random construction.

Lecture 14: Flipped classroom: We recalled the expander walk-based XOR amplification from the last class. You first mapped the bias reduction computation to a linear algebraic problem involving the normalized adjacency matrix of the expander. Then, you showed an exponential bias reduction in the walk length. Finally, we concluded by showing that, with a (near) Ramanujan expander, this explicit construction yields explicit epsilon-balanced codes of rate Omega(epsilon^{4+o(1)}).

Lecture 15: We discussed the problem of s-t connectivity on undirected graphs and its connection to space-bounded computation. We discussed how this problem can be solved using O(log(n)) space on bounded degree expanders. We discussed how our Zig-Zag toolkit (derandomization of degree) together with graph powering naturally leads to a strategy to solve this problem in O(log(n)) (try to implement this strategy now). Then we went back to coding theory to discuss the important concept of list decoding as a way of going beyond the unique decoding radius. We stated the list decoding capacity theorem, which establishes the best list decoding radius vs rate trade-offs. Flipped classroom: you correctly guessed the trade-offs and then you proved the theorem using the probabilistic method.

Lecture 16: We recalled some of the various notions of expansion for graphs we have seen so far in class. Then we initiated a discussion on how to extend our expansion zoo. We started with generalizations for hypergraphs. We introduced the notion of simplicial complexes and links. We introduced the complete complex of "dimension" 'd' as the hypergraph analog of complete graphs and showed how this complex leads to many well-known graphs on the Johnson scheme (Johnson, Kneser graphs, etc). We presented a high-dimensional expansion notion of link spectral expanders (one-sided and two-sided). Mentioned how it is intended to capture derandomizations of the (ideal) complete complex. Next, we introduced the notion of quantum expanders and how they can be seen as natural quantum generalizations of expander graphs.

Lecture 17: We recalled the definitions of high-dimensional expanders and quantum expanders from the last class. Next, we switched gears to convex optimization. First, we gave a linear programming (LP) relaxation for the independence number of a graph. Then, we discussed how allowing simple-looking quadratic constraints leads to 0/1 optimization and the ability to decide NP-complete problems. Next, we introduced semi-definite programming (SDP) and how it can emulate some relaxed forms of quadratic constraints by optimizing over positive semi-definite (PSD) matrices. We discussed how SDPs are closely connected to quantum since the very description of a quantum state is a PSD matrix of trace 1. We designed a simple SDP to approximate the acceptance probability of a QMA verifier (quantum version of NP), from which we deduced that QMA is contained in EXP (note that better upper bounds are known). Then, we returned to graphs to design an SDP relaxation for the independence number. In doing so, we deduced (a formulation of) the Lovász theta function. Finally, we started to set the stage for more powerful forms of SDPs in the form of the Sum-of-Squares SDP hierarchy. Initially, we conducted a thought experiment assuming that we were able to evaluate the expectation of any given function on the uniform distribution over optimum solutions (in this case a distribution over maximum independent sets of a graph). We observed axioms that the expectation operator (seen as a map from functions to the reals) satisfies. We started to discuss natural ways to relax the expectation operator, which led to the pseudo-expectation operator view of the Sum-of-Squares hierarchy.

Lecture 18: We recalled concepts from last class: the LP relaxation for the independence number, the Lovász theta function, and the initial discussion on the pseudo-expectation view of the Sum-of-Squares (SoS) SDP hierarchy. We proceeded to fully develop the pseudo-expectation view of SoS hierarchy for the independence number of a graph. Two key shortcomings of pseudo-expectation operator (at level D of the hierarchy) are: (1) it is only defined for polynomials of degree at most D (larger degree polynomials cannot be evaluated), and (2) it is only guaranteed to be non-negative on polynomials that can be written as sum-of-squares (of polynomials of degree at most D/2) rather than being non-negative for any non-negative function. We divided the properties of the pseudo-expectation operator into problem-independent and problem-specific. Next, we move on to explain how the pseudo-expectation operator view is equivalent to a "big" SDP. For this, we introduced the notion of the (pseudo-)moment matrix (whose entries are (pseudo)-expectations of monomials of degree at most D). Next, we observed that for "true" distributions, the moment matrix is PSD, so we can require the pseudo-moment matrix to also be PSD. We showed how several properties of the pseudo-expectation operator can be expressed as constraints on the (pseudo-)moment matrix.

Lecture 19: No class (FOCS).

Lecture 20: SoS Halloween Hackathon (Breaking the SoS daunting notation barrier of entry). As a group you completed 4 tasks. First, you recalled the pseudo-expectation formalism of the SoS hierarchy for the independence number (~10 minutes). Second, you recalled the (pseudo-)moment matrix formalism of the SoS hierarchy (~10 minutes). Third, you proved that the (pseudo-)moment matrix being PSD implies that the pseudo-expectation is non-negative on SoS polynomials (assuming its degree is at most D, so things are well-defined). Fourth, you gave a Sum-of-Squares proof of the widely used Cauchy-Schwarz inequality (this means that the SoS hierarchy "automatically" applies this inequality for you). You did not need to use any of your 4 yes/no questions to me, nor the one yes/no question to our pseudo-student. Finally, you quickly presented your findings to me and everything was correct! ;)

Lecture 21: We introduced the task of counting the number of edges across a given cut in a graph. Next, we phrased this task linear algebraically using the graph's adjacency matrix and cut functions/matrices. We also considered the task of approximately counting the edges across a given cut. We aimed to replace the (possibly) complex original adjacency matrix with a (possibly) much simpler approximating matrix. These two objects must be approximately indistinguishable with respect to cut functions so that we can use the latter to approximate cut sizes in the original graph. We introduced an important meta-principle of pseudorandomness that, to approximately fool a class of distinguishers (in this case, cut functions), it suffices to use an approximator that is slightly more complex than the distinguishers. We briefly mentioned Szemerédi regularity lemma, which is a fundamental result in graph theory. Then, we phrased the weak regularity lemma of Frieze and Kannan for graphs, establishing an approximator that is a linear combination of "constantly" (i.e., O(1/delta^2)) many cut functions. We generalized the statement of weak regularity to an abstract form that applies to Hilbert spaces and gave an elementary proof of this existential result. Finally, we briefly discussed using a weak regularity decomposition (with respect to cut functions) to provide an excellent approximation for MaxCut.

Lecture 22: We recalled the abstract weak regularity lemma and its proof from last class. Next, we proceeded to make it algorithmically efficient (for cut functions as distinguishers) since our existential weak regularity proof relies on an oracle call to find a "violating" distinguisher (and this may not even be computable). Toward this goal, we introduced the cut norm of a matrix. and we also stated the SDP-based approximating algorithm of Alon and Naor. We moved on to provide an efficient approximation algorithm for MaxCut (and implicitly to other cut problems) on dense graphs. Flipped classroom: you were asked to design this algorithm, and you were allowed to invoke the Alon-Naor algorithm as a subroutine. (Sampling-based ideas, as some of you hinted, also work in this dense regime, but we continued with our SDP-based strategy.). A crucial observation is that the weak regularity decomposition uses only O(1/delta^2) cut functions, where delta is the (normalized) approximation error. Using these few cut functions, we showed an exp(exp(poly(1/delta))) poly(n) time approximation algorithm for MaxCut (it is possible to improve this running time. Can you see how?).

Lecture 23: We started by discussing 2-CSPs and how the structure of the underlying constraint graph influences how well they can be approximated. We considered some extreme cases: planar graphs (more generality hyperfinite graphs) and expanders (more generality low-threshold rank graphs, which include dense graphs). Efficient approximation algorithms are known for various parameter regimes of these extreme cases. Next, we introduced general k-CSPs. We introduced Max 3-SAT and gave a very simple algorithm that gives a 7/8-approximation. We introduced the proof verification class PCP[r,q], where 'r' is the number of random bits and 'q' is the number of queries. First, we stated the celebrated PCP theorem for NP in terms of proof verification, namely, NP is contained in PCP[O(log(n)),O(1)]. Second, we stated the PCP theorem for NP in terms of the hardness of approximation of k-CSPs. Third, we stated the PCP theorem for NP in terms of multi-prover interactive systems. We proved (various directions of) the equivalence of these three versions of the classical PCP theorem. We also discussed that NEXP (an exponential scaled-up version of NP) is contained in PCP[O(poly(n)),O(1)].

Lecture 24: We recalled that PCP theorem for NP and their equivalent views from last class. Next, we introduced the k-local Hamiltonian (k-LH) problem, which is a quantum analog of classical k-CSPs. We stated Kitaev's quantum Cook-Levin theorem, establishing that k-LH is QMA-complete. We introduced the k-LH (quantum CSP) version of the quantum PCP Conjecture and briefly discussed the other two versions based on proof verification and multi-prover systems and how the connection among them is not as simple as in the classical case. Next, we switched gears back to the classical setting and proceeded to prove an exponential-sized PCP for NP, namely, NP is contained in PCP[O(poly(n)), O(1)]. For this, we started with the quadratic equations problems over F_2 (QuadEq) and showed how to make a robust version of an assignment to the system of equations (a much more robust proof), allowing for verifications using O(1) queries. As a warm-up, we discussed a simple sum-check protocol to test the equality of two vectors over F_2 probabilistically. This naturally leads to a way of checking if the input system of equations is satisfied assuming the assignment is 'y=x tensor x', and to the Hadamard encoding of the assignment. We saw how the BRL test (studied in class before) can be used to check closeness to a true codeword. We also established that the Hadamard code is 2-query locally correctable code (2-LCC), and used this to recover (w.h.p) arbitrary positions in a corrupted encoding of the assignment. Next, we asked for the Hadamard encoding of 'x' as part of our robust proof, so that we can check that 'y' is actually of the form 'x tensor x'. We again apply a sum-check strategy to check this property and conclude the proof. Finally, we discussed that this exponential PCP can be used effectively in some PCP composition settings (when the problem to be verified has constant or small input size).

Lecture 25: We started sketching Dinur's proof of the PCP theorem for NP via gap amplification. We first recalled the CSP version of the PCP theorem. We stated the gap amplification theorem and deduced the PCP theorem from it. We showed that the gap amplifications is comprised of four steps: (i) degree reduction of the constraint graph, (ii) "expanderization" of the constraint graph, (iii) powering of the CSP (analogous to graph powering), (iv) alphabet reduction via PCP(P) composition. For steps i, ii and iv, there is a constant loss in the gap, which is compensated by the powering step (provided the powering parameter is sufficiently large). We deduced the gap amplification theorem from these four steps. Next, we sketched the proofs of steps (i) and (ii). Finally, we described in more detail the powering step.

Lecture 26: We recalled the gap amplification theorem and its four steps from last class. We then proceed to give a proof sketch of the powering step of Dinur (with the improved parameters of Radhakrishna). We concluded with the alphabet reduction of step (iv) via a PCP(P) composition leveraging ideas from our baby PCP. Next, we switched gears to discuss some open problems about PCP. We mention the dream goal of having a linear size PCP (vaguely analogous to having a good code). We mentioned desired of better understanding the PCP theorem by having a more direct one-shot construction. Next, we introduced Khot's Unique Games Conjecture and briefly mentioned the Arora-Barak-Steurer subexponential algorithm, as well, the recent proof of the 2-to-2 conjecture.

Lecture 27: Fall break.

Lecture 28: Fall break.

Lecture 29: We surveyed some of the frontiers and open questions in coding theory, mostly about the elusive case of binary codes. We recalled the existential 1950's GV bound from previous lectures. We discussed the 1970's MRRW impossibility bound via linear programming. These two bound are still the state-of-art bounds for binary codes in the important good code regime. We discussed the substantial gap between these two bounds and how this is an important longstanding open problem in coding theory. We discussed Ta-Shma's breakthrough construction of epsilon-balanced codes near the GV bound with its key ingredient: a "higher-order" Zig-Zag. We also recalled how Zig-Zag has influenced other important results such as Reingold SL=L and Dinur's PCP by gap amplification. We mentioned that the Sum-of-Squares SDP hierarchy (studied in class) was used to give the first efficient decoding algorithm for Ta-Shma's codes. We mentioned that weak regularity (of sparse expanding tensors) was used to give near-linear time decoding of Ta-Shma's codes. We briefly discussed the role of the alphabet size and how increasing the alphabet can substantially change the nature of the coding theory problem. We briefly mentioned some large alphabet codes, recalled the Singleton bound, and mentioned the generalized Singleton bound.

Grading

This is an advanced graduate topics course, so the primary goal is to develop your research maturity and increase your knowledge and skills. Grades will be a secondary concern. We do expect you to seriously attack your chosen research project, but there is no requirement of a tangible outcome.

Policy

Project: 80% (updated details!)

This will consist in selecting a research problem or paper related to the course in consultation with the instructor. Students can either work in pairs or individually. The outcome should be a short research paper on the chosen topic (this can range from a survey to an original result ;), and a presentation towards the end of the course. In summary, a project will consist of the following:
- Selection of the research topic: consult the instructor to find a problem or paper aligned with your interests and the scope of the course. This can occur at any point (ideally, as we transition from phase 1 to 2).
- Mid-project feedback: collect the instructor's feedback on the progress of the project at approximately 1 month before the end of the course.
- Project supervision: you are welcome and encouraged to interact with me once a week for ~30 minutes. To be sure you find me, please schedule office hours via email. Otherwise, I should be tentatively mostly around on Thursdays and Fridays afternoons. My (temporary) office is 4336 at Siebel Center.
- Outcomes: research paper (~10 pages) + presentation towards the end of the course (~15-20 minutes per project).

Participation: 20%

This includes attendance, curiosity, asking questions during class or office hours, etc.

CS 598: Expansion, Codes, and Optimization both Classical and Quantum

Course Description