# Theoretical CS Reading Group

The reading group meets once a week
usually, Wednesday 1:00-2:00pm to discuss
a particular paper or topic.
Please join our mailing list to get notifications about the weekly meetings and discussions. You might also be interested in the Michigan-Purdue Theory Seminar Fall 2020. Upcoming talks will be listed on this webpage regularly. If you wish to give a talk or present a paper please send a mail to zhu628 at purdue.edu |

Spring 2019
Fall 2018 Spring 2018 Fall 2017 Spring 2017 Fall 2016 Spring & Summer 2016 Fall 2015 Spring & Summer 2015 Fall 2014 Spring 2014 Summer & Fall 2013 |

# Fall 2020

** Wednesday, 16th September 2020** —
Karthekeyan Chandrasekaran, Elena Grigorescu, Gabriel Istrate, Shubhang Kulkarni, Young-San Lin, Minshen Zhu

The Maximum Binary Tree Problem

**Presenter:**Young-San Lin

**Abstract**

We introduce and investigate the approximability of the maximum binary tree problem (MBT) in directed and undirected graphs. The goal in MBT is to find a maximum-sized binary tree in a given graph. MBT is a natural variant of the well-studied longest path problem, since both can be viewed as finding a maximum-sized tree of bounded degree in a given graph. The connection to longest path motivates the study of MBT in directed acyclic graphs (DAGs), since the longest path problem is solvable efficiently in DAGs. In contrast, we show that MBT in DAGs is in fact hard: it has no efficient exp(-O(logn/loglog n))-approximation algorithm under the exponential time hypothesis, where n is the number of vertices in the input graph. In undirected graphs, we show that MBT has no efficient exp(-O(log^{0.63}n))-approximation under the exponential time hypothesis. Our inapproximability results rely on self-improving reductions and structural properties of binary trees. We also show constant-factor inapproximability assuming P != NP. In addition to inapproximability results, we present algorithmic results along two different flavors: (1) We design a randomized algorithm to verify if a given directed graph on n vertices contains a binary tree of size k in 2^k poly(n) time. (2) Motivated by the longest heapable subsequence problem, introduced by Byers, Heeringa, Mitzenmacher, and Zervas (ANALCO 2011), which is equivalent to MBT in permutation DAGs, we design efficient algorithms for MBT in bipartite permutation graphs.

** Wednesday, 23rd September 2020** —
Anup Rao

Coding for Sunflowers

**Presenter:**Minshen Zhu

**Abstract**

A p-sunflower is a family of p sets whose pairwise intersections are identical. An interesting problem in combinatorics is to determine how small a set system could be while necessarily containing a p-sunflower. Erd\"os and Rado proved that one can always find a p-sunflower among k!(p-1)^k sets of size k (a.k.a. the sunflower lemma). On the other hand, there is a family of (p-1)^k sets of size k that does not contain a sunflower. The sunflower conjecture asserts that the correct bound for this problem is c^k, where c is a constant which depends only on p. In a recent breakthrough, Alweiss, Lovett, Wu and Zhang improved the bound to (logk)^k (p loglogk)^O(k). Soon after, the proof is simplified by Anup Rao and the bound is sharpened to (ap log(pk))^k (for some universal constant a). See also Terrence Tao's blog for an information-theoretic formulation of Anup Rao's proof. Instead of searching directly for a sunflower, both proofs go through a robust notion of sunflowers which may be of independent interest. The (improved) sunflower lemma is useful in many contexts, including lower bounds for data structures maintaining set statistics, lower bounds for monotone circuits and analysis of the Karger-Stein Algorithm for k-cut.

**Wednesday, 7th October 2020**— Maria-Florina Balcan, Avrim Blum, Santosh Vempala

A Discriminative Framework for Clustering via Similarity Functions

**Presenter:**Tamalika Mukherjee

**Abstract**

Problems of clustering data from pairwise similarity information are ubiquitous in Computer Science. Theoretical treatments typically view the similarity information as ground-truth and then design algorithms to (approximately) optimize various graph-based objective functions. However, in most applications, this similarity information is merely based on some heuristic; the ground truth is really the unknown correct clustering of the data points and the real goal is to achieve low error on the data. In this work, we develop a theoretical approach to clustering from this perspective. In particular, motivated by recent work in learning theory that asks “what natural properties of a similarity (or kernel) function are sufficient to be able to learn well?” we ask “what natural properties of a similarity function are sufficient to be able to cluster well?” To study this question we develop a theoretical framework that can be viewed as an analog of the PAC learning model for clustering, where the object of study, rather than being a concept class, is a class of (concept, similarity function) pairs, or equivalently, a property the similarity function should satisfy with respect to the ground truth clustering. We then analyze both algorithmic and information-theoretic issues in our model. While quite strong properties are needed if the goal is to produce a single approximately correct clustering, we find that a number of reasonable properties are sufficient under two natural relaxations: (a) list clustering: analogous to the notion of list-decoding, the algorithm can produce a small list of clusterings (which a user can select from) and (b) hierarchical clustering: the algorithm’s goal is to produce a hierarchy such that desired clustering is some pruning of this tree (which a user could navigate). We develop a notion of the clustering complexity of a given property (analogous to notions of capacity in learning theory), which characterizes its information-theoretic usefulness for clustering. We analyze this quantity for several natural game-theoretic and learning-theoretic properties, as well as design new efficient algorithms that are able to take advantage of them. Our algorithms for hierarchical clustering combine recent learning theoretic approaches with linkage-style methods. We also show how our algorithms can be extended to the inductive case, i.e., by using just a constant-sized sample, as in property testing.

**Wednesday, 14th October 2020**— Daniel Lokshtanov, Pranabendu Misra, Joydeep Mukherjee, Fahad Panolan, Geevarghese Philip, Saket Saurabhk

2-Approximating Feedback Vertex Set in Tournaments

**Presenter:**Shubhang Kulkarni

**Abstract**

A tournament is a directed graph T such that every pair of vertices is connected by an arc. A feedback vertex set is a set S of vertices in T such that T − S is acyclic. We consider the Feedback Vertex Set problem in tournaments. Here the input is a tournament T and a weight function w : V (T ) → N and the task is to find a feedback vertex set S in T minimizing w(S) = ∑ _{v∈S} w(v). Rounding optimal solutions to the natural LP-relaxation of this problem yields a simple 3-approximation algorithm. This has been improved to 2.5 by Cai et al. [SICOMP 2000], and subsequently to 7/3 by Mnich et al. [ESA 2016]. In this paper we give the first polynomial time factor 2 approximation algorithm for this problem. Assuming the Unique Games conjecture, this is the best possible approximation ratio achievable in polynomial time.

**Wednesday, 21st October 2020**— Alina Ene, Deeparnab Chakrabarty, Ravishankar Krishnaswamy, and Debmalya Panigrahi

Online Buy-at-Bulk Network Design

**Presenter:**Young-San Lin

**Abstract**

We present the first non-trivial online algorithms for the non-uniform, multicommodity buy-at-bulk (MC-BB) network design problem. Our competitive ratios qualitatively match the best known approximation factors for the corresponding offline problems. In particular, we show 1) A polynomial time online algorithm with a poly-logarithmic competitive ratio for the MC-BB problem in undirected edge-weighted graphs. 2) A quasi-polynomial time online algorithm with a poly-logarithmic competitive ratio for the MC-BB problem in undirected node-weighted graphs. 3) For any fixed $\epsilon > 0$, a polynomial time online algorithm with a competitive ratio of $\tilde{O}(k^{1/2+\epsilon})$ (where k is the number of demands, and $\tilde{O}(.)$ hides polylog factors) for MC-BB in directed graphs. 4) Algorithms with matching competitive ratios for the prize-collecting variants of all the above problems. Prior to our work, a logarithmic competitive ratio was known for undirected, edge-weighted graphs only for the special case of uniform costs (Awerbuch and Azar, FOCS 1997), and a polylogarithmic competitive ratio was known for the edge-weighted single-sink problem (Meyerson, SPAA 2004). To the best of our knowledge, no previous online algorithm was known, even for uniform costs, in the node-weighted and directed settings. Our main engine for the results above is an online reduction theorem of MC-BB problems to their single-sink (SS-BB) counterparts. We use the concept of junction-tree solutions (Chekuri et al., FOCS 2006) that play an important role in solving the offline versions of the problem via a greedy subroutine – an inherently offline procedure. Our main technical contribution is in designing an online algorithm using only the existence of good junction-trees to reduce an MC-BB instance to multiple SS-BB sub-instances. Along the way, we also give the first non-trivial online node-weighted/directed single-sink buy-at-bulk algorithms. In addition to the new results, our generic reduction also yields new proofs of recent results for the online node-weighted Steiner forest and online group Steiner forest problems.

**Wednesday, 28th October 2020**— Saurabh Sawlani, Junxing Wang

Near-Optimal Fully Dynamic Densest Subgraph

**Presenter:**Himanshi Mehta

**Abstract**

We give the first fully dynamic algorithm which maintains a (1−e)-approximate densest subgraph in worst-case time poly(logn, 1/e) per update. Dense subgraph discovery is an important primitive for many real-world applications such as community detection, link spam detection, distance query indexing, and computational biology. We approach the densest subgraph problem by framing its dual as a graph orientation problem, which we solve using an augmenting path-like adjustment technique. Our result improves upon the previous best approximation factor of (1/4−e) for fully dynamic densest subgraph [Bhattacharya et. al., STOC ‘15]. We also extend our techniques to solving the problem on vertex-weighted graphs with similar runtimes. Additionally, we reduce the (1−e)-approximate densest subgraph problem on directed graphs to O(logn/e) instances of (1−e)-approximate densest subgraph on vertex-weighted graphs. This reduction, together with our algorithm for vertex-weighted graphs, gives the first fullydynamic algorithm for directed densest subgraph in worst-case time poly(logn, 1/e) per update. Moreover, combined with a near-linear time algorithm for densest subgraph [Bahmani et. al., WAW ‘14], this gives the first near-linear time algorithm for directed densest subgraph.

**Wednesday, 4th November 2020**— TBD

TBD

**Presenter:**TBD

**Abstract**