Algorithmic Challenges of Big Data (ACBD) 2022

PRELIMINARY PROGRAM

Day 1 – 9th of May 2022

[table id=4 /]

Day 2 – 10th of May 2022

[table id=5 /]

 

Talks


 

Title: A PTAS for Unsplittable Flow on a Path
Speaker: Fabrizio Grandoni
Abstract: In the Unsplittable Flow on a Path problem (UFP) we are given a path with edge capacities, and a set of tasks where each task is characterized by a subpath, a demand, and a weight. The goal is to select a subset of tasks of maximum total weight such that the total demand of the selected tasks using each edge e is at most the capacity of e. The problem admits a QPTAS [Bansal, Chakrabarti, Epstein, Schieber, STOC’06; Batra, Garg, Kumar, Mömke, Wiese, SODA’15]. After a long sequence of improvements [Bansal, Friggstad, Khandekar, Salavatipour, SODA’09; Bonsma, Schulz, Wiese, FOCS’11; Anagnostopoulos, Grandoni, Leonardi, Wiese, SODA’14; Grandoni, Mömke, Wiese, Zhou, STOC’18], the best known polynomial time approximation algorithm for UFP has an approximation ratio of 1 + 1/(e+1) + ε < 1.269 [Grandoni, Mömke, Wiese, SODA’22]. It has been an open question whether this problem admits a PTAS. In this paper, we solve this open question and present a polynomial time (1 + ε)-approximation algorithm for UFP.


 

Title: Deterministic Rounding of Dynamic Fractional Matchings
Speaker: Sayan Bhattacharya
Abstract: We present a framework for deterministically rounding a dynamic fractional matching. Applying our framework in a black-box manner on top of existing fractional matching algorithms, we derive the following new results: (1) The first deterministic algorithm for maintaining a $(2-\delta)$-approximate maximum matching in a fully dynamic bipartite graph, in arbitrarily small polynomial update time. (2) The first deterministic algorithm for maintaining a $(1+\delta)$-approximate maximum matching in a decremental bipartite graph, in polylogarithmic update time. (3) The first deterministic algorithm for maintaining a $(2+\delta)$-approximate maximum matching in a fully dynamic general graph, in small polylogarithmic (specifically, $O(\log^4 n)$) update time. These results are respectively obtained by applying our framework on top of the fractional matching algorithms of Bhattacharya et al. [STOC’16], Bernstein et al. [FOCS’20], and Bhattacharya and Kulkarni [SODA’19].

Prior to our work, there were two known general-purpose rounding schemes for dynamic fractional matchings. Both these schemes, by Arar et al. [ICALP’18] and Wajc [STOC’20], were randomized.

Our rounding scheme works by maintaining a good matching-sparsifier with bounded arboricity, and then applying the algorithm of Peleg and Solomon [SODA’16] to maintain a near-optimal matching in this low arboricity graph. To the best of our knowledge, this is the first dynamic matching algorithm that works on general graphs by using an algorithm for low-arboricity graphs as a black-box subroutine. This feature of our rounding scheme might be of independent interest.

This is joint work with Peter Kiss.


 

Title: Search Trees on Trees
Speaker: Laszlo Kozma
Abstract: “Search trees on graphs” (STGs) and “search trees on trees” (STTs) generalize binary search trees (BSTs) and allow the exploration of graph- or tree-shaped search spaces. They have been studied under various guises in computer science and combinatorics (closely related concepts include vertex rankings, elimination trees, ordered colorings, treedepth, and graph associahedra). For BSTs a rich theory of adaptivity has been developed in the past decades: an optimal tree for a given search distribution can be efficiently computed [Knuth, ’71], and dynamically updated trees, e.g. splay trees, have adaptive properties even in an online setting [Sleator, Tarjan, ’83]. It is natural to ask to what extent this theory extends to the more general case of STTs.


 

Title: Almost-Linear Time Algorithms for Maximum Flow and More
Speaker: Rasmus Kyng
Abstract: We give the first almost-linear time algorithm for computing exact maximum flows and minimum-cost flows on directed graphs. By well-known reductions, this implies almost-linear time algorithms for several problems including bipartite matching, optimal transport, and undirected vertex connectivity.

Our algorithm uses a new Interior Point Method (IPM) that builds the optimal flow as a sequence of an almost-linear number of approximate undirected minimum-ratio cycles, each of which is computed and processed very efficiently using a new dynamic data structure.

Our framework extends to give an almost-linear time algorithm for computing flows that minimize general edge-separable convex functions to high accuracy. This gives the first almost-linear time algorithm for several problems including entropy-regularized optimal transport, matrix scaling, p-norm flows, and Isotonic regression.

Joint work with Li Chen, Yang Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva.


 

Title: Online Routing and Network Design with Predictions
Speaker: Nicole Megow
Abstract: We initiate the study of learning-augmented algorithms for online routing problems such as the Online Traveling Salesperson Problem and the Online Dial-a-Ride Problem, where (transportation) requests arrive over time as locations in a metric space and a server has to be moved to serve all requests. We propose a novel prediction model and a general framework for deriving learning-augmented algorithms using known online algorithms as a “black box”. Thereby, we improve upon known worst-case barriers, when given quite accurate predictions, at the cost of slightly increased worst-case bounds when given predictions of arbitrary quality.

As a key contribution, we introduce a novel error measure for input predictions based on the minimum cost hyperedge cover in a suitably defined graph. It generalizes previously defined error measures and, most importantly, it captures errors in (i) the location of predicted requests, ii) the release dates of predicted requests and (iii) the error due to not predicted requests. While metric problems have been studied before, this seems to be the first error measure capturing a timing component. Further, it captures errors in the location better than previously proposed measure. By defining the edge cost for the underlying edge cover problem properly, this error measure can be applied to arbitrary metric problems with or without timing aspect. In particular, we achieve refined results for previously studied network design problems such as online Steiner tree and and facility location.

Joint work with Giulia Bernardini, Alexander Lindermayr, Alberto Marchetti-Spaccamela, Leen Stougie, and Michelle Sweering.


 

Title: Capacitated Vehicle Routing
Speaker: Hang Zhou
Abstract: We study the capacitated vehicle routing problem in the Euclidean space and on trees. This is joint work with Claire Mathieu.


 

Title: Min-sum Clustering with Outliers
Speaker: Sandip Banerjee
Abstract: We give a constant factor polynomial time pseudo-approximation algorithm for min-sum
clustering with or without outliers. The algorithm is allowed to exclude an arbitrarily small constant fraction of the points. For instance, we show how to compute a solution that clusters 98\% of the input data points and pays no more than a constant factor times the optimal solution that clusters 99\% of the input data points. More generally, we give the following bicriteria approximation: For any $\eps > 0$, for any instance with $n$ input points and for any positive integer $n’\le n$, we compute in polynomial time a clustering of at least $(1-\eps) n’$ points of cost at most a constant factor greater than the optimal cost of clustering $n’$ points. The approximation guarantee grows with $\frac{1}{\eps}$. Our results apply to instances of points in real space endowed with squared Euclidean distance,
as well as to points in a metric space, where the number of clusters, and also the dimension if relevant, is arbitrary (part of the input, not an absolute constant).


 

Title: How to price ride-hailing fares using submodularity
Speaker: Marek Adamczyk
Abstract: We are going to show a submodularity based approach for pricing ride-hailing fares which gives (1-1/e) guarantee of approximation, beating the 1/3-approximation of Hikima et al. (AAAI 2021) who used min-cost flow methods.


 

Title: Interplay between Graph Isomorphism and Earth Mover’s Distance in the Query and Communication Worlds
Speaker: Gopinath Mishra
Abstract: The graph isomorphism distance between two graphs $G_u$ and $G_k$ is the fraction of entries in the adjacency matrix that has to be changed to make $G_u$ isomorphic to $G_k$. We study the problem of estimating, up to a constant additive factor, the graph isomorphism distance between two graphs in the query model. In other words, if $G_k$ is a known graph and $G_u$ is an unknown graph whose adjacency matrix has to be accessed by querying the entries, what is the query complexity for testing whether the graph isomorphism distance between $G_u$ and $G_k$ is less than $\gamma_1$ or more than $\gamma_2$, where $\gamma_1$ and $\gamma_2$ are two constants with $0\leq \gamma_1 < \gamma_2 \leq 1$. It is also called the tolerant property testing of graph isomorphism in the dense graph model. The non-tolerant version (where $\gamma_1$ is $0$) has been studied by Fischer and Matsliah~(SICOMP’08).

In this paper, we prove a (interesting) connection between tolerant graph isomorphism testing and tolerant testing of the well studied Earth Mover’s Distance (EMD). We prove that deciding tolerant graph isomorphism is equivalent to deciding tolerant EMD testing between multi-sets in the query setting. Moreover, the reductions between tolerant graph isomorphism and tolerant EMD testing (in query setting) can also be extended directly to work in the two party Alice-Bob communication model (where Alice and Bob have one graph each and they want to solve tolerant graph isomorphism problem by communicating bits), and possibly in other sublinear models as well.

Testing tolerant EMD between two probability distributions is equivalent to testing EMD between two multi-sets, where the multiplicity of each element is taken appropriately, and we sample elements from the unknown multi-set \textbf{with} replacement. In this paper, our (main) contribution is to introduce the problem of \emph{(tolerant) EMD testing between multi-sets (over Hamming cube) when we get samples from the unknown multi-set \textbf{without} replacement} and to show that \emph{this variant of tolerant testing of EMD is as hard as tolerant testing of graph isomorphism between two graphs}. {Thus, while testing of equivalence between distributions is at the heart of the non-tolerant testing of graph isomorphism, we are showing that the estimation of the EMD over a Hamming cube (when we are allowed to sample \textbf{without} replacement) is at the heart of
tolerant graph isomorphism.} We believe that the introduction of the problem of testing EMD between multi-sets (when we get samples \textbf{without} replacement) opens an entirely new direction in the world of testing properties of distributions.


 

 

Title: Fast Deterministic Fully Dynamic Distance Approximation
Speaker: Sebastian Forster
Abstract: We develop deterministic fully dynamic algorithms for computing approximate distances in a graph with worst-case update time guarantees. In particular we obtain improved dynamic algorithms that, given an unweighted and undirected graph G = (V, E) undergoing edge insertions and deletions, and a parameter 0 < eps ≤ 1, maintain (1 + eps)-approximations of the st distance of a single pair of nodes, the distances from a single source to all nodes (“SSSP”), the distances from multiple sources to all nodes (“MSSP”), or the distances between all nodes (“APSP”). Joint work with Jan van den Brand and Yasamin Nazari; preprint: https://arxiv.org/abs/2111.03361


 

 

Title: Directed flow-augmentation
Speaker: Marcin Pilipczuk
Abstract: We show a flow-augmentation algorithm in directed graphs: There exists a polynomial-time algorithm that, given a directed graph G, two integers s,t in V(G), and an integer k, adds (randomly) to G a number of arcs such that for every minimal st-cut Z in G of size at most k, with probability 2^(-poly(k)) the set Z becomes a minimum st-cut in the resulting graph.
The directed flow-augmentation tool allows us to prove fixed-parameter tractability of a number of problems parameterized by the cardinality of the deletion set, whose parameterized complexity status was repeatedly posed as open problems: (1) Chain SAT, defined by Chitnis, Egri, and Marx [ESA’13, Algorithmica’17], (2) a number of weighted variants of classic directed cut problems, such as Weighted st-Cut, Weighted Directed Feedback Vertex Set, or Weighted Almost 2-SAT.
By proving that Chain SAT is FPT, we confirm a conjecture of Chitnis, Egri, and Marx that, for any graph H, if the List H-Coloring problem is polynomial-time solvable, then the corresponding vertex-deletion problem is fixed-parameter tractable.


 

 

Title: Improved Approximation Guarantees for Shortest Superstrings
Speaker: Matthias Englert
Abstract: In the Shortest Superstring problem, we are given a set of strings and we are asking for a common superstring, which has the minimum number of characters. The Shortest Superstring problem is NP-hard and several constant-factor approximation algorithms are known for it. Of particular interest is the GREEDY algorithm, which repeatedly merges two strings of maximum overlap until a single string remains. Tarhio and Ukkonen (TCS 1988) conjectured that GREEDY gives a 2-approximation. In a seminal work, Blum, Jiang, Li, Tromp, and Yannakakis (STOC 1991) proved that the superstring computed by GREEDY is a 4-approximation, and this upper bound was improved to 3.5 by Kaplan and Shafrir (IPL 2005).
We show that the approximation guarantee of GREEDY is at most ≈3.425. Furthermore, we prove that the Shortest Superstring can be approximated within a factor of 2.475. Joint work with Nicolaos Matsakis, Pavel Veselý.


 

 

Title: Online Virtual Machine Allocation with Predictions
Speaker: Seffi Naor
Abstract: The cloud computing industry has grown rapidly over the last decade, and with this growth there is a significant increase in demand for compute resources. Demand is manifested in the form of Virtual Machine (VM) requests, which need to be assigned to physical machines in a way that minimizes resource fragmentation and efficiently utilizes the available machines. This problem can be modeled as a dynamic version of the bin packing problem with the objective of minimizing the total usage time of the bins (physical machines). Earlier works on dynamic bin packing assumed that no knowledge is available to the scheduler and later works studied models in which lifetime/duration of each “item” (VM in our context) is available to the scheduler. This extra information was shown to improve exponentially the achievable competitive ratio.

Motivated by advances in Machine Learning that provide good estimates of workload characteristics, we study the effect of having extra information regarding future (total) demand. In the cloud context, since demand is an aggregate over many VM requests, it can be predicted with high accuracy (e.g., using historical data). We show that the competitive factor can be dramatically improved by using this additional information; in some cases, we achieve constant competitiveness, or even a competitive factor that approaches 1. Along the way, we design new offline algorithms with improved approximation ratios for the dynamic bin-packing problem.

Joint work with: Niv Buchbinder, Yaron Fairstein, Konstantina Mellou, and Ishai Menache.


 

 

Title: Factorial Lower Bounds for (Almost) Random Order Streams
Speaker: Michael Kapralov
Abstract: We introduce and study the StreamingCycles problem, a random order streaming version of the Boolean Hidden Hypermatching problem that has been instrumental in streaming lower bounds over the past decade. In this problem the edges of a graph G, comprising n/ℓ disjoint length-ℓ cycles on n vertices, are partitioned randomly among n players. Every edge is annotated with an independent uniformly random bit, and the players’ task is to output the parity of some cycle in G after one round of sequential communication. Our main result is an ℓ^Ω(ℓ) lower bound on the communication complexity of StreamingCycles, which is tight up to constant factors in ℓ. Applications of our lower bound for StreamingCycles include an essentially tight lower bound for component collection in (almost) random order graph streams, making progress towards a conjecture of Peng and Sohler [SODA’18] and the first exponential space lower bounds for random walk generation.


 

 

Title: Streaming Algorithms for Geometric Steiner Forest
Speaker: Robert Krauthgamer
Abstract: I will discuss the Steiner forest problem in the Euclidean plane, where the input is a multiset of points, partitioned into k color classes, and the goal is to find a minimum-cost Euclidean graph G such that every color class is connected. We study this problem in dynamic streams, where the input is provided by a stream of insertions and deletions of colored points from the discrete grid [\Delta]^2. Our main result is a single-pass streaming algorithm that uses poly(k \log\Delta) space and time, and estimates the cost of an optimal Steiner forest solution within ratio arbitrarily close to the famous Euclidean Steiner ratio $\alpha_2$ (currently $1.1547 \leq \alpha_2 \leq 1.214$). Our approach relies on a novel combination of streaming techniques, like sampling and linear sketching, with the classical dynamic-programming framework for geometric optimization problems, which usually requires large memory and has so far not been applied in the streaming setting. I will also discuss possible directions for future work.


 

 

Title: A Rigorous Approach to Design Learned Data Structures
Speaker: Giorgio Vinciguerra
Abstract: A newly proposed paradigm in data structures design consists in integrating machine-learning components that allow faster queries and more succinct (possibly compressed) space usage than traditional approaches thanks to a better adaptation to the regularities present in the input data.
Initial research, particularly focusing on data indexing, has been largely empirical and heuristic, and it has been met with enthusiasm by software engineers but with mild interest among theoreticians.
In this talk, we discuss some recent results for the ubiquitous problems of data compression and indexing that bring formality to this novel learning-based paradigm. We show the existence of learning-based data structures that achieve, at the same time, improved practical efficiency over known engineered solutions (sometimes of orders of magnitudes) and provably good theoretical worst-case guarantees.

Skip to content