ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC 
Project Details 
Funding Scheme :  General Research Fund  
Project Number :  17200214  
Project Title(English) :  Oblivious Matching Algorithms on Unknown (Hyper) Graphs  
Project Title(Chinese) :  未知(超)圖上的無識匹配算法  
Principal Investigator(English) :  Dr Chan, Hubert Tsz Hong  
Principal Investigator(Chinese) :  
Department :  Department of Computer Science  
Institution :  The University of Hong Kong  
Email Address :  hubert@cs.hku.hk  
Tel :  28578461  
Co-Investigator(s) : 


Panel :  Engineering  
Subject Area :  Computing Science & Information Technology  
Exercise Year :  2014 / 15  
Fund Approved :  875,000  
Project Status :  Completed  
Completion Date :  31/12/2017  
Project Objectives : 


Abstract as per original application (English/Chinese): 
This project will investigate the classical maximum matching problem of finding a maximum subset of non-intersecting edges, in the case where the graph structure is unknown. The existence of an edge between two nodes remains unknown until the pair is probed; when an edge is found, the corresponding pair must be matched up and removed from the graph. Our goal is to investigate various versions of the problem, and design algorithms to maximize the size or the weight of the matching. The project will bring about technical breakthroughs, and the problem has real-world applications such as online advertising, online dating, and the kidney exchange problem.
More specifically, in the oblivious matching problem, the graph is fixed in advance but unknown to the algorithm. The algorithm knows only the set of nodes, and specifies an order of node pairs to probe the graph. The performance ratio is the ratio of the size of the matching produced to that of a maximum matching. A ratio of 0.5 can be achieved trivially and is tight for deterministic algorithms. Since the mid-1990s, theoreticians such as Fulkerson Prize winners Dyer and Frieze, and two-time Gödel Prize winner Szegedy, have used sophisticated arguments to prove that several randomized algorithms can achieve ratios marginally (around 0.004) above 0.5. In large-scale applications, this could mean a huge increase in utility.
We will develop novel mathematical techniques to produce significant results. In our preliminary work, we use sophisticated combinatorial analysis and mathematical programming techniques to achieve the currently best proven ratio of 0.523. This is a major breakthrough, and provides an encouraging start for the design of algorithms for more general versions of the problem in which the nodes or edges of the graph are weighted.
Our techniques also offer us important insights for investigating different versions of the problem on hypergraphs. For instance, the analogous result for 3-uniform hypergraphs would be breaking the 1/3 barrier. This has important applications in combinatorial auctions under settings in which complete information is unavailable.
In conclusion, there is no doubt that oblivious matching is an important problem for theoreticians and practitioners alike. The proposed project will bring new insights to the problem and the technological breakthroughs will have a significant impact on the wider theoretical community.

本項目將研究在未知圖結構情況下的經典最大匹配問題，即尋找最大不相交邊集合。一條邊存在與否只有當相應的點對被探測時才可獲知；當一條邊被探測並已知存在時，相應的點對必須被匹配並移出該圖。我們的目標是研究該問題的多個版本，並設計最大化匹配大小或權值的算法。本項目將帶來技術上的突破，並且該問題對在線廣告、在線約會及腎臟交換問題中具有實際應用。

具體地講，在該無識匹配問題中，所研究的圖已預先確定，但算法並不知道其結構。算法只知道該圖的點集合，並製定一個點對順序來探測該圖。性能比是指算法產生的匹配的大小與最大匹配大小之間的比值。簡單的算法即可獲得0.5的性能比，並且該數值對確定算法來說是緊緻的。從上世紀九十年代以來，包括Fulkerson獎獲得者Dyer和Frieze以及兩次哥德爾獎獲得者Szegedy在內的理論家，利用複雜的論證證明了某些隨機算法可以達到略微（幅度約0.004）大於0.5的性能比。在大規模應用中，這或許會帶來效用的巨大提升。

我們將開發全新的數學工具以獲得重大成果。在前期工作中，我們利用複雜的組合分析和數學規劃工具獲得了當前可被證明的最好性能比0.523。這是一個重大突破，並鼓勵我們開始針對帶權值點或帶權值邊等普適模型設計算法。

我們開發的工具也使我們對該問題在超圖上的各種版本有了重要的深刻的理解。例如，在3均勻超圖上的類似結果將打破性能比1/3的壁壘。該結果在無法獲知全部信息情況下的組合拍賣問題中具有重要應用價值。 

Realisation of objectives:  Objective 1: We have considered the oblivious matching problem on unweighted graphs, where an adversary commits to a simple undirected graph G=(V, E), and reveals the node set V to the randomized algorithm, while keeping the edge set E secret (hence the algorithm is initially oblivious to the edges). The algorithm returns a list L that gives the order in which unordered pairs of nodes are probed. Matching is done in a greedy manner: when a pair of nodes is probed, if both are currently unmatched and an edge exists between them, the two nodes are matched together; otherwise, we skip to the next pair in the list L, until all pairs in L are probed. The goal is to maximize the performance ratio of the (expected) number of nodes matched by the algorithm to the number of nodes in a maximum matching in G. Our approach is to analyze the well-known Ranking algorithm. The idea is to express the performance ratio as the objective of a linear program, and to form linear constraints that capture the structure of the Ranking algorithm. For example, a node that receives a higher rank in the algorithm is more likely to be matched; another property is that if a node is unmatched, then demoting its rank will not change the resulting matching. Hence, the minimization LP gives a lower bound on the performance ratio of the Ranking algorithm. Since the objective value of the LP is a decreasing function of the number of nodes, we need to consider the asymptotic behavior as n tends to infinity. Our insight is to replace the primal variables with a continuous primal function and consider the continuous variant of the LP. By analyzing the continuous LP and its dual, we show that the objective value of the LP is at least 0.523, which is our first theoretical guarantee on the performance of the Ranking algorithm on unweighted graphs. This result appears in the SIAM Journal on Computing (2018). 
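For concreteness, the greedy probing process with random ranks can be simulated as follows. This is a minimal sketch, not the paper's implementation: probing pairs in lexicographic order of their ranks is one standard instantiation of Ranking, and `edge_oracle` is a hypothetical callback standing in for the unknown edge set E.

```python
import itertools
import random


def ranking_oblivious_matching(nodes, edge_oracle, rng=random):
    """Simulate one run of the Ranking algorithm for oblivious matching.

    nodes: list of node identifiers (the only input the algorithm sees).
    edge_oracle: callable (u, v) -> bool revealing whether {u, v} is an
        edge of the hidden graph; each pair is probed at most once.
    Returns the greedy matching as a set of frozenset pairs.
    """
    # Assign a uniformly random rank to every node.
    order = list(nodes)
    rng.shuffle(order)
    rank = {u: i for i, u in enumerate(order)}

    # The probing list L: unordered pairs sorted lexicographically
    # by (smaller rank, larger rank).
    pairs = sorted(itertools.combinations(nodes, 2),
                   key=lambda p: tuple(sorted((rank[p[0]], rank[p[1]]))))

    matched = set()
    matching = set()
    for u, v in pairs:
        if u in matched or v in matched:
            continue  # greedy: a match, once made, is never revoked
        if edge_oracle(u, v):
            matching.add(frozenset((u, v)))
            matched.update((u, v))
    return matching
```

Any single run returns a maximal matching of the hidden graph; averaging the matching size over the random ranks gives the expectation that the LP analysis lower-bounds.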
We next further improved the analysis of the Ranking algorithm by exploring more structural properties. Another observation we made is that if, in some instance, a matched node has two unmatched neighbors, then even if the node is demoted to the lowest rank, it will still be matched. This allows us to formulate new LP constraints. Furthermore, in our continuous LP framework, we allow the function to have jump discontinuities. Using these techniques, we have improved the analysis of the performance ratio to 0.526. This result appears in the ACM Transactions on Algorithms (2018). Based on the techniques we developed for the oblivious problem, we considered similar problems on unweighted graphs in which the algorithm does not have complete information. In particular, we developed a dynamic data structure that maintains the hop-diameter of a tree with unweighted edges such that the degree increase of each node is constant. The result has been published in COCOON 2015.

Objective 2: We have investigated the node-weighted variant of the oblivious matching problem, in which the algorithm also receives the weight of each node, and the performance ratio is the expected sum of the weights of matched nodes to the sum of node weights in an optimal matching. By analyzing graceful instances in which both a node and its partner in the optimal matching are matched, we can formulate new LP constraints. Since the finite LP is characterized by the granularity of the adjustment function, the best possible performance ratio using this approach can be obtained by analyzing the corresponding continuous LP, which has value at least 0.5015. This result appears in the ACM Transactions on Algorithms (2018). We have also studied the edge-weighted variant, which subsumes the node-weighted variant. Our idea is to partition the edges carefully according to their weights, and run the (unweighted) Ranking algorithm on each part. 
We analyzed this approach using a new primal-dual framework known as matching coverage, in which dual feasibility is bypassed: only the dual constraints corresponding to edges in an optimal matching are satisfied. When the number of distinct edge weights is bounded, our techniques lead to a performance ratio better than 0.5. The result has been published in the European Symposium on Algorithms 2016. For edge-weighted directed graphs, we also considered spectral techniques to decompose the input graph such that each piece has small edge expansion. We achieve this by considering a stationary probability distribution of an augmented graph, and we show a relationship between the spectral properties of the augmented graph and the original graph. The result has been published in COCOON 2015.

Objective 3: Based on the linear programming techniques we developed for the oblivious matching problem, we have devised efficient LP-based algorithms for locating densest subsets. In most such applications, one typically aims at finding several dense subgraphs, which might correspond to communities in social networks or to interesting events. We have defined and studied a natural generalization of the densest subgraph problem, where the goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. After showing that this problem is NP-hard, we devised an efficient algorithm that comes with provable guarantees in some cases of interest, as well as an efficient practical heuristic. Moreover, we have performed extensive evaluations on large real-world graphs, which confirm the efficiency of our algorithms. The results have been published in the ACM International Conference on Web Search and Data Mining (WSDM) 2015. 
We have extended the densest subgraph problem to hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. This has applications in combinatorial auctions, in which a densest subset corresponds to a group of desirable items. We have developed two exact algorithms and a near-linear time r-approximation algorithm for the problem, where r is the maximum cardinality of an edge in the hypergraph. Furthermore, we have also considered the dynamic version of the problem, in which an adversary can insert or delete an edge of the hypergraph in each round, and the goal is to efficiently maintain an approximation of the densest subgraph. We have designed two dynamic approximation algorithms with amortized polylogarithmic update times, for both the insert-only scenario and the fully dynamic scenario. Extensive experiments were performed on large real datasets to validate the effectiveness and efficiency of our algorithms. The results have appeared in the ACM International Conference on Information and Knowledge Management (CIKM) 2017. Finally, we also considered other practical applications on hypergraphs, such as semi-supervised learning. We used a convex-program approach whose objective function is not everywhere differentiable. Exploiting the non-uniqueness of the optimal solutions, we defined the concept of confidence intervals, which give the exact ranges of values that unlabeled vertices take in any optimal solution. Moreover, we gave a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets have confirmed that our confidence interval approach on hypergraphs outperforms existing methods, and our subgradient method gives faster running times when the number of vertices is much larger than the number of edges. The results have appeared in ICML 2017.  
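To illustrate the flavor of the approximation algorithms above: greedy peeling (repeatedly delete a minimum-degree node and keep the best intermediate set) is known to approximate the densest-subset objective |E(S)|/|S|, where E(S) counts hyperedges fully contained in S. The naive quadratic-time sketch below is for exposition only and is not the near-linear implementation from the paper.

```python
def density(S, hyperedges):
    """Density of node set S: hyperedges fully inside S, per node."""
    if not S:
        return 0.0
    inside = sum(1 for e in hyperedges if e <= S)
    return inside / len(S)


def peel_densest_subset(nodes, hyperedges):
    """Greedy peeling heuristic for the densest-subset problem.

    nodes: iterable of node identifiers.
    hyperedges: list of frozensets of nodes.
    Returns (best node set, its density) over all peeling prefixes.
    """
    S = set(nodes)
    best, best_d = set(S), density(S, hyperedges)
    while len(S) > 1:
        # Degree of v = number of surviving hyperedges containing v.
        live = [e for e in hyperedges if e <= S]
        deg = {v: sum(1 for e in live if v in e) for v in S}
        S.remove(min(S, key=lambda u: deg[u]))  # peel a min-degree node
        d = density(S, hyperedges)
        if d > best_d:
            best, best_d = set(S), d
    return best, best_d
```

For example, on a hypergraph with a dense core {1, 2, 3} and a pendant node 4, peeling removes the pendant first and recovers the core.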
Summary of objectives addressed: 


Research Outcome  
Major findings and research outcome:  In this project, we give an improved analysis of the Ranking algorithm for the oblivious matching problem. For the unweighted case, we have given a novel LP formulation and analyzed its asymptotic behavior using continuous LP. Our improved results have appeared in the SICOMP 2018 paper “Ranking on Arbitrary Graphs: Rematch via Continuous Linear Programming” and the TALG 2018 paper “Analyzing Node-Weighted Oblivious Matching Problem via Continuous LP with Jump Discontinuity”, the latter of which also gave the first analysis for the node-weighted variant of the problem. We have also achieved a non-trivial performance ratio for the edge-weighted variant of the oblivious matching problem: when the number of distinct edge weights is bounded, our techniques lead to a performance ratio better than 0.5. The result has been published in the ESA 2016 paper “Beating Ratio 0.5 for Weighted Oblivious Matching Problems”. For edge-weighted directed graphs, we also considered spectral techniques to decompose the input graph such that each piece has small edge expansion. The result has been published in the COCOON 2015 paper “Cheeger Inequalities for General Edge-Weighted Directed Graphs”. We have also considered similar problems on unweighted graphs in which the algorithm does not have complete information. For instance, we have developed a dynamic data structure that maintains the hop-diameter of a tree with unweighted edges such that the degree increase of each node is constant. The result has been published in the COCOON 2015 paper “Dynamic Tree Shortcut with Constant Degree”. Based on the LP techniques we developed for the oblivious matching problem, we have devised efficient LP-based algorithms for locating densest subsets. The results have been published in the WSDM 2015 paper “Finding Subgraphs with Maximum Total Density and Limited Overlap”. 
We have extended the densest subgraph problem to hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. The results have appeared in the CIKM 2017 paper “Maintaining Densest Subsets Efficiently in Evolving Hypergraphs”. Finally, we also considered other practical applications on hypergraphs, such as semi-supervised learning. Our experiments on real-world datasets have confirmed that our confidence interval approach on hypergraphs outperforms existing methods, and our subgradient method sometimes gives faster running times. The results have appeared in the ICML 2017 paper “Re-revisiting Learning on Hypergraphs: Confidence Interval and Subgradient Method”.  
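The subgradient method mentioned above applies to convex objectives that are not everywhere differentiable. The following toy example is a generic textbook sketch, unrelated to the actual hypergraph objective in the paper: it minimizes f(x) = |x − 3|, which is non-differentiable at its minimizer, using diminishing step sizes and tracking the best iterate (the objective need not decrease monotonically along subgradient iterates).

```python
def subgradient_minimize(f, subgrad, x0, steps=500):
    """Subgradient method with diminishing step sizes 1/sqrt(t).

    Tracks the best iterate seen so far, since for non-differentiable
    f the objective can oscillate from one iterate to the next.
    """
    x, best_x, best_f = x0, x0, f(x0)
    for t in range(1, steps + 1):
        g = subgrad(x)            # any subgradient of f at x
        x = x - g / t ** 0.5      # diminishing, non-summable steps
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f


# f(x) = |x - 3| is convex but not differentiable at x = 3.
f = lambda x: abs(x - 3.0)
subgrad = lambda x: 1.0 if x > 3.0 else (-1.0 if x < 3.0 else 0.0)
best_x, best_f = subgradient_minimize(f, subgrad, x0=0.0)
```

With 500 steps the best iterate lands close to the minimizer x = 3; the same best-iterate bookkeeping is the standard workaround whenever a non-differentiable convex program is solved by subgradients.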
Potential for further development of the research and the proposed course of action: 
We have developed the continuous LP framework to analyze the performance of the Ranking algorithm for the oblivious matching problem. The next step is to use this technique to analyze other allocation problems. Moreover, the insights we have developed in this project can be applied to other problems whose inputs are only partially known. For instance, many classical problems have variants in which the inputs are revealed gradually. Towards the end of this project, we considered some dynamic data structures and algorithms to tackle this setting, which is an interesting future research direction. Furthermore, while analyzing directed graphs and hypergraphs, we have observed some interesting relationships between clustering, graph partitioning and the spectral properties of the underlying Laplacian operators. In future work, we plan to further explore clustering problems in hypergraphs and develop spectral theories to design relevant algorithms. As we have observed, these problems have applications such as semi-supervised learning.  
Layman's Summary of Completion Report:  In this project, we have investigated the classical maximum matching problem of finding a maximum subset of non-intersecting edges, in the case where the graph structure is unknown. We have studied various variants (unweighted, node-weighted, edge-weighted) of the problem, and designed algorithms to maximize the size or the weight of the matching. Several technical breakthroughs have been achieved in this project, such as a novel continuous LP framework to analyze the asymptotic behavior of algorithms. Our results have real-world applications such as online advertising, online dating, and the kidney exchange problem. For instance, in the pay-per-click model, the cost for a click on some ad associated with a certain video is given, but whether a user will actually click on the ad is unknown. This corresponds to the node-weighted or the edge-weighted versions of the problem, where the algorithm knows the node weights or the weight of a potential edge if it exists. We have developed new combinatorial techniques, and shown that performance ratios strictly better than the trivial 0.5 can be achieved. In conclusion, we have brought new insights to these problems, and the technical breakthroughs have a significant impact on both theoreticians and practitioners.  
Research Output  
Peer-reviewed journal publication(s) arising directly from this research project : (* denotes the corresponding author) 


Recognized international conference(s) in which paper(s) related to this research project was/were delivered : 


Other impact (e.g. award of patents or prizes, collaboration with other research institutions, technology transfer, etc.): 