ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC 
Project Details 
Funding Scheme :  General Research Fund  
Project Number :  17200214  
Project Title(English) :  Oblivious Matching Algorithms on Unknown (Hyper) Graphs  
Project Title(Chinese) :  未知(超)圖上的無識匹配算法  
Principal Investigator(English) :  Dr Chan, Hubert Tsz Hong  
Principal Investigator(Chinese) :  
Department :  Department of Computer Science  
Institution :  The University of Hong Kong  
Email Address :  hubert@cs.hku.hk  
Tel :  28578461  
Co-Investigator(s) : 


Panel :  Engineering  
Subject Area :  Computing Science & Information Technology  
Exercise Year :  2014 / 15  
Fund Approved :  875,000  
Project Status :  Completed  
Completion Date :  31/12/2017  
Project Objectives : 


Abstract as per original application (English/Chinese): 
This project will investigate the classical maximum matching problem of finding a maximum subset of non-intersecting edges, in the case where the graph structure is unknown. The existence of an edge between two nodes remains unknown until the pair is probed; when an edge is found, the corresponding pair must be matched up and removed from the graph. Our goal is to investigate various versions of the problem, and design algorithms to maximize the size or the weight of the matching. The project will bring about technical breakthroughs, and the problem has real-world applications such as online advertising, online dating, and the kidney exchange problem.
More specifically, in the oblivious matching problem, the graph is fixed in advance but unknown to the algorithm. The algorithm knows only the set of nodes, and specifies an order of node pairs to probe the graph. The performance ratio is the ratio of the size of the matching produced to that of a maximum matching. A ratio of 0.5 can be achieved trivially and is tight for deterministic algorithms. Since the mid-1990s, theoreticians such as Fulkerson Prize winners Dyer and Frieze, and two-time Gödel Prize winner Szegedy, have used sophisticated arguments to prove that several randomized algorithms can achieve ratios marginally (around 0.004) above 0.5. In large-scale applications, this could mean a huge increase in utility.
We will develop novel mathematical techniques to produce significant results. In our preliminary work, we use sophisticated combinatorial analysis and mathematical programming techniques to achieve the currently best proven ratio of 0.523. This is a major breakthrough, and provides an encouraging start for the design of algorithms for more general versions of the problem in which the nodes or edges of the graph are weighted.
Our techniques also offer us important insights for investigating different versions of the problem on hypergraphs. For instance, the analogous result for 3-uniform hypergraphs would be breaking the 1/3 barrier. This has important applications in combinatorial auctions under settings in which complete information is unavailable.
In conclusion, there is no doubt that oblivious matching is an important problem for theoreticians and practitioners alike. The proposed project will bring new insights to the problem and the technological breakthroughs will have a significant impact on the wider theoretical community.

本項目將研究在未知圖結構情況下的經典最大匹配問題，即尋找最大不相交邊集合。一條邊存在與否只有當相應的點對被探測時才可獲知；當一條邊被探測並已知存在時，相應的點對必須被匹配並移出該圖。我們的目標是研究該問題的多個版本，並設計最大化匹配大小或權值的算法。本項目將帶來技術上的突破，並且該問題對在線廣告、在線約會及腎臟交換問題中具有實際應用。

具體地講，在該無識匹配問題中，所研究的圖已預先確定，但算法並不知道其結構。算法只知道該圖的點集合，並製定一個點對順序來探測該圖。性能比是指算法產生的匹配的大小與最大匹配大小之間的比值。簡單的算法即可獲得0.5的性能比，並且該數值對確定算法來說是緊緻的。從上世紀九十年代以來，包括Fulkerson獎獲得者Dyer和Frieze以及兩次哥德爾獎獲得者Szegedy在內的理論家，利用複雜的論證證明了某些隨機算法可以達到略微（幅度約0.004）大於0.5的性能比。在大規模應用中，這或許會帶來效用的巨大提升。

我們將開發全新的數學工具以獲得重大成果。在前期工作中，我們利用複雜的組合分析和數學規劃工具獲得了當前可被證明的最好性能比0.523。這是一個重大突破，並鼓勵我們開始針對帶權值點或帶權值邊等普適模型設計算法。

我們開發的工具也使我們對該問題在超圖上的各種版本有了重要的深刻的理解。例如，在3均勻超圖上的類似結果將打破性能比1/3的壁壘。該結果在無法獲知全部信息情況下的組合拍賣問題中具有重要應用價值。 

Realisation of objectives:  Objective 1: We have considered the oblivious matching problem on unweighted graphs, where an adversary commits to a simple undirected graph G=(V, E), and reveals the node set V to the randomized algorithm, while keeping the edge set E secret (hence the algorithm is initially oblivious to the edges). The algorithm returns a list L that gives the order in which unordered pairs of nodes are probed. Matching is done in a greedy manner: when a pair of nodes is probed, if both are currently unmatched and an edge exists between them, the two nodes are matched together; otherwise, we skip to the next pair in the list L, until all pairs in L are probed. The goal is to maximize the performance ratio of the (expected) number of nodes matched by the algorithm to the number of nodes in a maximum matching in G. Our approach is to analyze the well-known Ranking algorithm. The idea is to express the performance ratio as the objective of a linear program, and to form linear constraints that capture the structure of the Ranking algorithm. For example, a node that receives a higher rank in the algorithm is more likely to be matched; another property is that if a node is unmatched, then demoting its rank will not change the resulting matching. Hence, the minimization LP gives a lower bound on the performance ratio of the Ranking algorithm. Since the objective value of the LP is a decreasing function of the number of nodes, we need to consider the asymptotic behavior as n tends to infinity. Our insight is to replace the primal variables with a continuous primal function and consider the continuous variant of the LP. By analyzing the continuous LP and its dual, we show that the objective value of the LP is at least 0.523, which is our first theoretical guarantee on the performance of the Ranking algorithm on unweighted graphs. This result appears in the SIAM Journal on Computing (2018). 
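For concreteness, the greedy probing process with random ranks can be simulated as follows. This is a minimal sketch, not the paper's implementation: probing pairs in lexicographic order of their ranks is one standard instantiation of Ranking, and `edge_oracle` is a hypothetical callback standing in for the unknown edge set E.

```python
import itertools
import random


def ranking_oblivious_matching(nodes, edge_oracle, rng=random):
    """Simulate one run of the Ranking algorithm for oblivious matching.

    nodes: list of node identifiers (the only input the algorithm sees).
    edge_oracle: callable (u, v) -> bool revealing whether {u, v} is an
        edge of the hidden graph; each pair is probed at most once.
    Returns the greedy matching as a set of frozenset pairs.
    """
    # Assign a uniformly random rank to every node.
    order = list(nodes)
    rng.shuffle(order)
    rank = {u: i for i, u in enumerate(order)}

    # The probing list L: unordered pairs sorted lexicographically
    # by (smaller rank, larger rank).
    pairs = sorted(itertools.combinations(nodes, 2),
                   key=lambda p: tuple(sorted((rank[p[0]], rank[p[1]]))))

    matched = set()
    matching = set()
    for u, v in pairs:
        if u in matched or v in matched:
            continue  # greedy: a match, once made, is never revoked
        if edge_oracle(u, v):
            matching.add(frozenset((u, v)))
            matched.update((u, v))
    return matching
```

Any single run returns a maximal matching of the hidden graph; averaging the matching size over the random ranks gives the expectation that the LP analysis lower-bounds.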
We next further improved the analysis of the Ranking algorithm by exploring more structural properties. Another observation we made is that if, in some instance, a matched node has two unmatched neighbors, then even if the node is demoted to the lowest rank, it will still be matched. This allows us to formulate new LP constraints. Furthermore, in our continuous LP framework, we allow the function to have jump discontinuities. Using these techniques, we have improved the analysis of the performance ratio to 0.526. This result appears in the ACM Transactions on Algorithms (2018). Based on the techniques we developed for the oblivious problem, we considered similar problems on unweighted graphs in which the algorithm does not have complete information. In particular, we developed a dynamic data structure that maintains the hop-diameter of a tree with unweighted edges such that the degree increase of each node is constant. The result has been published in COCOON 2015.

Objective 2: We have investigated the node-weighted variant of the oblivious matching problem, in which the algorithm also receives the weight of each node, and the performance ratio is the expected sum of the weights of matched nodes to the sum of node weights in an optimal matching. By analyzing graceful instances in which both a node and its partner in the optimal matching are matched, we can formulate new LP constraints. Since the finite LP is characterized by the granularity of the adjustment function, the best possible performance ratio using this approach can be obtained by analyzing the corresponding continuous LP, which has value at least 0.5015. This result appears in the ACM Transactions on Algorithms (2018). We have also studied the edge-weighted variant, which subsumes the node-weighted variant. Our idea is to partition the edges carefully according to their weights, and run the (unweighted) Ranking algorithm on each part. 
We analyzed this approach using a new primal-dual framework known as matching coverage, in which dual feasibility is bypassed: only the dual constraints corresponding to edges in an optimal matching are satisfied. When the number of distinct edge weights is bounded, our techniques lead to a performance ratio better than 0.5. The result has been published in the European Symposium on Algorithms 2016. For edge-weighted directed graphs, we also considered spectral techniques to decompose the input graph such that each piece has small edge expansion. We achieve this by considering a stationary probability distribution of an augmented graph, and we show a relationship between the spectral properties of the augmented graph and the original graph. The result has been published in COCOON 2015.

Objective 3: Based on the linear programming techniques we developed for the oblivious matching problem, we have devised efficient LP-based algorithms for locating densest subsets. In most such applications, one typically aims at finding several dense subgraphs, which might correspond to communities in social networks or to interesting events. We have defined and studied a natural generalization of the densest subgraph problem, where the goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. After showing that this problem is NP-hard, we devised an efficient algorithm that comes with provable guarantees in some cases of interest, as well as an efficient practical heuristic. Moreover, we have performed extensive evaluations on large real-world graphs, which confirm the efficiency of our algorithms. The results have been published in the ACM International Conference on Web Search and Data Mining (WSDM) 2015. 
We have extended the densest subgraph problem to hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. This has applications in combinatorial auctions, in which a densest subset corresponds to a group of desirable items. We have developed two exact algorithms and a near-linear time r-approximation algorithm for the problem, where r is the maximum cardinality of an edge in the hypergraph. Furthermore, we have also considered the dynamic version of the problem, in which an adversary can insert or delete an edge of the hypergraph in each round, and the goal is to efficiently maintain an approximation of the densest subgraph. We have designed two dynamic approximation algorithms with amortized polylogarithmic update times, for both the insert-only scenario and the fully dynamic scenario. Extensive experiments were performed on large real datasets to validate the effectiveness and efficiency of our algorithms. The results have appeared in the ACM International Conference on Information and Knowledge Management (CIKM) 2017. Finally, we also considered other practical applications on hypergraphs, such as semi-supervised learning. We used a convex-program approach whose objective function is not everywhere differentiable. Exploiting the non-uniqueness of the optimal solutions, we defined the concept of confidence intervals, which give the exact ranges of values that unlabeled vertices take in any optimal solution. Moreover, we gave a much simpler approach for solving the convex program based on the subgradient method. Our experiments on real-world datasets have confirmed that our confidence interval approach on hypergraphs outperforms existing methods, and our subgradient method gives faster running times when the number of vertices is much larger than the number of edges. The results have appeared in ICML 2017.  
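To illustrate the flavor of the approximation algorithms above: greedy peeling (repeatedly delete a minimum-degree node and keep the best intermediate set) is known to approximate the densest-subset objective |E(S)|/|S|, where E(S) counts hyperedges fully contained in S. The naive quadratic-time sketch below is for exposition only and is not the near-linear implementation from the paper.

```python
def density(S, hyperedges):
    """Density of node set S: hyperedges fully inside S, per node."""
    if not S:
        return 0.0
    inside = sum(1 for e in hyperedges if e <= S)
    return inside / len(S)


def peel_densest_subset(nodes, hyperedges):
    """Greedy peeling heuristic for the densest-subset problem.

    nodes: iterable of node identifiers.
    hyperedges: list of frozensets of nodes.
    Returns (best node set, its density) over all peeling prefixes.
    """
    S = set(nodes)
    best, best_d = set(S), density(S, hyperedges)
    while len(S) > 1:
        # Degree of v = number of surviving hyperedges containing v.
        live = [e for e in hyperedges if e <= S]
        deg = {v: sum(1 for e in live if v in e) for v in S}
        S.remove(min(S, key=lambda u: deg[u]))  # peel a min-degree node
        d = density(S, hyperedges)
        if d > best_d:
            best, best_d = set(S), d
    return best, best_d
```

For example, on a hypergraph with a dense core {1, 2, 3} and a pendant node 4, peeling removes the pendant first and recovers the core.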
Summary of objectives addressed: 


Research Outcome  
Major findings and research outcome:  In this project, we give an improved analysis of the Ranking algorithm for the oblivious matching problem. For the unweighted case, we have given a novel LP formulation and analyzed its asymptotic behavior using continuous LP. Our improved results have appeared in the SICOMP 2018 paper “Ranking on Arbitrary Graphs: Rematch via Continuous Linear Programming” and the TALG 2018 paper “Analyzing Node-Weighted Oblivious Matching Problem via Continuous LP with Jump Discontinuity”, the latter of which also gave the first analysis for the node-weighted variant of the problem. We have also achieved a non-trivial performance ratio for the edge-weighted variant of the oblivious matching problem: when the number of distinct edge weights is bounded, our techniques lead to a performance ratio better than 0.5. The result has been published in the ESA 2016 paper “Beating Ratio 0.5 for Weighted Oblivious Matching Problems”. For edge-weighted directed graphs, we also considered spectral techniques to decompose the input graph such that each piece has small edge expansion. The result has been published in the COCOON 2015 paper “Cheeger Inequalities for General Edge-Weighted Directed Graphs”. We have also considered similar problems on unweighted graphs in which the algorithm does not have complete information. For instance, we have developed a dynamic data structure that maintains the hop-diameter of a tree with unweighted edges such that the degree increase of each node is constant. The result has been published in the COCOON 2015 paper “Dynamic Tree Shortcut with Constant Degree”. Based on the LP techniques we developed for the oblivious matching problem, we have devised efficient LP-based algorithms for locating densest subsets. The results have been published in the WSDM 2015 paper “Finding Subgraphs with Maximum Total Density and Limited Overlap”. 
We have extended the densest subgraph problem to hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. The results have appeared in the CIKM 2017 paper “Maintaining Densest Subsets Efficiently in Evolving Hypergraphs”. Finally, we also considered other practical applications on hypergraphs, such as semi-supervised learning. Our experiments on real-world datasets have confirmed that our confidence interval approach on hypergraphs outperforms existing methods, and our subgradient method sometimes gives faster running times. The results have appeared in the ICML 2017 paper “Re-revisiting Learning on Hypergraphs: Confidence Interval and Subgradient Method”.  
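The subgradient method mentioned above applies to convex objectives that are not everywhere differentiable. The following toy example is a generic textbook sketch, unrelated to the actual hypergraph objective in the paper: it minimizes f(x) = |x − 3|, which is non-differentiable at its minimizer, using diminishing step sizes and tracking the best iterate (the objective need not decrease monotonically along subgradient iterates).

```python
def subgradient_minimize(f, subgrad, x0, steps=500):
    """Subgradient method with diminishing step sizes 1/sqrt(t).

    Tracks the best iterate seen so far, since for non-differentiable
    f the objective can oscillate from one iterate to the next.
    """
    x, best_x, best_f = x0, x0, f(x0)
    for t in range(1, steps + 1):
        g = subgrad(x)            # any subgradient of f at x
        x = x - g / t ** 0.5      # diminishing, non-summable steps
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f


# f(x) = |x - 3| is convex but not differentiable at x = 3.
f = lambda x: abs(x - 3.0)
subgrad = lambda x: 1.0 if x > 3.0 else (-1.0 if x < 3.0 else 0.0)
best_x, best_f = subgradient_minimize(f, subgrad, x0=0.0)
```

With 500 steps the best iterate lands close to the minimizer x = 3; the same best-iterate bookkeeping is the standard workaround whenever a non-differentiable convex program is solved by subgradients.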
Potential for further development of the research and the proposed course of action: 
We have developed the continuous LP framework to analyze the performance of the Ranking algorithm for the oblivious matching problem. The next step is to use this technique to analyze other allocation problems. Moreover, the insights we have developed in this project can be applied to other problems whose inputs are only partially known. For instance, many classical problems have variants in which the inputs are revealed gradually. Towards the end of this project, we considered some dynamic data structures and algorithms to tackle this setting, which is an interesting future research direction. Furthermore, while analyzing directed graphs and hypergraphs, we have observed some interesting relationships between clustering, graph partitioning and the spectral properties of the underlying Laplacian operators. In future work, we plan to further explore clustering problems in hypergraphs and develop spectral theories to design relevant algorithms. As we have observed, these problems have applications such as semi-supervised learning.  
Layman's Summary of Completion Report:  In this project, we have investigated the classical maximum matching problem of finding a maximum subset of non-intersecting edges, in the case where the graph structure is unknown. We have studied various variants (unweighted, node-weighted, edge-weighted) of the problem, and designed algorithms to maximize the size or the weight of the matching. Several technical breakthroughs have been achieved in this project, such as a novel continuous LP framework to analyze the asymptotic behavior of algorithms. Our results have real-world applications such as online advertising, online dating, and the kidney exchange problem. For instance, in the pay-per-click model, the cost for a click on some ad associated with a certain video is given, but whether a user will actually click on the ad is unknown. This corresponds to the node-weighted or the edge-weighted versions of the problem, where the algorithm knows the node weights or the weight of a potential edge if it exists. We have developed new combinatorial techniques, and shown that performance ratios strictly better than the trivial 0.5 can be achieved. In conclusion, we have brought new insights to these problems, and the technical breakthroughs have a significant impact on both theoreticians and practitioners.  
Research Output  
Peer-reviewed journal publication(s) arising directly from this research project : (* denotes the corresponding author) 


Recognized international conference(s) in which paper(s) related to this research project was/were delivered : 


Other impact (e.g. award of patents or prizes, collaboration with other research institutions, technology transfer, etc.): 