ENQUIRE PROJECT DETAILS BY GENERAL PUBLIC

Project Details
Funding Scheme : Early Career Scheme
Project Number : 719312
Project Title(English) : Privacy-Preserving Mechanisms on Multi-User Data in Ubiquitous Computing Environments 
Project Title(Chinese) : 普式計算環境下對多用戶數據的私隱保護機制 
Principal Investigator(English) : Dr Chan, Hubert Tsz Hong 
Principal Investigator(Chinese) :  
Department : School of Computing and Data Science
Institution : The University of Hong Kong
E-mail Address : hubert@cs.hku.hk  
Tel : 28578461 
Co - Investigator(s) :
Panel : Engineering
Subject Area : Computing Science & Information Technology
Exercise Year : 2012 / 13
Fund Approved : 525,600
Project Status : Completed
Completion Date : 29-2-2016
Project Objectives :
Propose new security and privacy models for data aggregation in multi-user environments with untrusted aggregator and potential colluding users. Investigate theoretical limitations on the accuracy of privacy-preserving aggregation.
Design provably secure and private multi-user periodic aggregation mechanisms for various statistics such as sum, average, and variance. The mechanisms should avoid costly communication during each aggregation.
Define notions of privacy granularity with which a user can specify the desired privacy level, for example, in clustering applications to balance against the quality of classification.
Combine our methods to implement privacy-preserving recommendation applications using real-world data, for instance the publicly available Netflix database, and compare the accuracy of our methods with other existing methods.
Abstract as per original application
(English/Chinese):
Handheld devices such as Android phones, iPhones, and iPads have gained tremendous popularity in the past couple of years, and many applications have been developed on these devices to provide users with personalized services. For instance, a restaurant or movie recommendation application can make a suggestion based on a user’s current location, past consumption habit, average ratings, and preference of other similar users. However, as these devices are constantly collecting personal data from the users, there is an increasing concern on privacy infringement in these ubiquitous computing environments. There are current approaches that protect user privacy by refusing to release any sensitive data, but this would make applications less useful and users would not enjoy the benefits brought by these personalized services. We shall investigate how to utilize sensitive user data and study what techniques can give a good tradeoff between user privacy and the utility of collected data. In particular, we will consider several common problems in ubiquitous computing to illustrate our methods with practical considerations. One novel aspect of our framework is that we will combine privacy and security notions to capture the scenario that the aggregator and other users are untrusted. On a high level, our definitions are based on the randomized framework of differential privacy such that when one user changes his input data, the adversary cannot detect this change, because the information received by the adversary has similar distribution as before, or the adversary has limited computational power. We shall investigate the theoretical limitations on the accuracy of privacy-preserving aggregation against computationally unbounded adversaries and under low bandwidth conditions. On the other hand, using applied cryptographic techniques, we shall design mechanisms that are provably secure against polynomial-time adversaries, efficient for periodic aggregation and achieve small error. Moreover, our frameworks allow a user to specify the desired granularity level of privacy to balance against the quality of service he receives. As users place increasing pressure on application developers to respect their privacy, our designs will give evidence that it is possible to achieve both privacy and utility. Moreover, our privacy frameworks will give formal standards for drafting relevant privacy laws. We believe this project will form a solid theoretical foundation and bring us closer to truly secure and private ubiquitous computing.
Realisation of objectives: Objective 1: We have used two approaches to develop new security and privacy models. In the first approach, the idea is to use random noise to protect sensitive data as in the notion of differential privacy. We have adapted this notion to define models in the client-server setting, in which there is no peer-to-peer interaction. Even for single-round computation of the simple sum aggregation function, we have shown that the additive error incurred must be of order at least square root of the number of users in the system, if we assume that the adversary has unbounded computational power. We have also considered a more realistic model in which users can go offline. In this case, the model ensures that the remaining users can still generate sufficient randomness to maintain privacy of sensitive data. This work has led to a patent “Privacy-preserving aggregation of time-series data”. In the second approach, we use randomization to hide the access pattern of data on the server to preserve the privacy of users, in the manner of oblivious RAM (ORAM). Since users can be compromised or colluded, we have designed privacy models under the secure computation settings, in which we assume the adversary can observe the data access pattern on the user’s side. We have a formal description of the secure computation models that concern computation cost on both the client and the server sides. We observe that asymptotic bandwidth overhead might be inadequate at characterizing practical performance, and instead have formulated models in which the circuit complexity will be the measure of efficiency. Our results are published in the CCS 2014 paper “SCORAM: Oblivious RAM for Secure Computation”. Objective 2: As a first step for designing secure and private aggregation mechanisms for various statistics, we have considered the problem of finding heavy hitters among multi-users in a streaming setting. We have designed a distributed protocol, in which the untrusted aggregator can detect the heavy hitters and their approximate frequencies. Our protocol is scalable in the settings where the volume of streaming data is large, since we guarantee low memory usage and processing overhead by each user, and low communication overhead between the users and the aggregator. This result forms part of the MPhil thesis of Mingfei Li. To design secure mechanisms for more general statistics, currently we have investigated oblivious versions of various common data structures such as linked-lists, stacks and priority queues. We have employed ORAM techniques to implement such data structures, and we have also exploited the access pattern graph on user data to design protocols with low overhead bandwidth. In order to design secure protocols that can perform more general tasks other than summation, we have utilized oblivious data structures to hide the data access pattern when users request data blocks from the server. Our framework can support common data structures such as map/set, priority queue, stack, linked-list, etc. We have performed several case studies to apply our techniques. For instance, we have considered an oblivious dynamic memory allocator that can hide memory usage that is triggered by the common malloc and free operations. We have also considered secure protocols that compute more sophisticated statistics that are needed in applications such as shortest-path distances and maximum flow. The results are published in the CCS paper “Oblivious Data Structures”. Motivated by the recent popularity of the decentralized cryptocurrency Bitcoin, we have designed a distributed private voting protocol on the bitcoin blockchain, which essentially is computing the sum and the majority statistics. Unlike traditional voting, there is also a money transfer attached to the voting result. Specifically, there are two candidates and a group of voters, each of whom holds 1 bitcoin and decides one’s preference between the two candidates. The protocol transfers all bitcoins to the most preferred candidate, while privacy and irrevocability are guaranteed. Our bitcoin voting protocols have been published in the ICICS 2015 paper “How to Vote Privately Using Bitcoin”. Objective 3: In order to consider the notion of security and privacy granularity, we have developed a model in which users can decide the data block size on which atomic ORAM operations are performed. We have analyzed the effect of the data block size on the number of levels of recursions to store the position map used in an ORAM. We have developed protocols that exploit the tradeoffs between data block size and communication overhead. Our techniques show that for data granularity of size at least log^2 N bits, the bandwidth overhead achieved will be close to the theoretical optimum. The result has appeared in the CCS 2015 paper “Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound”. Moreover, using a metric space to specify data granularity, we have also discovered that the doubling dimension of the pattern access graph can facilitate efficient clustering of data blocks to achieve lower communication overhead bandwidth. The result has appeared in the CCS 2014 paper “Oblivious Data Structures”. Motivated by private voting verification, we have studied the minimum independent set partition problem. In this problem, each user has some Boolean vector of dimension N, indicating his voting preference. Moreover, a granularity parameter c is given such that the goal is to partition the dimensions into c subsets such that for each subset of dimension, the projection of the N vectors onto those dimensions specified by the subset has maximum cardinality for ensuring privacy. We have shown some hardness results for this problem, which has appeared in the COCOON 2015 paper “On the Complexity of the Minimum Independent Set Partition Problem”. Objective 4: We have designed a private protocol for monitoring heavy hitters, which are useful in recommending popular items. We have analyzed our distributed protocol on the Netflix Contest Dataset, which contains n = 17770 movies with 480189 users’ ratings from 1999-11-11 to 2005-12-31. Our experiments have shown that our protocol can achieve low error and maintain low communication cost. We have also considered using social recommendation techniques in peer-to-peer networks under the setting in which users might not reveal their private data. We have considered our data dissemination protocols on a subgraph of the Twitter-Network crawled by Yang and Leskovec that contains 50,000 peers and 260,000 edges. Experiments show that our protocols have low communication cost and are robust against selfish and colluding users. The results have appeared in the ICCCN Workshop on Wireless Mesh and Ad-hoc Networking 2014. Moreover, we have evaluated our protocols using secure computation in ORAM under the circuit complexity measure. For instance, we have evaluated the performance of various oblivious data structures such as stack, map/set and priority queue. We have also carried out experiments for more complicated tasks such as dynamic memory allocation, random walk on the grid and shortest path queries on planar graphs. The results are reported in the CCS 2014 paper “Oblivious Data Structures”. For our bitcoin voting protocol, we have evaluated the performance to carry out zero-knowledge proofs required in the protocol. Moreover, we have submitted proper transactions on the blockchain to show the practicality of our methods. The results are published in the ICICS 2015 paper “How to Vote Privately Using Bitcoin”.
Research Outcome
Major findings and research outcome: In this project, we developed new security and privacy models. Specifically, we use randomization to hide the access pattern of data on the server to preserve the privacy of users, in the manner of oblivious RAM (ORAM). Our results appear in the CCS 2014 paper “SCORAM: Oblivious RAM for Secure Computation”. Based on our privacy and security models, we have designed protocols for evaluating various statistics and performing different tasks. For instance, we have implemented oblivious data structures such as map/set, priority queue, stack, linked-list. Moreover, our protocols can achieved more complicated tasks such as dynamic memory allocation, random walk on the grid and shortest path queries on planar graphs. The protocol descriptions and experimental results appear in the CCS 2014 paper “Oblivious Data Structures”. In terms of real world-application, we have designed a distributed private voting protocol on the bitcoin blockchain. We have tested the feasibility of our protocol on the blockchain. The results appear in the ICICS 2015 paper “How to Vote Privately Using Bitcoin”. Addressing the notion of security and privacy granularity, we have developed a model in which users can decide the data block size on which atomic ORAM operations are performed. We have analyzed the effect of the data block size on the number of levels of recursions to store the position map used in an ORAM. The result is published in the CCS 2015 paper “Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound”. Motivated by private voting verification, we have investigated the complexity of the minimum independent set partition problem. In this problem, a granularity parameter c is given such that the goal is to partition some given set of indices into c subsets to maintain certain privacy guarantees. Our hardness results have appeared in the COCOON 2015 paper “On the Complexity of the Minimum Independent Set Partition Problem”. We have also employed social recommendation techniques in peer-to-peer networks under the setting in which users might not reveal their private data to design protocols for data dissemination. We have tested our protocols on the Twitter-Network. The results have appeared in the ICCCN Workshop on Wireless Mesh and Ad-hoc Networking 2014. Moreover, research graduate students are involved in this project. In particular, 2 PhD and 2 MPhil students have based part of their theses on the results achieved in this project. Finally, this project has led to a patent “Privacy-preserving aggregation of time-series data”, which shows that the results have practical impact.
Potential for further development of the research
and the proposed course of action:
We have used oblivious RAM as a technique to protect the access pattern of sensitive data. The next step to employ this in a large-scale multi-user setting is to consider generalization of this model that supports parallel accesses. In many applications such as cloud outsourcing, secure processor design, and secure multi-party computation, parallelism is the adopted paradigm and indeed modern computing architectures such as multi-core processors or cluster computing have already supported parallel computing. Hence, a natural future direction is to apply our techniques to oblivious parallel ORAM, which is a basic building block for parallel computing, as in the case for its sequential counterpart ORAM. A central question is how to achieve the optimal overhead in the parallel case. Moreover, in scenarios with multiple CPUs, it is also an interesting direction to investigate any potential tradeoff between the parallel runtime and the total work.
Layman's Summary of
Completion Report:
In this project, we have investigated how to utilize sensitive user data and study the techniques that give a good tradeoff between user privacy and the utility of collected data. In particular, we have developed private protocols for tackling several common problems in the multi-user setting. Examples include basic building blocks in computer science such as stack, map/set and priority queue. On a high level, we use a randomization framework to protect the privacy of user data. We have derived the theoretical guarantees of our protocols and also tested their performance in practical scenarios using real-world datasets. As modern users are increasingly aware of their privacy, our designs have shown that it is possible to achieve both privacy and utility. In conclusion, we believe that this project has contributed to (i) forming a solid theoretical foundation for secure and private computing with multi-users, and (ii) illustrating the practicality of our methods.
Research Output
Peer-reviewed journal publication(s)
arising directly from this research project :
(* denotes the corresponding author)
Recognized international conference(s)
in which paper(s) related to this research
project was/were delivered :
Month/Year/City Title Conference Name
08/2014/Shanghai An incentive protocol for distributed dynamic P2P video-on-demand streaming  23rd International Conference on Computer Communication and Networks 
11/2014/Scottsdale, AZ, USA Oblivious Data Structures  ACM SIGSAC Conference on Computer and Communications Security 
11/2014/Scottsdale, AZ, USA SCORAM: Oblivious RAM for Secure Computation  ACM SIGSAC Conference on Computer and Communications Security 
12/2015/Beijing How to Vote Privately Using Bitcoin  Information and Communications Security - 17th International Conference 
08/2015/Beijing On the Complexity of the Minimum Independent Set Partition Problem  Computing and Combinatorics - 21st International Conference 
10/2015/Denver, CO, USA Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound  ACM SIGSAC Conference on Computer and Communications Security 
Other impact
(e.g. award of patents or prizes,
collaboration with other research institutions,
technology transfer, etc.):
Patent Privacy-preserving aggregation of time-series data. EP 2485430 A2. http://www.google.com/patents/EP2485430A2 Collaboration with other research institutions: The paper “Oblivious Data Structure” published in CCS 2014 is a collaboration with researchers from University of Maryland College Park, UC Berkeley, and Indiana University Bloomington. The paper “SCORAM: Oblivious RAM for Secure Computation” published in CCS 2014 is a collaboration with researchers from University of Maryland College Park, Indiana University Bloomington, and University of Virginia. The paper “Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound” published in CCS 2015 is a collaboration with researchers from University of Maryland College Park, and Cornell University. The paper “On the Complexity of the Minimum Independent Set Partition Problem” published in COCOON 2015 is a collaboration with the University of Maryland College Park.
Realisation of the education plan:

  SCREEN ID: SCRRM00542