# PapersCutA shortcut to recent security papers

### Arxiv

#### Differentially private $k$-means clustering via exponential mechanism and max cover

Authors: Anamay Chaturvedi, Huy Nguyen, Eric Xu

Abstract: We introduce a new $(\epsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem. Given a dataset in Euclidean space, the $k$-means clustering problem requires one to find $k$ points in that space such that the sum of squares of Euclidean distances between each data point and its closest respective point among the $k$ returned is minimised. Although there exist privacy-preserving methods with good theoretical guarantees to solve this problem [Balcan et al., 2017; Kaplan and Stemmer, 2018], in practice it is seen that it is the additive error which dictates the practical performance of these methods. By reducing the problem to a sequence of instances of maximum coverage on a grid, we are able to derive a new method that achieves lower additive error then previous works. For input datasets with cardinality $n$ and diameter $\Delta$, our algorithm has an $O(\Delta^2 (k \log^2 n \log(1/\delta_p)/\epsilon_p + k\sqrt{d \log(1/\delta_p)}/\epsilon_p))$ additive error whilst maintaining constant multiplicative error. We conclude with some experiments and find an improvement over previously implemented work for this problem.

Date: 2 Sep 2020

#### Binary Compatibility For SGX Enclaves

Authors: Shweta Shinde, Jinhua Cui, Satyaki Sen, Pinghai Yuan, Prateek Saxena

Abstract: Enclaves, such as those enabled by Intel SGX, offer a powerful hardware isolation primitive for application partitioning. To become universally usable on future commodity OSes, enclave designs should offer compatibility with existing software. In this paper, we draw attention to 5 design decisions in SGX that create incompatibility with existing software. These represent concrete starting points, we hope, for improvements in future TEEs. Further, while many prior works have offered partial forms of compatibility, we present the first attempt to offer binary compatibility with existing software on SGX. We present Ratel, a system that enables a dynamic binary translation engine inside SGX enclaves on Linux. Through the lens of Ratel, we expose the fundamental trade-offs between performance and complete mediation on the OS-enclave interface, which are rooted in the aforementioned 5 SGX design restrictions. We report on an extensive evaluation of Ratel on over 200 programs, including micro-benchmarks and real applications such as Linux utilities.

Date: 2 Sep 2020

#### Flow-based detection and proxy-based evasion of encrypted malware C2 traffic

Authors: Carlos Novo, Ricardo Morla

Abstract: State of the art deep learning techniques are known to be vulnerable to evasion attacks where an adversarial sample is generated from a malign sample and misclassified as benign. Detection of encrypted malware command and control traffic based on TCP/IP flow features can be framed as a learning task and is thus vulnerable to evasion attacks. However, unlike e.g. in image processing where generated adversarial samples can be directly mapped to images, going from flow features to actual TCP/IP packets requires crafting the sequence of packets, with no established approach for such crafting and a limitation on the set of modifiable features that such crafting allows. In this paper we discuss learning and evasion consequences of the gap between generated and crafted adversarial samples. We exemplify with a deep neural network detector trained on a public C2 traffic dataset, white-box adversarial learning, and a proxy-based approach for crafting longer flows. Our results show 1) the high evasion rate obtained by using generated adversarial samples on the detector can be significantly reduced when using crafted adversarial samples; 2) robustness against adversarial samples by model hardening varies according to the crafting approach and corresponding set of modifiable features that the attack allows for; 3) incrementally training hardened models with adversarial samples can produce a level playing field where no detector is best against all attacks and no attack is best against all detectors, in a given set of attacks and detectors. To the best of our knowledge this is the first time that level playing field feature set- and iteration-hardening are analyzed in encrypted C2 malware traffic detection.

Comment: 9 pages, 6 figures

Date: 2 Sep 2020

#### Magma: A Ground-Truth Fuzzing Benchmark

Authors: Ahmad Hazimeh, Adrian Herrera, Mathias Payer

Abstract: High scalability and low running costs have made fuzz testing the de facto standard for discovering software bugs. Fuzzing techniques are constantly being improved in a race to build the ultimate bug-finding tool. However, while fuzzing excels at finding bugs, evaluating and comparing fuzzer performance is challenging due to the lack of metrics and benchmarks. Crash count, the most common performance metric, is inaccurate due to imperfections in deduplication techniques. Moreover, the lack of a unified set of targets results in ad hoc evaluations that inhibit fair comparison. We tackle these problems by developing Magma, a ground-truth evaluation framework that enables uniform fuzzer evaluation and comparison. By introducing real bugs into real software, Magma allows for realistic evaluation of fuzzers against a broad set of targets. By instrumenting these bugs, Magma also enables the collection of bug-centric performance metrics independent of the fuzzer. Magma is an open benchmark consisting of seven targets that perform a variety of input manipulations and complex computations, presenting a challenge to state-of-the-art fuzzers. We evaluate six popular mutation-based greybox fuzzers (AFL, AFLFast, AFL++, FairFuzz, MOpt-AFL, and honggfuzz) against Magma over 200 000 CPU-hours. Based on the number of bugs reached, triggered, and detected, we draw conclusions about the fuzzers' exploration and detection capabilities. This provides insight into fuzzer performance evaluation, highlighting the importance of ground truth in performing more accurate and meaningful evaluations.

Date: 2 Sep 2020

#### Perceptual Deep Neural Networks: Adversarial Robustness through Input Recreation

Authors: Danilo Vasconcellos Vargas, Bingli Liao, Takahiro Kanzaki

Abstract: Adversarial examples have shown that albeit highly accurate, models learned by machines, differently from humans,have many weaknesses. However, humans' perception is also fundamentally different from machines, because we do not see the signals which arrive at the retina but a rather complex recreation of them. In this paper, we explore how machines could recreate the input as well as investigate the benefits of such an augmented perception. In this regard, we propose Perceptual Deep Neural Networks ($\varphi$DNN) which also recreate their own input before further processing. The concept is formalized mathematically and two variations of it are developed (one based on inpainting the whole image and the other based on a noisy resized super resolution recreation). Experiments reveal that $\varphi$DNNs can reduce attacks' accuracy substantially, surpassing even state-of-the-art defenses. Moreover, the recreation process intentionally corrupts the input image. Interestingly, we show by ablation tests that corrupting the input is, although counter-intuitive,beneficial. This suggests that the blind-spot in vertebrates might also be, analogously, the precursor of visual robustness. Thus, $\varphi$DNNs reveal that input recreation has strong benefits for artificial neural networks similar to biological ones, shedding light into the importance of the blind-spot and starting an area of perception models for robust recognition in artificial intelligence.

Date: 2 Sep 2020

#### Adversarial Attacks on Deep Learning Systems for User Identification based on Motion Sensors

Authors: Cezara Benegui, Radu Tudor Ionescu

Abstract: For the time being, mobile devices employ implicit authentication mechanisms, namely, unlock patterns, PINs or biometric-based systems such as fingerprint or face recognition. While these systems are prone to well-known attacks, the introduction of an explicit and unobtrusive authentication layer can greatly enhance security. In this study, we focus on deep learning methods for explicit authentication based on motion sensor signals. In this scenario, attackers could craft adversarial examples with the aim of gaining unauthorized access and even restraining a legitimate user to access his mobile device. To our knowledge, this is the first study that aims at quantifying the impact of adversarial attacks on machine learning models used for user identification based on motion sensors. To accomplish our goal, we study multiple methods for generating adversarial examples. We propose three research questions regarding the impact and the universality of adversarial examples, conducting relevant experiments in order to answer our research questions. Our empirical results demonstrate that certain adversarial example generation methods are specific to the attacked classification model, while others tend to be generic. We thus conclude that deep neural networks trained for user identification tasks based on motion sensors are subject to a high percentage of misclassification when given adversarial input.

Comment: Extended version (12 pages, 1 figure) of our paper (9 pages, 1 figure) accepted at ICONIP 2020

Date: 2 Sep 2020

#### Java Cryptography Uses in the Wild

Authors: Mohammadreza Hazhirpasand, Mohammad Ghafari, Oscar Nierstrasz

Abstract: [Background] Previous research has shown that developers commonly misuse cryptography APIs. [Aim] We have conducted an exploratory study to find out how crypto APIs are used in open-source Java projects, what types of misuses exist, and why developers make such mistakes. [Method] We used a static analysis tool to analyze hundreds of open-source Java projects that rely on Java Cryptography Architecture, and manually inspected half of the analysis results to assess the tool results. We also contacted the maintainers of these projects by creating an issue on the GitHub repository of each project, and discussed the misuses with developers. [Results] We learned that 85% of Cryptography APIs are misused, however, not every misuse has severe consequences. Developer feedback showed that security caveats in the documentation of crypto APIs are rare, developers may overlook misuses that originate in third-party code, and the context where a Crypto API is used should be taken into account. [Conclusion] We conclude that using Crypto APIs is still problematic for developers but blindly blaming them for such misuses may lead to erroneous conclusions.

Comment: The ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2020

Date: 2 Sep 2020

#### Privacy-Preserving Distributed Processing: Metrics, Bounds, and Algorithms

Authors: Qiongxiu Li, Jaron Skovsted Gundersen, Richard Heusdens, Mads Græsbøll Christensen

Abstract: Privacy-preserving distributed processing has recently attracted considerable attention. It aims to design solutions for conducting signal processing tasks over networks in a decentralized fashion without violating privacy. Many algorithms can be adopted to solve this problem such as differential privacy, secure multiparty computation, and the recently proposed distributed optimization based subspace perturbation. However, how these algorithms relate to each other is not fully explored yet. In this paper, we therefore first propose information-theoretic metrics based on mutual information. Using the proposed metrics, we are able to compare and relate a number of existing well-known algorithms. We then derive a lower bound on individual privacy that gives insights on the nature of the problem. To validate the above claims, we investigate a concrete example and compare a number of state-of-the-art approaches in terms of different aspects such as output utility, individual privacy and algorithm robustness against the number of corrupted parties, using not only theoretical analysis but also numerical validation. Finally, we discuss and provide principles for designing appropriate algorithms for different applications.

Comment: 12 pages, 3 figures

Date: 2 Sep 2020

#### Privacy Leakage of SIFT Features via Deep Generative Model based Image Reconstruction

Authors: Haiwei Wu, Jiantao Zhou

Abstract: Many practical applications, e.g., content based image retrieval and object recognition, heavily rely on the local features extracted from the query image. As these local features are usually exposed to untrustworthy parties, the privacy leakage problem of image local features has received increasing attention in recent years. In this work, we thoroughly evaluate the privacy leakage of Scale Invariant Feature Transform (SIFT), which is one of the most widely-used image local features. We first consider the case that the adversary can fully access the SIFT features, i.e., both the SIFT descriptors and the coordinates are available. We propose a novel end-to-end, coarse-to-fine deep generative model for reconstructing the latent image from its SIFT features. The designed deep generative model consists of two networks, where the first one attempts to learn the structural information of the latent image by transforming from SIFT features to Local Binary Pattern (LBP) features, while the second one aims to reconstruct the pixel values guided by the learned LBP. Compared with the state-of-the-art algorithms, the proposed deep generative model produces much improved reconstructed results over three public datasets. Furthermore, we address more challenging cases that only partial SIFT features (either SIFT descriptors or coordinates) are accessible to the adversary. It is shown that, if the adversary can only have access to the SIFT descriptors while not their coordinates, then the modest success of reconstructing the latent image can be achieved for highly-structured images (e.g., faces) and would fail in general settings. In addition, the latent image can be reconstructed with reasonably good quality solely from the SIFT coordinates. Our results would suggest that the privacy leakage problem can be largely avoided if the SIFT coordinates can be well protected.

Date: 2 Sep 2020

#### zkay v0.2: Practical Data Privacy for Smart Contracts

Authors: Nick Baumann, Samuel Steffen, Benjamin Bichsel, Petar Tsankov, Martin Vechev

Abstract: Recent work introduces zkay, a system for specifying and enforcing data privacy in smart contracts. While the original prototype implementation of zkay (v0.1) demonstrates the feasibility of the approach, its proof-of-concept implementation suffers from severe limitations such as insecure encryption and lack of important language features. In this report, we present zkay v0.2, which addresses its predecessor's limitations. The new implementation significantly improves security, usability, modularity, and performance of the system. In particular, zkay v0.2 supports state-of-the-art asymmetric and hybrid encryption, introduces many new language features (such as function calls, private control flow, and extended type support), allows for different zk-SNARKs backends, and reduces both compilation time and on-chain costs.

Date: 2 Sep 2020