# PapersCutA shortcut to recent security papers

### Arxiv

#### A System for Automated Open-Source Threat Intelligence Gathering and Management

Authors: Peng Gao, Xiaoyuan Liu, Edward Choi, Bhavna Soman, Chinmaya Mishra, Kate Farris, Dawn Song

Abstract: Sophisticated cyber attacks have plagued many high-profile businesses. To remain aware of the fast-evolving threat landscape, open-source Cyber Threat Intelligence (OSCTI) has received growing attention from the community. Commonly, knowledge about threats is presented in a vast number of OSCTI reports. Despite the pressing need for high-quality OSCTI, existing OSCTI gathering and management platforms, however, have primarily focused on isolated, low-level Indicators of Compromise. On the other hand, higher-level concepts (e.g., adversary tactics, techniques, and procedures) and their relationships have been overlooked, which contain essential knowledge about threat behaviors that is critical to uncovering the complete threat scenario. To bridge the gap, we propose SecurityKG, a system for automated OSCTI gathering and management. SecurityKG collects OSCTI reports from various sources, uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors, and constructs a security knowledge graph. SecurityKG also provides a UI that supports various types of interactivity to facilitate knowledge graph exploration.

Date: 19 Jan 2021

#### Information Theoretic Secure Aggregation with User Dropouts

Authors: Yizhou Zhao, Hua Sun

Abstract: In the robust secure aggregation problem, a server wishes to learn and only learn the sum of the inputs of a number of users while some users may drop out (i.e., may not respond). The identity of the dropped users is not known a priori and the server needs to securely recover the sum of the remaining surviving users. We consider the following minimal two-round model of secure aggregation. Over the first round, any set of no fewer than $U$ users out of $K$ users respond to the server and the server wants to learn the sum of the inputs of all responding users. The remaining users are viewed as dropped. Over the second round, any set of no fewer than $U$ users of the surviving users respond (i.e., dropouts are still possible over the second round) and from the information obtained from the surviving users over the two rounds, the server can decode the desired sum. The security constraint is that even if the server colludes with any $T$ users and the messages from the dropped users are received by the server (e.g., delayed packets), the server is not able to infer any additional information beyond the sum in the information theoretic sense. For this information theoretic secure aggregation problem, we characterize the optimal communication cost. When $U \leq T$, secure aggregation is not feasible, and when $U > T$, to securely compute one symbol of the sum, the minimum number of symbols sent from each user to the server is $1$ over the first round, and $1/(U-T)$ over the second round.

Date: 19 Jan 2021

#### Panel: Humans and Technology for Inclusive Privacy and Security

Authors: Sanchari Das, Robert S. Gutzwiller, Rod D. Roscoe, Prashanth Rajivan, Yang Wang, L. Jean Camp, Roberto Hoyle

Abstract: Computer security and user privacy are critical issues and concerns in the digital era due to both increasing users and threats to their data. Separate issues arise between generic cybersecurity guidance (i.e., protect all user data from malicious threats) and the individualistic approach of privacy (i.e., specific to users and dependent on user needs and risk perceptions). Research has shown that several security- and privacy-focused vulnerabilities are technological (e.g., software bugs (Streiff, Kenny, Das, Leeth, & Camp, 2018), insecure authentication (Das, Wang, Tingle, & Camp, 2019)), or behavioral (e.g., sharing passwords (Das, Dingman, & Camp, 2018); and compliance (Das, Dev, & Srinivasan, 2018) (Dev, Das, Rashidi, & Camp, 2019)). This panel proposal addresses a third category of sociotechnical vulnerabilities that can and sometimes do arise from non-inclusive design of security and privacy. In this panel, we will address users' needs and desires for privacy. The panel will engage in in-depth discussions about value-sensitive design while focusing on potentially vulnerable populations, such as older adults, teens, persons with disabilities, and others who are not typically emphasized in general security and privacy concerns. Human factors have a stake in and ability to facilitate improvements in these areas.

Date: 18 Jan 2021

#### Fast Privacy-Preserving Text Classification based on Secure Multiparty Computation

Authors: Amanda Resende, Davis Railsback, Rafael Dowsley, Anderson C. A. Nascimento, Diego F. Aranha

Abstract: We propose a privacy-preserving Naive Bayes classifier and apply it to the problem of private text classification. In this setting, a party (Alice) holds a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice will only learn the result of the classifier applied to her text input and Bob learns nothing. Our solution is based on Secure Multiparty Computation (SMC). Our Rust implementation provides a fast and secure solution for the classification of unstructured text. Applying our solution to the case of spam detection (the solution is generic, and can be used in any other scenario in which the Naive Bayes classifier can be employed), we can classify an SMS as spam or ham in less than 340ms in the case where the dictionary size of Bob's model includes all words (n = 5200) and Alice's SMS has at most m = 160 unigrams. In the case with n = 369 and m = 8 (the average of a spam SMS in the database), our solution takes only 21ms.

Date: 18 Jan 2021

#### MIMOSA: Reducing Malware Analysis Overhead with Coverings

Authors: Mohsen Ahmadi, Kevin Leach, Ryan Dougherty, Stephanie Forrest, Westley Weimer

Abstract: There is a growing body of malware samples that evade automated analysis and detection tools. Malware may measure fingerprints ("artifacts") of the underlying analysis tool or environment and change their behavior when artifacts are detected. While analysis tools can mitigate artifacts to reduce exposure, such concealment is expensive. However, not every sample checks for every type of artifact-analysis efficiency can be improved by mitigating only those artifacts most likely to be used by a sample. Using that insight, we propose MIMOSA, a system that identifies a small set of "covering" tool configurations that collectively defeat most malware samples with increased efficiency. MIMOSA identifies a set of tool configurations that maximize analysis throughput and detection accuracy while minimizing manual effort, enabling scalable automation to analyze stealthy malware. We evaluate our approach against a benchmark of 1535 labeled stealthy malware samples. Our approach increases analysis throughput over state of the art on over 95% of these samples. We also investigate cost-benefit tradeoffs between the fraction of successfully-analyzed samples and computing resources required. MIMOSA provides a practical, tunable method for efficiently deploying analysis resources.

Date: 18 Jan 2021

#### Data Protection Impact Assessment for the Corona App

Authors: Kirsten Bock, Christian R. Kühne, Rainer Mühlhoff, Měto R. Ost, Jörg Pohle, Rainer Rehak

Abstract: Since SARS-CoV-2 started spreading in Europe in early 2020, there has been a strong call for technical solutions to combat or contain the pandemic, with contact tracing apps at the heart of the debates. The EU's General Daten Protection Regulation (GDPR) requires controllers to carry out a data protection impact assessment (DPIA) where their data processing is likely to result in a high risk to the rights and freedoms (Art. 35 GDPR). A DPIA is a structured risk analysis that identifies and evaluates possible consequences of data processing relevant to fundamental rights and describes the measures envisaged to address these risks or expresses the inability to do so. Based on the Standard Data Protection Model (SDM), we present a scientific DPIA which thoroughly examines three published contact tracing app designs that are considered to be the most "privacy-friendly": PEPP-PT, DP-3T and a concept summarized by Chaos Computer Club member Linus Neumann, all of which process personal health data. The DPIA starts with an analysis of the processing context and some expected use cases. Then, the processing activities are described by defining a realistic processing purpose. This is followed by the legal assessment and threshold analysis. Finally, we analyse the weak points, the risks and determine appropriate protective measures. We show that even decentralized implementations involve numerous serious weaknesses and risks. Legally, consent is unfit as legal ground hence data must be processed based on a law. We also found that measures to realize the rights of data subjects and affected people are not sufficient. Last but not least, we show that anonymization must be understood as a continuous process, which aims at separating the personal reference and is based on a mix of legal, organizational and technical measures. All currently available proposals lack such an explicit separation process.

Comment: 97 pages, German version here: https://www.fiff.de/dsfa-corona

Date: 18 Jan 2021

#### Leveraging AI to optimize website structure discovery during Penetration Testing

Authors: Diego Antonelli, Roberta Cascella, Gaetano Perrone, Simon Pietro Romano, Antonio Schiano

Abstract: Dirbusting is a technique used to brute force directories and file names on web servers while monitoring HTTP responses, in order to enumerate server contents. Such a technique uses lists of common words to discover the hidden structure of the target website. Dirbusting typically relies on response codes as discovery conditions to find new pages. It is widely used in web application penetration testing, an activity that allows companies to detect websites vulnerabilities. Dirbusting techniques are both time and resource consuming and innovative approaches have never been explored in this field. We hence propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence. More specifically, we use semantic clustering techniques in order to organize wordlist items in different groups according to their semantic meaning. The created clusters are used in an ad-hoc implemented next-word intelligent strategy. This paper demonstrates that the usage of clustering techniques outperforms the commonly used brute force methods. Performance is evaluated by testing eight different web applications. Results show a performance increase that is up to 50% for each of the conducted experiments.

Date: 18 Jan 2021

#### Applying High-Performance Bioinformatics Tools for Outlier Detection in Log Data

Authors: Markus Wurzenberger, Florian Skopik, Roman Fiedler, Wolfgang Kastner

Abstract: Most of today's security solutions, such as security information and event management (SIEM) and signature based IDS, require the operator to evaluate potential attack vectors and update detection signatures and rules in a timely manner. However, today's sophisticated and tailored advanced persistent threats (APT), malware, ransomware and rootkits, can be so complex and diverse, and often use zero day exploits, that a pure signature-based blacklisting approach would not be sufficient to detect them. Therefore, we could observe a major paradigm shift towards anomaly-based detection mechanisms, which try to establish a system behavior baseline -- either based on netflow data or system logging data -- and report any deviations from this baseline. While these approaches look promising, they usually suffer from scalability issues. As the amount of log data generated during IT operations is exponentially growing, high-performance analysis methods are required that can handle this huge amount of data in real-time. In this paper, we demonstrate how high-performance bioinformatics tools can be applied to tackle this issue. We investigate their application to log data for outlier detection to timely reveal anomalous system behavior that points to cyber attacks. Finally, we assess the detection capability and run-time performance of the proposed approach.

Date: 18 Jan 2021

#### SoK: Fully Homomorphic Encryption Compilers

Authors: Alexander Viand, Patrick Jattke, Anwar Hithnawi

Abstract: Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. Hence, it provides resilience in situations where computations are carried out by an untrusted or potentially compromised party. This powerful concept was first conceived by Rivest et al. in the 1970s. However, it remained unrealized until Craig Gentry presented the first feasible FHE scheme in 2009. The advent of the massive collection of sensitive data in cloud services, coupled with a plague of data breaches, moved highly regulated businesses to increasingly demand confidential and secure computing solutions. This demand, in turn, has led to a recent surge in the development of FHE tools. To understand the landscape of recent FHE tool developments, we conduct an extensive survey and experimental evaluation to explore the current state of the art and identify areas for future development. In this paper, we survey, evaluate, and systematize FHE tools and compilers. We perform experiments to evaluate these tools' performance and usability aspects on a variety of applications. We conclude with recommendations for developers intending to develop FHE-based applications and a discussion on future directions for FHE tools development.

Comment: 13 pages, to appear in IEEE Symposium on Security and Privacy 2021

Date: 18 Jan 2021

#### DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection

Authors: Yuanchun Li, Jiayi Hua, Haoyu Wang, Chunyang Chen, Yunxin Liu

Abstract: Deep learning models are increasingly used in mobile applications as critical components. Unlike the program bytecode whose vulnerabilities and threats have been widely-discussed, whether and how the deep learning models deployed in the applications can be compromised are not well-understood since neural networks are usually viewed as a black box. In this paper, we introduce a highly practical backdoor attack achieved with a set of reverse-engineering techniques over compiled deep learning models. The core of the attack is a neural conditional branch constructed with a trigger detector and several operators and injected into the victim model as a malicious payload. The attack is effective as the conditional logic can be flexibly customized by the attacker, and scalable as it does not require any prior knowledge from the original model. We evaluated the attack effectiveness using 5 state-of-the-art deep learning models and real-world samples collected from 30 users. The results demonstrated that the injected backdoor can be triggered with a success rate of 93.5%, while only brought less than 2ms latency overhead and no more than 1.4% accuracy decrease. We further conducted an empirical study on real-world mobile deep learning apps collected from Google Play. We found 54 apps that were vulnerable to our attack, including popular and security-critical ones. The results call for the awareness of deep learning application developers and auditors to enhance the protection of deployed models.

Comment: ICSE 2021

Date: 18 Jan 2021