18 Nov, 10:05 AM - 10:50 AM
Attended-over Distributed Specificity for Information Extraction in Cybersecurity
Nicholas Prayogo - Machine Learning Developer & Dr. Ehsan Amjadian - Director of AI & Technology
Abstract:
Cybersecurity has become a critical necessity in aerospace, especially given the recent widespread use of cyberspace and of computing infrastructure, tools, and platforms in the domain. Cybersecurity relies on the timely discovery of potential and active vulnerabilities to mitigate threats, cyberattacks, and theft of intellectual property. These vulnerabilities are commonly communicated through textual channels, which can be mined with information extraction techniques such as Automatic Terminology Extraction (ATE) to detect them and mitigate imminent and future attacks. Distributed Specificity, a modern and highly effective ATE paradigm, was first introduced by ,  and further improved by . These methods, however, treat the context words linearly and indiscriminately, which 1) disregards the non-linear relations among the context words and 2) assigns equal weight to every word edge in the sequence, even though the connections between nodes are not equally strong. As our first contribution, we address this shortcoming by integrating the self-attention mechanism into the distributed specificity paradigm to enhance the representation of candidate terms. This yields more accurate extraction, which in turn enables higher coverage and more timely discovery of cyberattacks. Since, to the best of our knowledge, no ATE dataset exists in the cybersecurity domain, our second contribution is a dataset intended to serve as the benchmark for our experiments and for future work by both the ATE and cybersecurity research communities. It complements SemEval's SecureNLP  information extraction repository. We extend MalwareTextDB (147 cybersecurity reports) with labeled terminologies as ground truth. The labels were annotated by cybersecurity practitioners using Term Evaluator .
Although in most industries higher coverage and more timely discovery of cyberthreats translate into protecting vital data, customer privacy, and sizable funds, to name but a few advantages, in aerospace the risk, and therefore the reward, may be much higher: it can directly mean saving human lives on both small and large scales.
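The abstract's central contrast, equal weights for every context word versus attention-derived weights, can be illustrated with a minimal sketch. This is not the authors' Attended-over Distributed Specificity model. It assumes hypothetical 2-D toy embeddings, a single attention head, and identity query/key/value projections, and it only shows how scaled dot-product self-attention assigns unequal weights to the words around a candidate term, where a linear, indiscriminate treatment would weight them all the same.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_weights(n: int) -> List[float]:
    """Baseline: every word in the window gets the same weight 1/n."""
    return [1.0 / n] * n


def self_attention(seq: List[Vector]) -> Tuple[List[List[float]], List[Vector]]:
    """Single-head scaled dot-product self-attention with identity projections.

    Queries, keys and values are the input vectors themselves (an assumption
    made for brevity; trained models learn separate W_Q, W_K, W_V matrices).
    Returns the attention weight matrix and the re-weighted output vectors.
    """
    d = len(seq[0])
    # Row i holds the weights that token i assigns to every token in the sequence.
    weights = [softmax([dot(q, k) / math.sqrt(d) for k in seq]) for q in seq]
    # Each output is the attention-weighted sum of the (value) vectors.
    outputs = [
        [sum(w * v[i] for w, v in zip(row, seq)) for i in range(d)]
        for row in weights
    ]
    return weights, outputs


# Hypothetical toy embeddings: a candidate term followed by three context words.
tokens = ["buffer_overflow", "exploit", "the", "remote"]
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.8, 0.2],
]

attn, reps = self_attention(embeddings)
baseline = uniform_weights(len(tokens))

# Compare the candidate term's attention row with the uniform baseline.
for tok, u, a in zip(tokens, baseline, attn[0]):
    print(f"{tok:>16}  uniform={u:.3f}  attended={a:.3f}")
```

With these toy vectors, the uniform baseline gives every word 0.25, while the candidate term attends least to the function word "the" and more to the related words. In a trained model the projections are learned, so the weights reflect patterns in the data rather than raw vector similarity.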
What You’ll Learn:
The purpose of this talk is to introduce a recent method for cyber threat discovery called Attended-over Distributed Specificity. We use an information extraction technique called Automatic Term Extraction (ATE), together with a series of attention-based architectures, to detect cybersecurity terms, which in turn help identify vulnerabilities as they appear in the public domain.
Nicholas is a Machine Learning Developer at Royal Bank of Canada. He received his B.Eng. in mechanical engineering from Ryerson University. His research spans Markov models, reinforcement learning, time series forecasting, and natural language processing.
(Thursday) 10:05 AM - 10:50 AM