At L1NNA, we believe that AI techniques will drive the next generation of security solutions and dramatically transform the cybersecurity landscape. With this in mind, we advance AI to empower humans, and we secure AI for reliability and explainability. Our research portfolio bridges the areas of machine learning, data mining, artificial intelligence, and cybersecurity. We focus mainly on the following four dimensions:
Protection
Securing the future with AI. Protect your organization against intrusion, malware, spam, zero-day vulnerabilities, and other cyber attacks. Adapt your defense to constantly changing attack behaviors.
Investigation
Empowering cyber-crime investigators, malware analysts, and reverse engineers with the latest capabilities of AI to understand the provenance and the inner workings of a specific artifact.
Reliability
Promoting the reliability of AI against perturbation, poisoning and adversarial samples by studying the evasive and defensive techniques. Explore the possibility of AI specification and verification.
Explainability
Explaining the unknown. Visualizing the AI's decision-making process through post-hoc explanation and built-in transparency. Facilitate the interaction between humans and AI systems in the future.
The L1NNA research laboratory is located within the School of Computing at Queen's University in Kingston, Ontario, Canada. Queen’s University is a community with 175 years of tradition, academic excellence, research, and a beautiful waterfront campus of limestone buildings and modern facilities.
At L1NNA, we design novel data mining and machine learning models and develop them into practical software applications that address critical issues in cybersecurity. Our research has attracted industrial interest, and we have an ongoing close collaboration with DRDC. We focus on both the research and the development aspects of these systems.
Research
Understanding a problem's practical challenges and connecting them with the theoretical research gaps.
Development
Putting our research outcomes into practice. Developing scalable, ready-to-use tools and systems for various problems.
Training
Connecting industrial needs to our research and teaching programs. Preparing HQP for the evolving job market.
News
- 2022-2023, DRDC IDEaS 1M (33%)
- 2021-2026, DRDC Contract 1.5M (50%)
- 2021-2024, NSERC Alliance 1.24M (50%)
- 2021-Aug, DRDC Research Contract 75K
- 2020-Dec, DRDC IDEaS Grant 200K
- 2020-Aug, DRDC Research Contract 50K
- 2020-Apr, NSERC Discovery (2025) 157K
- 2019-Nov, FAS Infrastructure support ($200K)
- 2019-Sep, Nvidia GPU donation ($6K in-kind)
- 2019-Jul, Research contract ($60K) from DRDC
- 2019-Mar, Laboratory established
- 2019-Mar, RIG grant ($60K) from SOC
Current Team Members
2019-2020 | Congwei Chen | Undergraduate @ Queen's | TDSC under review | MSc @ UoT | |
2020-2021 | Rylen Sampson | Undergraduate @ Queen's | NSERC Research Award | MSc @ SFU | |
2019-2021 | Scarlett Taviss | Queen's University - Master's Student | Asm2Seq - Explainable Assembly Code Functional Summary Generation | Publication under review | Security Consultant - IBM |
2019-2021 | Haohan Bo | McGill University - Master's Student | Differentially Private Text Generation for Authorship Anonymization | NAACL 2021 | Software Engineer - Microsoft |
2017-2018 | Sarah Alkhodair | Concordia University - Doctoral Student | Assessing Trust and Veracity of Big Data in Social Media | Publication: IPM18, ICML19 | Assistant Professor, King Saud University, Saudi Arabia |
2017-2019 | Arnaud Massenet | McGill University - Undergraduate Student | Stylometric Analysis for Authorship Analysis | Software: StyloTensor - Automated Benchmark of Machine Learning Models for Authorship Analysis | Graduate student at Oxford University |
Sum Ergo Computo; I am therefore I compute.
Recruitment: PhD/MSc Students in Machine Learning and Cybersecurity
We are looking for self-motivated graduate students with a passion for machine learning or cybersecurity. The ideal candidate will possess good writing skills, proficiency in programming, and familiarity with Java/C#/Python/TypeScript. Knowledge of TensorFlow/Spark/NoSQL/React+Django/Docker/Kubernetes will be a big plus!
All research-based graduate students will be fully funded.
Minimum English requirement [for international students]: TOEFL: 88 or IELTS: 7
Start date: flexible
Interested? Shoot your CV over here!
Current Projects
Assembly Code Data Mining
Assembly clone search greatly reduces the manual effort of reverse engineering, since it can identify cloned parts that have been previously analyzed. By closely collaborating with reverse engineers, I studied the challenges and then designed and implemented an award-winning clone search engine called Kam1n0. It also includes specialized techniques that can mitigate the variance introduced by different processor families, different compilers, optimization techniques, and binary protection techniques. Kam1n0 has been presented at the Smart Cybersecurity Network Canada (SERENE-RISC), SOPHOS, ESET, Above Security, and Google.
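To make the idea of clone search concrete, here is a minimal sketch. It is not Kam1n0's actual algorithm: it assumes a simple operand normalization, instruction n-grams, and Jaccard similarity, and every function name and the register list are hypothetical choices made for illustration.

```python
import re

# Assumption: a small x86 register set is enough for this illustration.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize(instruction):
    """Replace concrete registers and constants with generic tokens."""
    mnemonic, _, operands = instruction.strip().lower().partition(" ")
    tokens = []
    for operand in filter(None, (o.strip() for o in operands.split(","))):
        if operand in REGISTERS:
            tokens.append("REG")
        elif re.fullmatch(r"0x[0-9a-f]+|\d+", operand):
            tokens.append("CONST")
        else:
            tokens.append("MEM")
    return " ".join([mnemonic] + tokens)


def shingles(function_body, n=2):
    """Return the set of n-grams over the normalized instruction sequence."""
    normalized = [normalize(line) for line in function_body]
    return {tuple(normalized[i:i + n]) for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search(query, repository, n=2):
    """Rank every function in the repository by similarity to the query."""
    q = shingles(query, n)
    scores = [(name, jaccard(q, shingles(body, n))) for name, body in repository.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because registers and constants are replaced by generic tokens, a function recompiled with a different register allocation still scores as a close match. A production engine would need a far more robust representation and an index instead of this linear scan.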
Malware Phenotype Decomposition
Malware phenotypes denote observable characteristics of individual malware, such as specific observable patterns in code, strings, headers, and APIs. Given a piece of unknown malware, the proposed system can quickly decompose it into known components or malware families based on the observable characteristics in code, strings, headers, APIs, and import modules. These characteristics give malware reverse engineers natural and intuitive information for distinguishing one malware family from another. The proposed research in this direction aims to develop a scalable, real-world malware phenotype retrieval system for security analysts. This problem is challenging given the time constraints as well as the number of malware samples collected per day. In large organizations or security companies, this number can exceed 300,000 unique samples per day. An accurate matching system, a scalable and efficient index, and optimized storage requirements are key to a practical and successful system for this problem.
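As a rough illustration of decomposing a sample into known families, the sketch below assumes each sample is represented as a set of observable features (strings, imported APIs) and uses a plain inverted index from features to families. The class name, the feature labels, and the scoring rule are all hypothetical; the proposed system's matching and indexing are not described at this level of detail.

```python
from collections import defaultdict


class PhenotypeIndex:
    """Toy inverted index: observable feature -> families that exhibit it."""

    def __init__(self):
        self._families_by_feature = defaultdict(set)

    def add(self, family, features):
        """Register the observable features of a known family."""
        for feature in features:
            self._families_by_feature[feature].add(family)

    def decompose(self, features):
        """Return ranked (family, share) pairs and the set of unseen features.

        share is the fraction of the sample's features found in that family.
        """
        features = set(features)
        hits = defaultdict(int)
        unknown = set()
        for feature in features:
            families = self._families_by_feature.get(feature)
            if not families:
                unknown.add(feature)
                continue
            for family in families:
                hits[family] += 1
        total = len(features) or 1
        ranked = sorted(
            ((family, count / total) for family, count in hits.items()),
            key=lambda pair: (-pair[1], pair[0]),
        )
        return ranked, unknown
```

Lookup cost depends on the number of features in the query sample rather than on the number of indexed samples, which is one reason index design matters at the volumes mentioned above.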
Software Ecosystem Genetic Analysis
In the context of cybersecurity, the key to understanding the massive number of binary executables communicated over the network and the processes running on every single IoT device is to understand the actual machine instructions being executed. All the machine instructions in software are generated from human-written code. Resembling a biological evolutionary process, software is not created from scratch but evolves over time. Studies have shown that 50% of all the files across open source projects have been reused at least twice, and more than 50% of developers modify the components before reusing them. The software ecosystem is highly complex, with interconnected information including source code, binary executables, documentation, comments, descriptions, vulnerabilities, and specifications. All these interconnected entities form a heterogeneous information network. The ultimate task is to learn a robust latent vector representation for every defined entity in this network. The learned vector captures the semantic as well as the syntactic relationships between entities of different granularity and their neighbors. The learned representations enable us to match similar entities and observe the hierarchical relationships among entities at different granularities. They directly contribute to the source code and binary code clone search problems.
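The heterogeneous information network described above can be pictured with a small sketch. It assumes an undirected graph with typed nodes and enumerates paths that follow a given sequence of node types (often called a metapath); such paths are one common input for embedding models, though the text does not say which learning method is used. All class, method, and node names here are hypothetical.

```python
from collections import defaultdict


class HeteroGraph:
    """Undirected graph whose nodes carry a type such as 'source' or 'binary'."""

    def __init__(self):
        self.node_type = {}
        self.neighbors = defaultdict(set)

    def add_node(self, node, kind):
        self.node_type[node] = kind

    def add_edge(self, a, b):
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def metapath_walks(self, metapath):
        """Enumerate every path whose node types follow `metapath` in order."""
        walks = [
            [node]
            for node, kind in sorted(self.node_type.items())
            if kind == metapath[0]
        ]
        for kind in metapath[1:]:
            walks = [
                walk + [nxt]
                for walk in walks
                for nxt in sorted(self.neighbors[walk[-1]])
                if self.node_type.get(nxt) == kind
            ]
        return walks
```

Each returned walk links entities of different granularity, for example a source file, the binary built from it, and a vulnerability reported against that binary.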
Authorship Analysis
The internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs) can be easily repudiated. Given anonymous documents, authorship analysis is the study of identifying the actual author and his/her socio-linguistic characteristics. Many linguistic stylometric features and computational techniques have been extensively studied for this purpose. However, most of them emphasize improving authorship attribution accuracy, and little work has been done on constructing and visualizing the evidential traits. I opt for an interpretable and explainable approach by which writing styles can be visualized, compared, and interpreted by an investigator, like fingerprints. I also propose to integrate differential privacy and reinforcement learning to paraphrase text so that the writing style is sanitized.
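For readers new to stylometry, the sketch below shows one classic feature family, character n-gram profiles compared by cosine similarity. It is only a baseline illustration under that assumption, not the interpretable approach or the learned representations described above, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attribute(anonymous_text, known_writings, n=3):
    """Return the most similar candidate author and the per-author scores."""
    profile = char_ngrams(anonymous_text, n)
    scores = {
        author: cosine(profile, char_ngrams(text, n))
        for author, text in known_writings.items()
    }
    return max(scores, key=scores.get), scores
```

The n-grams shared between the anonymous text and a candidate's writings can be listed directly, which hints at why feature-level evidence is easier for an investigator to inspect than a bare accuracy figure.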
Binary Provenance Analysis
Binary provenance denotes the characteristics of a program that derive from its path from source code to executable form. Binary provenance is important in the domains of binary forensics and performance analysis. It provides an important evidential trail for cybersecurity investigators to track down the hackers behind a security incident. For example, the Lazarus group has been linked to the WannaCry incident by code similarity. I mainly focus on two critical aspects: toolchain recovery and authorship analysis.
Neural Malware Analysis
Malware behavioral indicators denote the potentially high-risk malicious behaviors that malware exhibits, such as unintended network communications, file encryption, keystroke logging, sandbox evasion, and camera manipulation. Generally, they are generated using sandboxes or simulators. However, the complexity of modern malware has increased considerably. Malware is becoming sandbox-aware by incorporating modern evasive techniques. To address these issues, I propose a new neural network-based static scanner that can characterize the malicious behaviors of a given executable without running it in a sandbox. It can be used as an additional binary analytic layer to mitigate the issues of polymorphism, metamorphism, and evasive techniques.
Highlight: KAM1N0
Open-source binary analysis and assembly code management platform.
Publications and Talks
Browse our latest publications spanning a wide range of research problems.
Journal Articles (3 + 2 in progress)
- S. H. H. Ding, B. C. M. Fung, F. Iqbal, and W. K. Cheung. Learning stylometric representations for authorship analysis. IEEE Transactions on Cybernetics (CYB18), 14 pages, 2018, in press. [ JCR impact factor: 8.803, 5-year: 8.775 | Abstract | BibTex | Code | PDF | URL ]
- M. H. Altakrori, F. Iqbal, B. C. M. Fung, S. H. H. Ding, and A. Tubaishat. Arabic authorship attribution: an extensive study on twitter posts. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(1):5.1-5.51, November 2018. [ Abstract | BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and M. Debbabi. A visualizable evidence-driven approach for authorship attribution. ACM Transactions on Information and System Security (TISSEC15), 17(3):30 pages, 2015. [ JCR impact factor: 0.759, 5-year: 1.610 | SJR impact factor: 1.772 | top journal in information security | Abstract | BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and P. Charland. Sym1n0: large-scale symbolic expression retrieval for cross-architecture assembly clone search. IEEE Transactions on Software Engineering (TSE), 14 pages, under revision for re-submission. [ BibTex | PDF | URL ]
- S. A. Alkhodair, S. H. H. Ding, B. C. M. Fung, and J. Liu. Detecting breaking news rumors of emerging topics in social media. Elsevier Information Processing and Management (IPM), 20 pages, under minor revision. [ BibTex | PDF | URL ]
Conference Proceedings (2 + 1 in progress)
- S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: mapreduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD16), 10 pages. San Francisco, CA, August 2016. ACM Press. [ acceptance ratio: 142/784 = 18% | top conference in data mining | Abstract | BibTex | Code | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 40th International Symposium on Security and Privacy (S&P19a), 18 pages. San Francisco, CA, May 2019. IEEE Computer Society. [ up-to-date acceptance ratio: 13/225 = 5.7% (submission opens until Feb.) | top conference in security | Abstract | BibTex | Code | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, S. McIntosh, S. M, and P. Charland. Raven: neural malware behavior characterization without sandbox. In Proceedings of the 40th International Symposium on Security and Privacy (S&P19b), 18 pages. 2019, under review. IEEE Computer Society. [ BibTex | PDF | URL ]
Invited Talks (6)
- S. H. H. Ding. Kam1n0: data mining and machine learning for reverse engineering. Hong Kong University of Science and Technology, Hong Kong, China, Oct. 2018. [ BibTex | PDF | URL ]
- S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Google Montreal, Montreal, Canada, Nov. 2016. [ BibTex | PDF | URL ]
- S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. EBTIC Research Centre established by British Telecommunications (BT), Abu Dhabi, UAE, Sept. 2016. [ BibTex | PDF | URL ]
- S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Above Security Montreal, Montreal, Canada, Jul. 2016. [ BibTex | PDF | URL ]
- S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. ESET Montreal, Montreal, Canada, Jun. 2016. [ BibTex | PDF | URL ]
- S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Sophos Vancouver, Vancouver, Canada, Apr. 2016. [ BibTex | PDF | URL ]
Conference Presentations (3)
- S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. The 40th International Symposium on Security and Privacy (S&P19), Jun. 2019. [ BibTex | PDF | URL ]
- S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. The Smart Cybersecurity Network (SERENE-RISC) Research Showcase, rapid-fire presentation and poster presentation, best poster award, Nov. 2016. [ BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, S. M, and P. Charland. Kam1n0: mapreduce-based assembly clone search for reverse engineering. The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD16), poster presentation, Jun. 2016. [ BibTex | PDF | URL ]
Technical Reports (8)
- S. H. H. Ding, M. Q. Li, and B. C. M. Fung. Assembly code data mining: design and prototyping (part 7). Technical Report W7701-155902/001/QCL (TA07), Defence Research and Development Canada (DRDC), Mar. 2018. [ BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, M. Q. Li, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 6). Technical Report W7701-155902/001/QCL (TA06), Defence Research and Development Canada (DRDC), Jan. 2018. [ BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 5). Technical Report W7701-155902/001/QCL (TA05), Defence Research and Development Canada (DRDC), Feb. 2017. [ BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 4). Technical Report W7701-155902/001/QCL (TA04), Defence Research and Development Canada (DRDC), Jan. 2017. [ BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and K. Zhao. Assembly code data mining: design and prototyping (part 3). Technical Report W7701-155902/001/QCL (TA02), Defence Research and Development Canada (DRDC), Mar. 2016. [ BibTex | PDF | URL ]
- S. H. H. Ding, B. C. M. Fung, and K. Zhao. Assembly code data mining: porting to other platforms. Technical Report W7701-155902/001/QCL (TA03), Defence Research and Development Canada (DRDC), Feb. 2016. [ BibTex | PDF | URL ]
- B. C. M. Fung, S. H. H. Ding, and K. Zhao. Assembly code data mining: design and prototyping (part 2). Technical Report W7701-155902/001/QCL (TA01), Defence Research and Development Canada (DRDC), Jan. 2016. [ BibTex | PDF | URL ]
- B. C. M. Fung, S. H. H. Ding, M. R. Farhadi, and K. Zhao. Assembly code data mining: design and prototyping (part 1). Technical Report W7701-155902/001/QCL (Firm Portion), Defence Research and Development Canada (DRDC), Mar. 2015. [ BibTex | PDF | URL ]
Our Courses
Recent Blogs
Browse our recent technical blogs.
Our Sponsors
Get In Touch
Schedule - Make an appointment HERE
Contact Details
Kingston, Ontario, Canada K7L 2N8