L1NNA Research Laboratory

We research AI for security and security for AI.


At L1NNA, we believe that AI techniques will drive next-generation security solutions and dramatically transform the cybersecurity landscape. With this in mind, we advance AI to empower humans, and we secure AI for reliability and explainability. Our research portfolio bridges the areas of machine learning, data mining, artificial intelligence, and cybersecurity. We focus mainly on the following four dimensions:

Protection

Securing the future with AI. Protect your organization against intrusion, malware, spam, zero-day vulnerabilities, and other cyber attacks. Adapt your defenses to constantly changing attack behaviors.

Investigation

Empowering cyber-crime investigators, malware analysts, and reverse engineers with the latest AI capabilities to understand the provenance and inner workings of a specific artifact.

Reliability

Promoting the reliability of AI against perturbation, poisoning, and adversarial samples by studying evasive and defensive techniques. Exploring the possibilities of AI specification and verification.

Explainability

Explaining the unknown. Visualizing AI's decision-making process through post-hoc explanation and built-in transparency. Facilitating the interaction between humans and AI systems in the future.

The L1NNA research laboratory is located within the School of Computing at Queen's University in Kingston, Ontario, Canada. Queen's University is a community built on 175 years of tradition, academic excellence, and research, with a beautiful waterfront campus of limestone buildings and modern facilities.

At L1NNA, we design novel data mining and machine learning models and develop them into practical software applications that address critical issues in cybersecurity. Our research has attracted industrial interest, and we have an ongoing close collaboration with DRDC. We focus on both the research and the development aspects of these systems.

Research

Understanding a problem's practical challenges and connecting them with the theoretical research gaps.

Development

Putting our research outcomes into practice. Developing scalable, ready-to-use tools and systems for various problems.

Training

Connecting industrial needs to our research and teaching programs. Preparing HQP for the evolving job market.

News

2022-2023, DRDC IDEaS 1M (33%)
2021-2026, DRDC Contract 1.5M (50%)
2021-2024, NSERC Alliance 1.24M (50%)
2021-Aug, DRDC Research Contract 75K
2020-Dec, DRDC IDEaS Grant 200K
2020-Aug, DRDC Research Contract 50K
2020-Apr, NSERC Discovery (2025) 157K
2019-Nov, FAS Infrastructure support ($200K)
2019-Sep, Nvidia GPU donation ($6K in-kind)
2019-Jul, Research contract ($60K) from DRDC
2019-Mar, Laboratory established
2019-Mar, RIG grant ($60K) from SOC

Current Team Members

Steven H. H. Ding

Assistant Professor, Lab Director/Primary Contact

Machine Learning, Data Mining, Cybersecurity

PhD, McGill University

Li Tao Li

Ph.D. student

Machine Learning, Data Mining, Cybersecurity

Jun Hong Peng

Master's Student

Cryptography, Reinforcement Learning

BSc, University of California Santa Barbara

Weihan Ou

Ph.D. student

Machine Learning, Data Mining, Cybersecurity

MSc, University College London

Leo Song

Ph.D. student

Deep Learning, Cybersecurity

Christopher Molloy

Master's Student

Adversarial Learning, Cybersecurity

Michael Wrana

Master's Student

Cybersecurity, Machine Learning

Ziad K. Mansour

Master's Student

Machine Learning, Data Mining, Cybersecurity

BE, Cairo University

Mark M. Adams

Ph.D. Student

Reinforcement Learning, Data Mining, Cybersecurity

MASc, University of British Columbia

Jeremy Banks

Master's Student

Machine Learning, Cybersecurity, Insider Threats

Christina Hu

Master's Student

Machine Learning, Natural Language Processing, Data Mining

Undergraduate, Queen's University

Ryan H. Kerr

Master's Student

Low-Level Systems, Compilers, Cybersecurity

BCmpH, Queen's University

Caleb Z. W. Fu

Undergraduate Student, Lab member.

Machine Learning, Cybersecurity

B.S., Queen's University

Sarah Labrosse

Research Member

Cybersecurity, Artificial Intelligence

Daniel Fichtinger

Research Member

Cybersecurity

Supervision History

Period	Name	Program	Project	Outcome	Next position
2019-2020	Congwei Chen	Undergraduate @ Queen's		TDSC under review	MSc @ UoT
2020-2021	Rylen Sampson	Undergraduate @ Queen's		NSERC Research Award	MSc @ SFU
2019-2021	Scarlett Taviss	Queen's University - Master's Student	Asm2Seq - Explainable Assembly Code Functional Summary Generation	Publication under review	Security Consultant - IBM
2019-2021	Haohan Bo	McGill University - Master's Student	Differentially Private Text Generation for Authorship Anonymization	NAACL 2021	Software Engineer - Microsoft
2017-2018	Sarah Alkhodair	Concordia University - Doctoral Student	Assessing Trust and Veracity of Big Data in Social Media	Publication: IPM18, ICML19	Assistant Professor, King Saud University, Saudi Arabia
2017-2019	Arnaud Massenet	McGill University - Undergraduate Student	Stylometric Analysis for Authorship Analysis	Software: StyloTensor - Automated Benchmark of Machine Learning Models for Authorship Analysis	Graduate student at Oxford University

Sum Ergo Computo; I am therefore I compute.

Recruitment: PhD/MSc Students in Machine Learning and Cybersecurity

We are looking for self-motivated graduate students with a passion for machine learning or cybersecurity. The ideal candidate will possess good writing skills, be proficient in programming, and be familiar with Java/C#/Python/TypeScript. Knowledge of TensorFlow/Spark/NoSQL/React+Django/Docker/Kubernetes will be a big plus!

All research-based graduate students will be fully funded.

Minimum English requirement [for international students]: TOEFL: 88 or IELTS: 7

Start date: flexible

Interested? Shoot your CV over here!

Current Projects

Assembly Code Data Mining

Assembly clone search greatly reduces the manual effort of reverse engineering since it can identify the cloned parts that have been previously analyzed. By closely collaborating with reverse engineers, I studied the challenges, designed and implemented an award-winning clone search engine called Kam1n0. It also includes specialized techniques that can mitigate the variance introduced by different processor families, different compilers, optimization techniques, and binary protection techniques. Kam1n0 has been presented at the Smart Cybersecurity Network Canada (SERENE-RISC), SOPHOS, ESET, Above Security, and Google.
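To make the idea of clone search concrete, here is a minimal sketch in Python. It is not Kam1n0's actual algorithm: it assumes a toy approach in which registers and constants are replaced by placeholders and functions are compared by the Jaccard similarity of their instruction bigrams. The function names, the register pattern, and the sample repository are all hypothetical.

```python
import re

def normalize(instruction: str) -> str:
    """Replace concrete constants and (a few x86) registers with placeholders."""
    text = instruction.lower().strip()
    # Constants first: hexadecimal or decimal literals become IMM.
    text = re.sub(r"\b0x[0-9a-f]+\b|\b\d+\b", "IMM", text)
    # A deliberately small, illustrative register set becomes REG.
    text = re.sub(r"\b([er]?[abcd]x|[er]?[sd]i|[er]?[sb]p|r\d+)\b", "REG", text)
    return text

def ngrams(body: list[str], n: int = 2) -> set[tuple[str, ...]]:
    """Set of consecutive n-instruction windows over the normalized body."""
    normalized = [normalize(line) for line in body]
    return {tuple(normalized[i:i + n]) for i in range(len(normalized) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def search(query: list[str], repository: dict[str, list[str]],
           threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best first."""
    q = ngrams(query)
    hits = [(name, jaccard(q, ngrams(body))) for name, body in repository.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])

# Hypothetical repository of previously analyzed functions.
repository = {
    "checksum": ["mov eax, 0", "add eax, ebx", "ret"],
    "prologue": ["push ebp", "mov ebp, esp", "pop ebp", "ret"],
}
# Same logic as "checksum", but with different registers and constants.
query = ["mov ecx, 5", "add ecx, edx", "ret"]
results = search(query, repository)
```

Because normalization hides register allocation and constant values, the query matches "checksum" despite sharing no identical instruction text with it. A production engine would need far more robust normalization and an index that avoids comparing the query against every stored function.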

Malware Phenotype Decomposition

Malware phenotypes denote observable characteristics of individual malware, such as specific observable patterns in code, strings, headers, and APIs. Given a piece of unknown malware, the proposed system can quickly decompose it into known components or malware families based on observable characteristics in its code, strings, headers, APIs, and import modules. These characteristics give malware reverse engineers natural and intuitive information for distinguishing one malware family from another. The proposed research in this direction aims to develop a scalable, real-world malware phenotype retrieval system for security analysts. The problem is challenging given the time constraints and the number of malware samples collected per day. In large organizations or security companies, this number can exceed 300,000 unique samples per day. An accurate matching system, a scalable and efficient index, and optimized storage requirements are key to a practical and successful system for this problem.
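As a rough illustration of phenotype retrieval, the following Python sketch uses an inverted index from observable features to the malware families in which they were seen. It is an assumption-laden toy, not the proposed system: the class name, the feature naming scheme (such as "api:connect"), and the vote-counting ranking are all hypothetical.

```python
from collections import Counter, defaultdict

class PhenotypeIndex:
    """Toy inverted index: observable feature -> families exhibiting it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, family: str, features: set[str]) -> None:
        """Record that a known family exhibits each of the given features."""
        for feature in features:
            self._postings[feature].add(family)

    def decompose(self, features: list[str]) -> list[tuple[str, int]]:
        """Rank known families by how many query features they share."""
        votes: Counter[str] = Counter()
        for feature in set(features):
            # .get avoids creating empty postings for unseen features.
            for family in self._postings.get(feature, ()):
                votes[family] += 1
        return votes.most_common()

# Hypothetical families and features, for illustration only.
index = PhenotypeIndex()
index.add("family_a", {"str:hello", "api:CreateFileW"})
index.add("family_b", {"api:CreateFileW", "api:connect"})

unknown_sample = ["api:connect", "api:CreateFileW", "str:unknown"]
ranking = index.decompose(unknown_sample)
```

Lookup cost depends on the number of features in the query and the length of their posting lists rather than on the total number of stored samples, which is the property that makes an index attractive at the daily volumes mentioned above.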

Software Ecosystem Genetic Analysis

In the context of cybersecurity, the key to understanding the massive number of binary executables communicated over the network and the processes running on every single IoT device is to understand the actual machine instructions being executed. All machine instructions in software are generated from human-written code. Resembling a biological evolutionary process, software is not created from scratch but evolves over time. Studies have shown that 50% of all files across open source projects have been reused at least twice, and more than 50% of developers modify components before reusing them. The software ecosystem is highly complex, with interconnected information including source code, binary executables, documentation, comments, descriptions, vulnerabilities, and specifications. All these interconnected entities form a heterogeneous information network. The ultimate task is to learn a robust latent vector representation for every defined entity in this network. The learned vector captures the semantic as well as the syntactic relationships between entities of different granularities and their neighbors. The learned representations enable us to match similar entities and observe the hierarchical relationships among entities at different granularities. They directly contribute to the source code and binary code clone search problems.

Authorship Analysis

The internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, and chat logs) can be easily repudiated. Given anonymous documents, authorship analysis is the study of identifying the actual author and his/her socio-linguistic characteristics. Many linguistic stylometric features and computational techniques have been extensively studied for this purpose. However, most of them emphasize improving authorship attribution accuracy, and little work has been done on constructing and visualizing the evidential traits. I opt for an interpretable and explainable approach by which writing styles can be visualized, compared, and interpreted by an investigator like fingerprints. I also propose to integrate differential privacy and reinforcement learning to paraphrase text so that the writing style is sanitized.
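For readers unfamiliar with stylometric features, the Python sketch below shows one classic baseline: character trigram frequency profiles compared by cosine similarity. It is a generic illustration under stated assumptions, not the interpretable approach described above; the function names and the sample texts are hypothetical.

```python
import math
from collections import Counter

def char_ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequencies of character n-grams after whitespace cleanup."""
    cleaned = " ".join(text.lower().split())
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

def cosine(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two sparse profiles (0.0 if either is empty)."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def attribute(anonymous: str, candidates: dict[str, str]) -> str:
    """Return the candidate whose known writing is closest to the anonymous text."""
    profile = char_ngram_profile(anonymous)
    return max(candidates,
               key=lambda name: cosine(profile, char_ngram_profile(candidates[name])))

# Hypothetical writing samples for two candidate authors.
candidates = {
    "alice": "I reckon the meeting shall commence shortly, shan't it?",
    "bob": "lol ok c u at the meeting, dont b late!!!",
}
guess = attribute("Shan't we commence the meeting shortly, I reckon?", candidates)
```

A single best guess like this offers no evidence an investigator can inspect, which is the gap the visual, fingerprint-like comparison of writing styles is meant to address.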

Binary Provenance Analysis

Binary provenance denotes the characteristics of a program that derive from its path from source code to executable form. Binary provenance is important in the domains of binary forensics and performance analysis. It provides an important evidential trail for cybersecurity investigators tracking down the hackers behind security incidents. For example, the Lazarus group was linked to the WannaCry incident by code similarity. I mainly focus on two critical aspects: toolchain recovery and authorship analysis.

Neural Malware Analysis

Malware behavioral indicators denote potentially high-risk malicious behaviors exhibited by a program, such as unintended network communications, file encryption, keystroke logging, sandbox evasion, and camera manipulation. Generally, they are generated using sandboxes or simulators. However, the complexity of modern malware has increased considerably. Malware is becoming sandbox-aware by incorporating modern evasive techniques. To address these issues, I propose a new neural network-based static scanner that can characterize the malicious behaviors of a given executable without running it in a sandbox. It can be used as an additional binary analytic layer to mitigate the issues of polymorphism, metamorphism, and evasive techniques.

Highlight: KAM1N0

Open-source binary analysis and assembly code management platform.

Publications and Talks

Browse our latest publications spanning a wide range of research problems.

Journal Articles (3 + 2 in progress)

S. H. H. Ding, B. C. M. Fung, F. Iqbal, and W. K. Cheung. Learning stylometric representations for authorship analysis. IEEE Transactions on Cybernetics (CYB18), 14 pages, 2018, in press. [ JCR impact factor: 8.803, 5-year: 8.775 | Abstract | BibTex | Code | PDF | URL ]

M. H. Altakrori, F. Iqbal, B. C. M. Fung, S. H. H. Ding, and A. Tubaishat. Arabic authorship attribution: an extensive study on twitter posts. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(1):5.1-5.51, November 2018. [ Abstract | BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and M. Debbabi. A visualizable evidence-driven approach for authorship attribution. ACM Transactions on Information and System Security (TISSEC15), 17(3):30 pages, 2015. [ JCR impact factor: 0.759, 5-year: 1.610 | SJR impact factor: 1.772 | top journal in information security | Abstract | BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and P. Charland. Sym1n0: large-scale symbolic expression retrieval for cross-architecture assembly clone search. IEEE Transactions on Software Engineering (TSE), 14 pages, under revision for re-submission. [ BibTex | PDF | URL ]

S. A. Alkhodair, S. H. H. Ding, B. C. M. Fung, and J. Liu. Detecting breaking news rumors of emerging topics in social media. Elsevier Information Processing and Management (IPM), 20 pages, under minor revision. [ BibTex | PDF | URL ]

Conference Proceedings (2 + 1 in progress)

S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: mapreduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD16), 10 pages. San Francisco, CA, August 2016. ACM Press. [ acceptance ratio: 142/784 = 18% | top conference in data mining | Abstract | BibTex | Code | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 40th International Symposium on Security and Privacy (S&P19a), 18 pages. San Francisco, CA, May 2019. IEEE Computer Society. [ up-to-date acceptance ratio: 13/225 = 5.7% (submission opens until Feb.) | top conference in security | Abstract | BibTex | Code | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, S. McIntosh, S. M, and P. Charland. Raven: neural malware behavior characterization without sandbox. In Proceedings of the 40th International Symposium on Security and Privacy (S&P19b), 18 pages. 2019, under review. IEEE Computer Society. [ BibTex | PDF | URL ]

Invited Talks (6)

S. H. H. Ding. Kam1n0: data mining and machine learning for reverse engineering. Hong Kong University of Science and Technology, Hong Kong, China, Oct. 2018. [ BibTex | PDF | URL ]

S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Google Montreal, Montreal, Canada, Nov. 2016. [ BibTex | PDF | URL ]

S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. EBTIC Research Centre established by British Telecommunications (BT), Abu Dhabi, UAE, Sept. 2016. [ BibTex | PDF | URL ]

S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Above Security Montreal, Montreal, Canada, Jul. 2016. [ BibTex | PDF | URL ]

S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. ESET Montreal, Montreal, Canada, Jun. 2016. [ BibTex | PDF | URL ]

S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Sophos Vancouver, Vancouver, Canada, Apr. 2016. [ BibTex | PDF | URL ]

Conference Presentations (3)

S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. The 40th International Symposium on Security and Privacy (S&P19), Jun. 2019. [ BibTex | PDF | URL ]

S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. The Smart Cybersecurity Network (SERENE-RISC) Research Showcase, rapid-fire presentation and poster presentation, best poster award, Nov. 2016. [ BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, S. M, and P. Charland. Kam1n0: mapreduce-based assembly clone search for reverse engineering. The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD16), poster presentation, Jun. 2016. [ BibTex | PDF | URL ]

Technical Reports (8)

S. H. H. Ding, M. Q. Li, and B. C. M. Fung. Assembly code data mining: design and prototyping (part 7). Technical Report W7701-155902/001/QCL (TA07), Defence Research and Development Canada (DRDC), Mar. 2018. [ BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, M. Q. Li, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 6). Technical Report W7701-155902/001/QCL (TA06), Defence Research and Development Canada (DRDC), Jan. 2018. [ BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 5). Technical Report W7701-155902/001/QCL (TA05), Defence Research and Development Canada (DRDC), Feb. 2017. [ BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 4). Technical Report W7701-155902/001/QCL (TA04), Defence Research and Development Canada (DRDC), Jan. 2017. [ BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and K. Zhao. Assembly code data mining: design and prototyping (part 3). Technical Report W7701-155902/001/QCL (TA02), Defence Research and Development Canada (DRDC), Mar. 2016. [ BibTex | PDF | URL ]

S. H. H. Ding, B. C. M. Fung, and K. Zhao. Assembly code data mining: porting to other platforms. Technical Report W7701-155902/001/QCL (TA03), Defence Research and Development Canada (DRDC), Feb. 2016. [ BibTex | PDF | URL ]

B. C. M. Fung, S. H. H. Ding, and K. Zhao. Assembly code data mining: design and prototyping (part 2). Technical Report W7701-155902/001/QCL (TA01), Defence Research and Development Canada (DRDC), Jan. 2016. [ BibTex | PDF | URL ]

B. C. M. Fung, S. H. H. Ding, M. R. Farhadi, and K. Zhao. Assembly code data mining: design and prototyping (part 1). Technical Report W7701-155902/001/QCL (Firm Portion), Defence Research and Development Canada (DRDC), Mar. 2015. [ BibTex | PDF | URL ]