L1NNA Research Laboratory

We research AI for security and security for AI.

At L1NNA, we believe that AI techniques will drive next-generation security solutions and dramatically transform the cybersecurity landscape. With this in mind, we advance AI to empower humans, and we secure AI for reliability and explainability. Our research portfolio bridges machine learning, data mining, artificial intelligence, and cybersecurity. We focus mainly on the following four dimensions:

Protection

Securing the future with AI. Protect your organization against intrusion, malware, spam, zero-day vulnerabilities, and other cyber attacks. Adapt your defense to constantly changing attack behaviors.

Investigation

Empowering cyber-crime investigators, malware analysts, and reverse engineers with the latest AI capabilities to understand the provenance and inner workings of a specific artifact.

Reliability

Promoting the reliability of AI against perturbation, poisoning, and adversarial samples by studying evasive and defensive techniques. Exploring the possibilities of AI specification and verification.

Explainability

Explaining the unknown. Visualizing the AI's decision-making process through post-hoc explanation and built-in transparency. Facilitating interaction between humans and AI systems in the future.

The L1NNA research laboratory is located within the School of Computing at Queen's University in Kingston, Ontario, Canada. Queen's University is a community with 175 years of tradition, academic excellence, and research, set on a beautiful waterfront campus of limestone buildings and modern facilities.

At L1NNA, we design novel data mining and machine learning models and develop them into practical software applications that address critical issues in cybersecurity. Our research has attracted industrial interest, and we have an ongoing close collaboration with DRDC. We focus on both the research and the development aspects of these systems.

Research

Understanding a problem's practical challenges and connecting them with theoretical research gaps.

Development

Putting our research outcomes into practice. Developing scalable, ready-to-use tools and systems for various problems.

Training

Connecting industrial needs to our research and teaching programs. Preparing HQP for the evolving job market.

News

  • 2022-2023, DRDC IDEaS 1M (33%)
  • 2021-2026, DRDC Contract 1.5M (50%)
  • 2021-2024, NSERC Alliance 1.24M (50%)
  • 2021-Aug, DRDC Research Contract 75K
  • 2020-Dec, DRDC IDEaS Grant 200K
  • 2020-Aug, DRDC Research Contract 50K
  • 2020-Apr, NSERC Discovery (2025) 157K
  • 2019-Nov, FAS Infrastructure support ($200K)
  • 2019-Sep, Nvidia GPU donation ($6K in-kind)
  • 2019-Jul, Research contract ($60K) from DRDC
  • 2019-Mar, Laboratory established
  • 2019-Mar, RIG grant ($60K) from SOC

Current Team Members

Steven H. H. Ding

Assistant Professor, Lab Director/Primary Contact

Machine Learning, Data Mining, Cybersecurity

PhD, McGill University

Li Tao Li

Ph.D. student

Machine Learning, Data Mining, Cybersecurity

Jun Hong Peng

Master's Student

Cryptography, Reinforcement Learning

BSc, University of California Santa Barbara

Weihan Ou

Ph.D. student

Machine Learning, Data Mining, Cybersecurity

MSc, University College London

Leo Song

Ph.D. student

Deep Learning, Cybersecurity

Christopher Molloy

Master's Student

Adversarial Learning, Cybersecurity

Michael Wrana

Master's Student

Cybersecurity, Machine Learning

Ziad K. Mansour

Master's Student

Machine Learning, Data Mining, Cybersecurity

BE, Cairo University

Mark M. Adams

Ph.D. Student

Reinforcement Learning, Data Mining, Cybersecurity

MASc, University of British Columbia

Jeremy Banks

Master's Student

Machine Learning, Cybersecurity, Insider Threats

Christina Hu

Master's Student

Machine Learning, Natural Language Processing, Data Mining

Undergraduate, Queen's University

Ryan H. Kerr

Master's Student

Low-Level Systems, Compilers, Cybersecurity

BCmpH, Queen's University

Caleb Z. W. Fu

Undergraduate Student, Lab member.

Machine Learning, Cybersecurity

B.S., Queen's University

Sarah Labrosse

Research Member

Cybersecurity, Artificial Intelligence

Daniel Fichtinger

Research Member

Cybersecurity

  • 2019-2020 | Congwei Chen | Undergraduate @ Queen's | TDSC under review | MSc @ UoT
  • 2020-2021 | Rylen Sampson | Undergraduate @ Queen's | NSERC Research Award | MSc @ SFU
  • 2019-2021 | Scarlett Taviss | Queen's University - Master's Student | Asm2Seq - Explainable Assembly Code Functional Summary Generation | Publication under review | Security Consultant - IBM
  • 2019-2021 | Haohan Bo | McGill University - Master's Student | Differentially Private Text Generation for Authorship Anonymization | NAACL 2021 | Software Engineer - Microsoft
  • 2017-2018 | Sarah Alkhodair | Concordia University - Doctoral Student | Assessing Trust and Veracity of Big Data in Social Media | Publication: IPM18, ICML19 | Assistant Professor, King Saud University, Saudi Arabia
  • 2017-2019 | Arnaud Massenet | McGill University - Undergraduate Student | Stylometric Analysis for Authorship Analysis | Software: StyloTensor - Automated Benchmark of Machine Learning Models for Authorship Analysis | Graduate student at Oxford University

Sum Ergo Computo; I am therefore I compute.

Recruitment: PhD/MSc Students in Machine Learning and Cybersecurity

We are looking for self-motivated graduate students with a passion for machine learning or cybersecurity. The ideal candidate will possess good writing skills, be proficient in programming, and be familiar with Java/C#/Python/TypeScript. Knowledge of TensorFlow/Spark/NoSQL/React+Django/Docker/Kubernetes will be a big plus!

All research-based graduate students will be fully funded.

Minimum English requirement [for international students]: TOEFL: 88 or IELTS: 7

Start date: flexible

Interested? Shoot your CV over here!

Current Projects

Assembly Code Data Mining

Assembly clone search greatly reduces the manual effort of reverse engineering since it can identify the cloned parts that have been previously analyzed. By closely collaborating with reverse engineers, I studied the challenges, designed and implemented an award-winning clone search engine called Kam1n0. It also includes specialized techniques that can mitigate the variance introduced by different processor families, different compilers, optimization techniques, and binary protection techniques. Kam1n0 has been presented at the Smart Cybersecurity Network Canada (SERENE-RISC), SOPHOS, ESET, Above Security, and Google.
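
As a rough illustration of the clone search idea (not Kam1n0's actual algorithm), the sketch below normalizes assembly instructions by replacing registers and constants with placeholders, then ranks indexed functions by Jaccard similarity over pairs of consecutive instructions. The function names, the simplified x86 register pattern, and the sample repository are all hypothetical.

```python
import re

# Assumption: a simplified x86 register pattern; a real tool would rely on
# a disassembler's operand types instead of regular expressions.
REGISTER = re.compile(r"\b(?:[er]?[abcd]x|[er]?[sd]i|[er]?[sb]p|r\d+)\b")
CONSTANT = re.compile(r"\b(?:0x[0-9a-f]+|\d+)\b")


def normalize(instruction: str) -> str:
    """Replace registers and constants with placeholders to absorb
    differences such as register allocation and stack offsets."""
    text = instruction.lower().strip()
    text = REGISTER.sub("REG", text)
    return CONSTANT.sub("CONST", text)


def bigrams(function_body):
    """Return the set of consecutive normalized instruction pairs."""
    normalized = [normalize(line) for line in function_body]
    return set(zip(normalized, normalized[1:]))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def search(query, repository, top_k=3):
    """Rank repository functions by similarity to the query function."""
    query_grams = bigrams(query)
    scored = [(name, jaccard(query_grams, bigrams(body)))
              for name, body in repository.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical repository of previously analyzed functions.
repository = {
    "known_add": ["mov ecx, [ebp+12]", "add ecx, 16", "ret"],
    "known_zero": ["push ebp", "xor eax, eax", "pop ebp", "ret"],
}
query = ["mov eax, [ebp+8]", "add eax, 4", "ret"]
results = search(query, repository)
```

Here the query differs from `known_add` only in register choice and constants, so normalization makes the two identical. A brute-force scan like this would not scale to a large repository, which is why a practical engine needs an index.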

Malware Phenotype Decomposition

Malware phenotypes denote observable characteristics of individual malware, such as specific patterns in code, strings, headers, and API calls. Given a piece of unknown malware, the proposed system quickly decomposes it into known components or malware families based on observable characteristics in code, strings, headers, APIs, import modules, and so on. These characteristics give malware reverse engineers natural and intuitive information for distinguishing one malware family from another. The proposed research in this direction aims to develop a scalable, real-world malware phenotype retrieval system for security analysts. The problem is challenging given the time constraints and the number of malware samples collected per day. In large organizations or security companies, this number can exceed 300,000 unique samples per day. An accurate matching system, a scalable and efficient index, and optimized storage requirements are key to a practical and successful system for this problem.
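
One way to picture such a retrieval system is an inverted index that maps each observable feature (an imported API, a string, a header field) to the known families that exhibit it. The sketch below is a toy built on that assumption; the class name, feature strings, and family labels are made up for illustration and do not describe the proposed system.

```python
from collections import Counter, defaultdict


class PhenotypeIndex:
    """Toy inverted index: feature -> set of known family labels."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, family, features):
        """Register the observable features of a known malware family."""
        for feature in features:
            self._postings[feature].add(family)

    def decompose(self, features):
        """Return (family, shared_feature_count) pairs, best match first.

        Only the postings of the query's own features are visited, so the
        cost depends on the query size rather than on the whole index.
        """
        votes = Counter()
        for feature in set(features):
            for family in self._postings.get(feature, ()):
                votes[family] += 1
        return votes.most_common()


# Hypothetical feature strings and family labels.
index = PhenotypeIndex()
index.add("FamilyA", {"api:CryptEncrypt", "str:.locked"})
index.add("FamilyB", {"api:CryptEncrypt", "api:InternetOpenA"})

matches = index.decompose(["api:CryptEncrypt", "str:.locked", "api:Unknown"])
```

Features never seen before, such as `api:Unknown` here, simply contribute no votes.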

Software Ecosystem Genetic Analysis

In the context of cybersecurity, the key to understanding the massive volume of binary executables communicated over the network and the processes running on every single IoT device is to understand the actual machine instructions being executed. All machine instructions in software are generated from human-written code. Resembling a biological evolutionary process, software is not created from scratch but evolves over time. Studies have shown that 50% of all files across open source projects have been reused at least twice, and more than 50% of developers modify components before reusing them. The software ecosystem is highly complex, with interconnected information including source code, binary executables, documentation, comments, descriptions, vulnerabilities, and specifications. All these interconnected entities form a heterogeneous information network. The ultimate task is to learn a robust latent vector representation for every defined entity in this network. The learned vector captures the semantic as well as the syntactic relationships between entities of different granularities and their neighbors. The learned representations enable us to match similar entities and observe the hierarchical relationships among entities at different granularities. They directly contribute to the source code and binary code clone search problems.

Authorship Analysis

The internet provides an ideal anonymous channel for concealing computer-mediated malicious activities, as the network-based origins of critical electronic textual evidence (e.g., emails, blogs, forum posts, chat logs) can be easily repudiated. Given a set of anonymous documents, authorship analysis is the study of identifying the actual author and their socio-linguistic characteristics. Many linguistic stylometric features and computational techniques have been extensively studied for this purpose. However, most of them emphasize improving authorship attribution accuracy, and little work has been done on constructing and visualizing the evidential traits. I opt for an interpretable and explainable approach by which writing styles can be visualized, compared, and interpreted by an investigator like fingerprints. I also propose to integrate differential privacy and reinforcement learning to paraphrase text so that the writing style is sanitized.
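
To make the notion of a stylometric feature concrete, the sketch below builds character trigram frequency profiles and attributes an anonymous text to the candidate whose profile has the highest cosine similarity. This is a common baseline in stylometry, shown here only as an illustration; it is not the interpretable approach described above, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=3):
    """Relative frequencies of character n-grams, ignoring case and
    collapsing runs of whitespace."""
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(weight * q.get(gram, 0.0) for gram, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(anonymous_text, candidates):
    """Return the candidate author whose writing sample is most similar."""
    profile = char_ngram_profile(anonymous_text)
    return max(
        candidates,
        key=lambda author: cosine(profile, char_ngram_profile(candidates[author])),
    )


# Hypothetical writing samples for two candidate authors.
candidates = {
    "alice": "well, I reckon the server is down again, innit",
    "bob": "Per my previous email, kindly restart the service.",
}
guess = attribute("I reckon the database is down again, innit", candidates)
```

A single similarity score like this gives an investigator nothing to inspect, which is the gap that visualizable, evidence-driven approaches aim to fill.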

Binary Provenance Analysis

Binary provenance denotes the characteristics of a program that derive from its path from source code to executable form. Binary provenance is important in the domains of binary forensics and performance analysis. It provides an important evidential trail for cybersecurity investigators to track down the hackers behind security incidents. For example, the Lazarus group was linked to the WannaCry incident by code similarity. I mainly focus on two critical aspects: toolchain recovery and authorship analysis.

Neural Malware Analysis

Malware behavioral indicators denote potentially high-risk malicious behaviors that malware exhibits, such as unintended network communications, file encryption, keystroke logging, sandbox evasion, and camera manipulation. Generally, they are generated using sandboxes or simulators. However, the complexity of modern malware has increased considerably, and malware is becoming sandbox-aware by incorporating modern evasive techniques. To address these issues, I propose a new neural network-based static scanner that can characterize the malicious behaviors of a given executable without running it in a sandbox. It can be used as an additional binary analytic layer to mitigate the issues of polymorphism, metamorphism, and evasive techniques.

Highlight: KAM1N0

Open-source binary analysis and assembly code management platform.

Publications and Talks

Browse our latest publications spanning a wide range of research problems.

Journal Articles (3 + 2 in progress)

  • S. H. H. Ding, B. C. M. Fung, F. Iqbal, and W. K. Cheung. Learning stylometric representations for authorship analysis. IEEE Transactions on Cybernetics (CYB18), 14 pages, 2018, in press. [ JCR impact factor: 8.803, 5-year: 8.775 | Abstract | BibTex | Code | PDF | URL ]
  • M. H. Altakrori, F. Iqbal, B. C. M. Fung, S. H. H. Ding, and A. Tubaishat. Arabic authorship attribution: an extensive study on Twitter posts. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 18(1):5.1-5.51, November 2018. [ Abstract | BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and M. Debbabi. A visualizable evidence-driven approach for authorship attribution. ACM Transactions on Information and System Security (TISSEC15), 17(3):30 pages, 2015. [ JCR impact factor: 0.759, 5-year: 1.610 | SJR impact factor: 1.772 | top journal in information security | Abstract | BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Sym1n0: large-scale symbolic expression retrieval for cross-architecture assembly clone search. IEEE Transactions on Software Engineering (TSE), 14 pages, under revision for re-submission. [ BibTex | PDF | URL ]
  • S. A. Alkhodair, S. H. H. Ding, B. C. M. Fung, and J. Liu. Detecting breaking news rumors of emerging topics in social media. Elsevier Information Processing and Management (IPM), 20 pages, under minor revision. [ BibTex | PDF | URL ]

Conference Proceedings (2 + 1 in progress)

  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Kam1n0: mapreduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD16), 10 pages. San Francisco, CA, August 2016. ACM Press. [ acceptance ratio: 142/784 = 18% | top conference in data mining | Abstract | BibTex | Code | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proceedings of the 40th International Symposium on Security and Privacy (S&P19a), 18 pages. San Francisco, CA, May 2019. IEEE Computer Society. [ up-to-date acceptance ratio: 13/225 = 5.7% (submission opens until Feb.) | top conference in security | Abstract | BibTex | Code | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, S. McIntosh, S. M, and P. Charland. Raven: neural malware behavior characterization without sandbox. In Proceedings of the 40th International Symposium on Security and Privacy (S&P19b), 18 pages. 2019, under review. IEEE Computer Society. [ BibTex | PDF | URL ]

Invited Talks (6)

  • S. H. H. Ding. Kam1n0: data mining and machine learning for reverse engineering. Hong Kong University of Science and Technology, Hong Kong, China, Oct. 2018. [ BibTex | PDF | URL ]
  • S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Google Montreal, Montreal, Canada, Nov. 2016. [ BibTex | PDF | URL ]
  • S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. EBTIC Research Centre established by British Telecommunications (BT), Abu Dhabi, UAE, Sept. 2016. [ BibTex | PDF | URL ]
  • S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Above Security Montreal, Montreal, Canada, Jul. 2016. [ BibTex | PDF | URL ]
  • S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. ESET Montreal, Montreal, Canada, Jun. 2016. [ BibTex | PDF | URL ]
  • S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. Sophos Vancouver, Vancouver, Canada, Apr. 2016. [ BibTex | PDF | URL ]

Conference Presentations (3)

  • S. H. H. Ding, B. C. M. Fung, and P. Charland. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. The 40th International Symposium on Security and Privacy (S&P19), Jun. 2019. [ BibTex | PDF | URL ]
  • S. H. H. Ding. Kam1n0: mapreduce-based assembly clone search for reverse engineering. The Smart Cybersecurity Network (SERENE-RISC) Research Showcase, rapid-fire presentation and poster presentation, best poster award, Nov. 2016. [ BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, S. M, and P. Charland. Kam1n0: mapreduce-based assembly clone search for reverse engineering. The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD16), poster presentation, Jun. 2016. [ BibTex | PDF | URL ]

Technical Reports (8)

  • S. H. H. Ding, M. Q. Li, and B. C. M. Fung. Assembly code data mining: design and prototyping (part 7). Technical Report W7701-155902/001/QCL (TA07), Defence Research and Development Canada (DRDC), Mar. 2018. [ BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, M. Q. Li, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 6). Technical Report W7701-155902/001/QCL (TA06), Defence Research and Development Canada (DRDC), Jan. 2018. [ BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 5). Technical Report W7701-155902/001/QCL (TA05), Defence Research and Development Canada (DRDC), Feb. 2017. [ BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and M. R. Farhadi. Assembly code data mining: design and prototyping (part 4). Technical Report W7701-155902/001/QCL (TA04), Defence Research and Development Canada (DRDC), Jan. 2017. [ BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and K. Zhao. Assembly code data mining: design and prototyping (part 3). Technical Report W7701-155902/001/QCL (TA02), Defence Research and Development Canada (DRDC), Mar. 2016. [ BibTex | PDF | URL ]
  • S. H. H. Ding, B. C. M. Fung, and K. Zhao. Assembly code data mining: porting to other platforms. Technical Report W7701-155902/001/QCL (TA03), Defence Research and Development Canada (DRDC), Feb. 2016. [ BibTex | PDF | URL ]
  • B. C. M. Fung, S. H. H. Ding, and K. Zhao. Assembly code data mining: design and prototyping (part 2). Technical Report W7701-155902/001/QCL (TA01), Defence Research and Development Canada (DRDC), Jan. 2016. [ BibTex | PDF | URL ]
  • B. C. M. Fung, S. H. H. Ding, M. R. Farhadi, and K. Zhao. Assembly code data mining: design and prototyping (part 1). Technical Report W7701-155902/001/QCL (Firm Portion), Defence Research and Development Canada (DRDC), Mar. 2015. [ BibTex | PDF | URL ]

Our Courses

CISC 372 Advanced Data Analytics

2019-winter

Queen's University

CISC 499 Undergraduate Projects

2020-winter

Queen's University

CISC 873 Data Mining

2020-fall

Queen's University

CISC/CMPE 327 Software Quality Assurance

2020-fall

Queen's University

GLIS 630 Data Mining

2018-winter

McGill University

Recent Blogs

Browse our recent technical blogs.

Deep Learning Server Budget Build

Building a DL server on a budget

R-ggplot Notes

ggplot quick reference notes

Sklearn+Xgboost Cross Validation with Grid Search Tuning

Xgboost with Sklearn with randomized parameter search

IDAAPI Notes

IDAAPI quick reference notes

IEEE Xplore PDF Check - Missing Fonts

Embedding fonts using preflight.

Get In Touch

Schedule - Make an appointment HERE

Contact Details

531 Goodwin Hall, Queen's University
     Kingston, Ontario, Canada K7L 2N8
Fax: available upon request