Career Profile

Incisive and resourceful Data Scientist with expertise in Machine Learning (ML) and Natural Language Processing (NLP), specializing in Large Language Models. Adept at visual storytelling and communicating complex information to diverse audiences. My professional journey spans both academic and industry settings, with contributions to projects in the private and public sectors.

I currently work as an AI Lead at Kinetic. I previously held positions as a Marie Skłodowska-Curie Fellow at University College London, where I earned my Ph.D., and as a Machine Learning Researcher at Queen Mary University of London. I also hold an M.Sc. in Information Security and an M.Eng. in Software Engineering.

Experience

Senior Data Scientist

Jul 2021 - Feb 2024

  • Spearheaded the development of a Retrieval-Augmented Generation (RAG) application using Large Language Models (LLMs) from inception to production for a major bank. Rigorously investigated the efficacy of several GenAI techniques and tools (e.g., chains, agents, vector databases) to determine the optimal architecture. Designed and implemented an evaluation framework for the application. Followed the latest advancements in GenAI, including relevant academic papers, and provided weekly updates on key developments to stakeholders.
  • Designed and implemented a prototype solution to address common bottlenecks in Document Understanding projects. Led a small team of Data Scientists and Data Engineers. Leveraged Synthetic Data generation methods to augment limited real-world labeled data.
  • Designed and implemented Machine Learning (ML) and Natural Language Processing (NLP) models for the clients of a global Robotic Process Automation (RPA) company (fintech, logistics, filtration).
  • Designed and implemented a methodology for de-listing and replacing products for a leading UK-based grocery chain through basket analysis and customer behavior analytics.
  • Conducted interviews for prospective data scientist hires.
  • Guided and supported the professional development of junior data scientists.

Generative AI (GenAI) Large Language Models (LLMs) Retrieval-Augmented Generation (RAG) Natural Language Processing (NLP) Machine Learning (ML)

Data Scientist

Jan 2021 - Jun 2021

  • Obtained security clearance to work on two government client projects, overseeing the collection, cleansing, and analysis of large-scale social platform data.
  • Developed the project's classification pipeline, using BERT, XGBoost, and Random Forest models alongside feature engineering.
  • Visualized data and extracted actionable insights using Plotly's Dash for various stakeholders.
  • Maintained data quality by building pipelines that ensured the accuracy, completeness, and consistency of deliverables.
  • Utilized AWS services (S3, Athena) for data storage, management, and extraction.
  • Collaborated with the grant proposal writing team, leveraging academic expertise to secure funding.
  • Advised stakeholders on extremist speech detection, social platform data analysis, and far-right ideologies.
  • Mentored and coached a Junior Data Scientist.

Natural Language Processing (NLP) Machine Learning (ML) Data Visualization Grant Proposal Writing Amazon Web Services (AWS)

Machine Learning Researcher

Sep 2019 - Aug 2020

  • Trained, evaluated, and optimized the performance of various Machine Learning and Natural Language Processing models (including XGBoost, Random Forests, BERT, and Logistic Regression) for the identification of online hate speech.
  • Investigated the impact of engineered features on the accuracy and efficiency of hate speech classifiers, driving improvements in model performance.
  • Assessed the influence of different dataset annotation techniques on the performance and reliability of online hate speech classifiers, contributing to the development of more robust detection methods.

Natural Language Processing (NLP) Machine Learning (ML) Microsoft Azure Academic Research

Marie Skłodowska-Curie Fellow

Aug 2016 - Jul 2019

  • Awarded a prestigious Horizon 2020 Marie Skłodowska-Curie Fellowship as part of the Privacy & Usability Innovative Training Network.
  • Employed Machine Learning and Natural Language Processing techniques (Word Embeddings, LDA, Sentiment Analysis, Computer Vision) to investigate the use of direct-to-consumer genetic testing by far-right groups for promoting racist ideologies.
    Media Coverage: The Times, StatNews
  • Conducted three large-scale studies on social media platform datasets (Twitter, Reddit, 4chan), analyzing over 1.3 million comments. Leveraged NoSQL (MongoDB) for data storage, management, and transformation.
  • Performed a critical evaluation and synthesis of research within the genome privacy community, focusing on privacy-enhancing technologies for testing, storing, and sharing genomic data.

Natural Language Processing (NLP) Machine Learning (ML) NoSQL Academic Research Privacy

Visiting Researcher

Sep 2017 - Dec 2017

  • Designed and ran a survey on the public perceptions of direct-to-consumer genetic testing.

Qualitative Research Survey Design

  • Studied how health and genetic data are handled under GDPR.

GDPR

Graduate Teaching Assistant

Oct 2014 - Jun 2015
University of the Aegean

Acted as a Teaching Assistant for the following courses:

  • Database Structures I.
  • Information System Analysis and Design.
  • Comprehending Data Structures using C and/or C++.
  • Learning the Java programming language.

Teaching

Co-Founder and Author

Feb 2011 - Nov 2013

  • In 2011, I co-created OSArena.net, which was at the time the largest Greek community on Open Source Operating Systems and Software, featuring news, guides, and opinion articles on Linux, Android, Hardware, Hacking, Privacy, and Security. I was an active author until November 2013.

Opinion Blogging News Reporting Open Source

Skills & Proficiency

Expertise

A non-exhaustive list of my main areas of expertise.
Large Language Models (LLMs) Generative AI (GenAI) Natural Language Processing (NLP) Machine Learning (ML) Data Visualization

Technical Skills

A non-exhaustive list of the technical tools that I am proficient with.
 Programming Languages: Python
 Libraries: LangChain LlamaIndex HuggingFace LLMFlows Sklearn Pandas Matplotlib Plotly
 Databases: Vector Databases NoSQL SQL
 Platforms & Tools: Google Cloud Platform (GCP) Amazon Web Services (AWS) Microsoft Azure Docker Git Bash

Soft Skills

Skills that I have gained through experience.
Team Leading Technical Communication Project Management Visual Storytelling Critical Thinking Analytical Skills Public Speaking Academic Writing

Selected Publications

Open-Source Projects

Oblivious Transfer Extensions: A concrete implementation of the IKNP03 protocol written in Java, using the SCAPI interface.
Java Cryptography
Daedalus: An academic schedule manager made for the department of Information and Communications Systems Engineering at the University of the Aegean.
Full Stack Development

Awards

Marie Skłodowska-Curie Fellow Scholarship

2016 - 2019
University College London

Received a three-year Horizon 2020 Marie Skłodowska-Curie Fellowship to investigate the societal challenges stemming from the rise of personal genomic testing. Acceptance Rate: 6%


University of the Aegean Scholarship

2015 - 2016
University of the Aegean

Received a scholarship for my M.Sc. studies at the University of the Aegean.