Career Profile
Pragmatic, incisive, end-to-end data scientist proficient in the practical implementation of Machine Learning, Natural Language Processing, and Document Understanding models, in visual storytelling, and in conveying information to stakeholders with or without a technical background. I have both industry and academic experience and have worked with private- and public-sector clients.
Currently, I work as a Data Scientist at PS AI Labs. Previously, I was a Marie Skłodowska-Curie Fellow at University College London, where I received my Ph.D., and a Machine Learning Researcher at Queen Mary University of London. I also hold an M.Sc. in Information Security and an M.Eng. in Software Engineering.
Experience
- Acted as a data science consultant for EMEA-based projects of a global Robotic Process Automation (RPA) company. Developed Named Entity Recognition (NER), Computer Vision (CV), and Document Understanding (DU) models for several global companies in various fields (fintech, logistics, filtration).
- Developed and implemented the methodology used for de-listing and/or replacing products for a major UK-based grocery chain by conducting basket analysis and customer behavior analytics.
- Led several interviews for new data scientist hires.
- Obtained security clearance to work on two government client projects; was in charge of collecting, cleaning, and analyzing large-scale social platform data.
- Designed the classification pipeline (involving, among others, BERT, XGBoost, Random Forest, and feature engineering).
- Visualized data and extracted actionable insights for various stakeholders using Plotly's Dash.
- Was in charge of data quality: built pipelines to ensure that deliverables were accurate, complete, and consistent.
- Used AWS to store, manage, and extract relevant datasets (S3, Athena).
- Was part of the grant proposal writing team, leveraging my academic expertise to write successful grant proposals.
- Advised stakeholders on extremist speech detection, social platform data analysis, and the far right.
- Coached and mentored a Junior Data Scientist.
- Trained and evaluated the performance of several ML and NLP models (XGBoost, Random Forests, BERT, Logistic Regression) in identifying online hate speech.
- Measured and evaluated how engineered features affect the performance of hate speech classifiers.
- Measured and evaluated how different dataset annotation techniques affect the performance of online hate speech classifiers.
- Awarded a Horizon 2020 Marie Skłodowska-Curie Fellowship within the Privacy & Usability Innovative Training Network.
- Used ML and NLP techniques (Word Embeddings, LDA, Sentiment Analysis, Computer Vision) to study how direct-to-consumer genetic testing is being used by far-right groups to promote racist ideologies. Media Coverage: The Times, StatNews.
- Conducted three large-scale studies on social media platform datasets (Twitter, Reddit, 4chan) comprising more than 1.3M comments in total. Used NoSQL (MongoDB) to store, manage, and transform the datasets.
- Critically evaluated and systematized the research produced by the genome privacy community on privacy-enhancing technologies for testing, storing, and sharing genomic data.
- Designed and ran a survey on the public perceptions of direct-to-consumer genetic testing.
Acted as a Teaching Assistant for the following courses:
- Database Structures I.
- Information System Analysis and Design.
- Comprehending Data Structures using C and/or C++.
- Learning the Java programming language.
- In 2011, I co-created OSArena.net, currently the largest Greek community on Open Source Operating Systems and Software, featuring news, guides, and opinion articles on Linux, Android, Hardware, Hacking, Privacy, and Security. I was an active author until November 2013.
Skills & Proficiency
Expertise
Tools
Libraries: scikit-learn, Matplotlib, Plotly, Hugging Face, NLTK, PyTorch, pandas
Databases: MongoDB, AWS Athena
Various: Git, Linux/Bash
Soft Skills
Selected Publications
- Ella Guest, Bertie Vidgen, Alexandros Mittos, Nishanth Sastry, Gareth Tyson, Helen Margetts. An Expert Annotated Dataset for the Detection of Online Misogyny. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021 (EACL).
- Alexandros Mittos, Savvas Zannettou, Jeremy Blackburn, and Emiliano De Cristofaro. 'And We Will Fight For Our Race!' A Measurement Study of Genetic Testing Conversations on Reddit and 4chan. In Fourteenth International AAAI Conference on Web and Social Media, 2020 (ICWSM). Media Coverage: The Times, StatNews. Acceptance Rate: 21%.
- Alexandros Mittos, Savvas Zannettou, Jeremy Blackburn, and Emiliano De Cristofaro. Analyzing Genetic Testing Discourse on the Web Through the Lens of Twitter, Reddit, and 4chan. ACM Transactions on the Web, 2020 (TWEB).
Open-Source Projects
Machine Learning, Abusive Speech Detection, Feature Engineering
Java Cryptography
Full Stack Development
Awards
Received a three-year Horizon 2020 Marie Skłodowska-Curie Fellowship to investigate the societal challenges stemming from the rise of personal genomic testing (acceptance rate: 6%).
Received a scholarship for my M.Sc. studies at the University of the Aegean.