Research Interests

My research aims to improve the scalability, performance, understandability, shareability and adoption of scientific applications that exploit distributed resources, such as High Performance Computing (HPC) clusters or cloud systems, by using scientific workflows, parallel programming models and static code analysis.
This includes performance evaluation and modelling of data-intensive applications; techniques for increasing the performance of network communication and I/O operations in distributed systems; abstract models for developing and using data-intensive methods independently of the computing resources and middleware systems; automatic parallelisation techniques for scaling data-intensive applications; scientific workflow management systems (WMS) and data-streaming workflows; search algorithms; and program representations such as call and control-flow graphs and abstract syntax trees.
Work Experience
Lecturer, University of St Andrews
February 2022 - Present
- Towards Large-scale Cultural Analytics in the Arts and Humanities: an AHRC-funded project exploring how to make use of large-scale cultural events data for research, started on 1 February 2022.
- Ongoing collaborations:
  - National:
    - Malcolm Atkinson & Melissa Terras, University of Edinburgh, UK
    - Michael Rovatsos, Bayes Centre, UK
    - Heidi Burdett, Lyell Centre, UK
    - Thomas Heinis, Imperial College London, UK
    - EPCC, UK
    - National Library of Scotland (NLS), UK
  - International:
    - Ewa Deelman & Rafael Ferreira da Silva, University of Southern California, USA
    - Oscar Corcho & Daniel Garijo, Technical University of Madrid (UPM), Spain
    - Alberto Nuñez Covarrubias, Complutense University of Madrid, Spain
    - Emilia Cambronero & Adrian Bermejo, Universidad de Castilla-La Mancha, Spain
    - Christian Page, CERFACS, Toulouse, France
Assistant Professor, Heriot-Watt University
March 2021 - February 2022
- Honorary Fellow, School of Informatics, University of Edinburgh
- National Librarian's Research Fellowship in Digital Scholarship, NLS fellowship, 2021, GBP 7,500
Vice President (Applied Research), JPMorgan Chase & Co.
October 2020 - March 2021
Worked on several applied research projects for JPMorgan.
Research Fellow, EPCC, University of Edinburgh
March 2018 - October 2020
Opportunity Match: funded by the Edinburgh and South East Scotland City Region Deal, 2020 -- role: creating a new web-based tool built on a focused web search of text resources describing individuals (experts) and opportunities (challenge descriptions, funding programmes, project ideas). The tool supports opportunity-to-expert matching and lets users record and modify their past searches, see in which searches they have appeared each week, and explore the expertise of other experts by consulting their profiles and research networks.
City Region Deal Text and Data Mining: funded by the Edinburgh and South East Scotland City Region Deal, 2019-2020 -- role: improving our distributed, scalable and easy-to-use text-mining tools for enabling digital humanities research on the extensive digital collections of the National Library of Scotland.
Graph-Based Data Federation for Healthcare Data Science: funded by UK Research and Innovation's Industrial Strategy Challenge Fund, 2019 -- role: prototyping a cloud-based architecture for the knowledge-driven phenotype computation framework developed within the project.
A Shared Data Repository for Genomics Data in Scotland: a collaboration between the NES Digital Service (to which I was seconded) and the Scottish Clinical Genetics Laboratories, 2019 -- role: exploring opportunities for creating a shared data repository for clinical genetics laboratories in Scotland, and scoping the architecture necessary for such a repository.
Living with Machines: funded by UKRI's Strategic Priorities Fund, 2018-2023, GBP 9.2 million -- role: developing new scalable tools for complex text-mining analysis using distributed systems and NLP techniques.
DARE (Delivering Agile Research Excellence on European e-Infrastructures): awarded under the Horizon 2020 programme, running for three years from January 2018. The project aims to build a hyper-platform that allows users to handle extreme data volumes and computations agilely and transparently at a fully abstract level, tested specifically on seismological and climatological applications.
BioExcel: the project establishes a Centre of Excellence for Computational Biomolecular Research, whose mission is to enable better science by improving the most popular biomolecular software and spreading best practice and expertise among the communities through consultancy and training.
ATI-SE: creating industry impact and addressing skills gaps have been identified as key priorities for the Alan Turing Institute, and EPCC sees a clear opportunity to engage with the Institute in this context. The University of Edinburgh's joint-venture partners are the Universities of Cambridge, Oxford and Warwick, University College London, and the Engineering and Physical Sciences Research Council (EPSRC).
Senior Data Scientist, British Geological Survey
October 2016 - February 2018
GeoSocial-Aurora project: the aim of the project was to explore the usefulness of social media for scientific survey and analysis with the release of GeoSocial-Aurora, a web-mapping tool that searches for tweets related to aurora sightings and locates them as markers on a map. Over several years, tweets containing "aurora borealis" (or similar) terms had been stored in a BGS facility; however, not all of these tweets actually relate to the aurora borealis. My role in this project was therefore to create a machine-learning classifier (comparing the performance of logistic regression with neural networks) to classify the tweets at run time before plotting them on the web map; a sketch of such a comparison follows.
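A minimal sketch of this kind of model comparison using scikit-learn; the tweets, labels and model settings are hypothetical placeholders, not the BGS data or the deployed classifier.

# Compare logistic regression and a small neural network on TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder tweets: 1 = genuine aurora sighting, 0 = unrelated use of the word.
tweets = ["amazing aurora borealis over Shetland tonight",
          "Aurora is my favourite Disney princess",
          "green lights dancing across the northern sky",
          "just booked tickets at the Aurora cinema"]
labels = [1, 0, 1, 0]

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("neural network", MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000))]:
    model = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(model, tweets, labels, cv=2)  # 2 folds for the tiny toy set
    print(f"{name}: mean accuracy {scores.mean():.2f}")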
Volcanology project: the aim of this project was to build a data-pipeline application for automatic information retrieval and extraction from Volcanic Ash Advisory Centres for further analysis. My role in this project was to implement a data-pipeline information-extraction workflow that captures and filters the desired information and gathers it in an easily consumed format (JSON files) for analysis, roughly as in the sketch below. Additionally, I was in charge of creating an application for performing further analysis (interpolation) on the extracted data.
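A minimal sketch of the extraction step; the report text and field layout are hypothetical stand-ins for the real advisory bulletins.

# Extract 'KEY: value' fields from a plain-text advisory and save them as JSON.
import json
import re

report = """VA ADVISORY
VOLCANO: ETNA
OBS VA DTG: 20/1200Z
OBS VA CLD: SFC/FL200"""  # placeholder bulletin

def extract(text):
    """Capture uppercase 'KEY: value' pairs, one per line."""
    return {key.strip(): value.strip()
            for key, value in re.findall(r"^([A-Z ]+):\s*(.+)$", text, re.MULTILINE)}

with open("advisory.json", "w") as f:
    json.dump(extract(report), f, indent=2)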
Satellite Magnetic Data Selection Code for Global Modelling: the satellite data-selection code separates out the satellite magnetic data that the modellers wish to use in their global modelling work from the large quantity of available data. Satellite data are available from several sources, including the Oersted, CHAMP and ESA Swarm missions; taken together, these data span almost three decades at 1 Hz sampling frequency, a quantity far too large to be used directly in the modelling code. The data-selection code reduces the volume of data and filters out records that do not meet the chosen criteria. My role in this project was to help implement a new code (a data-pipeline workflow) that is easier to modify and runs faster in parallel, allowing the team to explore new data selections and ultimately improve their global magnetic field models; the sketch after this paragraph illustrates the pattern.
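A minimal sketch of criterion-based selection run in parallel with Python's multiprocessing; the selection rule and record fields are hypothetical, not the BGS criteria.

# Keep only the records that satisfy the selection criteria, testing them in parallel.
from multiprocessing import Pool

def meets_criteria(record):
    """Hypothetical rule: keep geomagnetically quiet, mid-latitude samples."""
    return record["kp_index"] <= 2 and abs(record["latitude"]) <= 55.0

def select(records):
    with Pool() as pool:
        keep = pool.map(meets_criteria, records)
    return [r for r, k in zip(records, keep) if k]

if __name__ == "__main__":
    records = [{"kp_index": 1, "latitude": 40.0},
               {"kp_index": 5, "latitude": 70.0}]
    print(select(records))  # only the first record passes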
Senior Research Associate, Data Intensive Research Group, University of Edinburgh
2011 - October 2016
From October 2011 to February 2014 I worked on the EFFORT project (UK/NERC). The aim of the project was to provide a facility for developing and testing models to forecast brittle failure in experimental and natural data, and I was in charge of developing solutions for data transfer, data formats, data storage and data access. As a result, I built a new science gateway for rock physicists and volcanologists, where scientists can access experimental data, apply their models to it, and visualise their results at run time through the website. I also contributed to developing a new Python library, VarPy, for analysing seismic data.
From February 2011 to June 2015 I worked on the VERCE project (funded by EU/FP7), whose aims were to study and develop a working framework for running data- and compute-intensive applications in the seismology domain. In this project we developed a new workflow system called dispel4py, a Python library for describing abstract workflows for distributed data-intensive applications. My role in VERCE was to optimise workflows developed with dispel4py at run time, applying different adaptive techniques, and to deliver various VERCE and dispel4py training events and materials; a flavour of the library is sketched below.
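A minimal sketch of an abstract dispel4py workflow, written from memory of the library's documented API (GenericPE, WorkflowGraph); treat the exact class and method names as assumptions to be checked against the dispel4py release in use.

# Two processing elements connected into an abstract graph; the same graph can
# then be mapped to sequential, multiprocessing or MPI back-ends at run time.
from dispel4py.core import GenericPE
from dispel4py.workflow_graph import WorkflowGraph

class Producer(GenericPE):
    def __init__(self):
        GenericPE.__init__(self)
        self._add_output('output')
    def _process(self, inputs):
        for i in range(5):
            self.write('output', i)  # stream five integers downstream

class Doubler(GenericPE):
    def __init__(self):
        GenericPE.__init__(self)
        self._add_input('input')
        self._add_output('output')
    def _process(self, inputs):
        self.write('output', inputs['input'] * 2)  # transform each item as it arrives

graph = WorkflowGraph()
graph.connect(Producer(), 'output', Doubler(), 'input')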
From June 2015 to October 2016 I had a major role in capturing the requirements for the ENVRIplus project (funded by EU Horizon 2020), which delivers common data functionalities for 20 pan-European Research Infrastructures (RIs) in the environmental domain. As a result, I produced a complete analysis of useful approaches and technologies for processing at every stage of the data lifecycle (from validation, error correction and monitoring during data acquisition to transformations for comprehensible presentation of final results) for the RIs involved in ENVRIplus.
During my time in the DIR group I published 25 peer-reviewed papers on HPC and eScience topics in top journals and conferences, was invited 17 times as a speaker at conferences and departmental seminars, was appointed external PhD examiner five times, served four times as a Programme Committee member, and was nominated four times as a reviewer for international journals and conferences. I was also invited as a reviewer for an H2020 project funded by the European Commission, and invited to participate in a European project (EXDCI) as an expert in the field of "Weather, Climatology and Solid Earth Sciences". I made four research visits to different universities, three of them funded by grants that I was awarded. The US-based company Accelogic contracted me for consultancy services to make commercial use of outputs from my previous research on real-time adaptive compression techniques.
Research and Teaching Assistant, Computer Architecture, Communications & Systems Group, University Carlos III Madrid
2005 - 2010
While employed by the University Carlos III, my main aim was to improve the scalability and performance of MPI-based applications executed on clusters, achieved by reducing the overhead of the I/O and communications subsystems. During that period I successfully defended my PhD thesis, "Dynamic optimization techniques to enhance scalability and performance of MPI-based applications"; the sketch below illustrates one classic pattern in this space.
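As an illustration of the kind of communication overhead this work targets (an illustrative textbook pattern, not a technique taken from the thesis), a minimal mpi4py sketch that overlaps communication with computation using non-blocking sends:

# Run with: mpirun -n 2 python overlap.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    req = comm.isend(list(range(1000)), dest=1, tag=0)  # start a non-blocking send...
    local = sum(i * i for i in range(10000))            # ...do useful work while it is in flight
    req.wait()                                          # ...then complete the send
    print("rank 0 computed", local, "while sending")
elif rank == 1:
    data = comm.recv(source=0, tag=0)
    print("rank 1 received", len(data), "items")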
I was also an important contributor to four projects co-funded by the University Carlos III of Madrid and Spanish Government grants.
As a teaching assistant, I taught courses in the Bachelor's and Master's programmes in Computer Science. My responsibilities included laboratory advising, grading homework and exams, and assisting students with course materials.
Awards & Grants
- Towards Large-scale Cultural Analytics in the Arts and Humanities, UKRI/AHRC, GBP 100,000
- National Librarian's Research Fellowship in Digital Scholarship, NLS fellowship, 2021, GBP 7,500
- Honorary Fellow, School of Informatics at the University of Edinburgh, 2021 - 2024
- Postdoctoral and Early Career Researcher Exchanges (PECE) SICSA grant, 2017, GBP 7,500
- Honorary Fellow, School of Informatics at the University of Edinburgh, 2017 - 2019
- Postdoctoral and Early Career Researcher Exchanges (PECE) SICSA grant, 2015, GBP 6,080
- Travel grant "Driving UK HPC enabled science and innovation through US collaborations" to SC14, New Orleans, USA, 16/11/2014 - 21/11/2014, GBP 2,500
- Best poster award, SACSIS 2012, Kobe, Japan, 16/05/2012
- Two HPC-Europa2 grants, Edinburgh, UK, 15/02/2011 - 01/10/2011, GBP 3,300
- Student travel award, TCPP PhD Forum at IPDPS, Miami, USA, 14/04/2008, USD 500
- Scholarship for the Doctoral Training Programme, Computer Science Department, University Carlos III, Spain, 01/09/2004 - 01/09/2005, EUR 12,000
Languages
Spanish
English
Technical Skills
- GitHub repository with several codes, training materials and presentations
- Experience with MPI, OpenMP and threads
- Experience with Storm, Globus Online, MapReduce and Hadoop
- Experience with workflow systems
- Familiarity with massively parallel and multicore architectures
- Experience in data-streaming transfer, storage and access
- Experience in automated job-submission portlet generation systems
- Experience in web development with frameworks such as Liferay, Sinatra, and Ruby on Rails
- Programming languages: C++, C, Java
- Scripting languages: Bash, Python
- Experience with HPC applications and parallel benchmarks
- Experience working on two multidisciplinary projects
Publication list
Education
PhD in Computer Science, University Carlos III Madrid, 14/09/2010
Master of Technology in Computer Science, University Carlos III Madrid, 01/07/2007
Master in eCommerce, University Carlos III Madrid, 01/07/2004
Master of Computer Science, University of Deusto, 01/06/2003
Bachelor of Computer Science, University of Deusto, 01/06/2001