
Data Engineer for Language Technologies (RE1): job opening in Barcelona at the Barcelona Supercomputing Center

Data Engineer for Language Technologies (RE1)



Job description

Context And Mission


 
The Language Technologies Laboratory at BSC has consolidated experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning for under-resourced languages and domains.

It has been entrusted by the Spanish and the Catalan governments with the mission to develop fundamental open-source resources and technologies for Spanish and Catalan.

In connection with this, the LT Lab is currently in charge of two flagship projects at the national and regional level: the ALIA project, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan, funded by the Catalan Digitalisation Department.

In addition, the Lab participates in various EU funded international projects.

The LT Lab is looking for candidates with a background in computational linguistics with experience in Language Technologies, specifically in Deep Learning and large language model building, and possibly in other areas of Natural Language and Speech Processing.

The successful candidate will work in a highly sophisticated HPC environment, have access to state-of-the-art systems and computational infrastructures, and establish collaborations with experts in different areas at the local and international levels.

The researcher will implement innovative techniques for language modeling and evaluation in the HPC environment.

 


This contract is funded by the project “Despliegue de la familia de Modelos ALIA en castellano y lenguas cooficiales” (Deployment of the ALIA family of models in Spanish and co-official languages), external reference 2024EtL00019, promoted by the State Secretariat for Digitalisation and Artificial Intelligence (SEDIA), with funds from the Ministry for Digital Transformation and the Civil Service, financed by the European Union-NextGenerationEU.

 



 
Key Duties

 

  • Work, in collaboration with the group members, on the design and development of the solutions needed to achieve the goals of the group’s research projects.

  • Interact with relevant stakeholders of the group’s research projects to understand their problems and the available data to formulate valuable solutions.

  • Ensure the long-term acquisition, management and accessibility of language data through the design and implementation of scalable storage solutions, structured data systems, and processing tools.

  • Collaborate with the members of the group in the generation and evaluation of language models using Deep Learning techniques (Transformers, Recurrent Neural Networks, and other neural network architectures).


 
Requirements

 

  • Education

    • Degree in Applied Linguistics, Computer Science or related disciplines with a very strong linguistic background.



  • Essential Knowledge and Professional Experience

    • Native speaker of Spanish.

    • Good knowledge of Python.

    • Good knowledge of Linux.

    • Knowledge of Deep Learning.

    • Experience in Machine Learning techniques applied to NLP.

    • Experience or knowledge in corpus annotation and generation of linguistic resources.

    • Understanding of data administration and management functions (transfer, storage, analysis, distribution, exploration, etc.).



  • Additional Knowledge and Professional Experience

    • Broad theoretical knowledge of AI techniques.

    • Knowledge of HPC workload managers such as Slurm.

    • Knowledge of Continuous Integration/Delivery/Deployment, including tools such as (or similar to) GitLab CI, GitHub, Docker and/or Ansible.

    • Experience in machine learning and data mining, including knowledge of PyTorch, TensorFlow, OpenCV, Pandas, scikit-learn and/or NumPy.

    • Basic knowledge of GPU-based computing.

    • Fluency in spoken and written English.

    • Experience in web/data scraping.

    • Expertise in building and maintaining data-curation pipelines.



  • Competences

    • Capacity to explore new research lines.

    • Ability to work independently and collaboratively within multidisciplinary teams.

    • Proactive, detail-oriented mindset, capable of problem-solving in complex data contexts.

    • Good communication and presentation skills.

    • Commitment to deadlines and quality research output.




 
Conditions

 


  • The position will be located at BSC within the Life Sciences Department.

  • We offer a full-time contract, a good and highly stimulating working environment with state-of-the-art infrastructure, flexible working hours, an extensive training plan, restaurant tickets, and support with relocation procedures.

  • Duration: Until 31/03/2026

  • Holidays: 23 paid vacation days plus 24th and 31st of December per our collective agreement

  • Salary: we offer a competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona

  • Starting date: as soon as possible




 


Required Skill Profession

Computer Occupations


