AI Evaluation Data Scientist (Fixed-term contract), Barcelona, Multiverse Computing LLC
We are looking to fill this role immediately and are reviewing applications daily.
Expect a fast, transparent process with quick feedback.
Why join us?
We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support.
Our groundbreaking technology is already transforming how AI is deployed worldwide — compressing large language models by up to 95% without losing accuracy and cutting inference costs by 50–80%.
Joining us means working on cutting-edge solutions that make AI faster, greener, and more accessible — and being part of a company often described as a “quantum-AI unicorn in the making.”
We offer
Competitive annual salary.
Two unique bonuses: a signing bonus at incorporation and a retention bonus at contract completion.
Relocation package (if applicable).
Fixed-term contract ending in June 2026.
Hybrid role and flexible working hours.
Be part of a fast-scaling Series B company at the forefront of deep tech.
Equal pay guaranteed.
International exposure in a multicultural, cutting-edge environment.
Job Overview
We are seeking a skilled and experienced AI Evaluation Data Scientist with a strong technical background in Generative AI to join our team.
In this role you will lead the design and implementation of evaluation frameworks that assess the performance of Generative AI systems before they are deployed to production, and work closely with cross-functional teams to turn evaluation outcomes into actionable insights that are integrated into our products.
You will have the opportunity to work on challenging projects and shape the future of Generative AI systems.
As an AI Evaluation Data Scientist, you will
Design and lead the evaluation strategy for our Agentic AI and RAG systems, turning customer workflows and business needs into measurable metrics and clear success criteria.
Contribute to the end-to-end design of Agentic AI and RAG systems, injecting a data-and-evaluation perspective into retrieval strategies, orchestration policies, tool usage, and memory to solve complex, real-world problems across industries.
Develop task-based, multi-step evaluations that reflect how the different components of our systems (retrieval, planning, tool use, memory) perform in real-world scenarios across cloud and edge deployments.
Develop and refine rigorous evaluation frameworks that reflect real-world performance, going beyond model benchmarks to assess task success, reasoning capabilities, factual consistency, reliability, and user success metrics across diverse problem domains.
Build and maintain a reproducible evaluation pipeline, including datasets, scenarios, configs, test suites, versioned assets, and automated runs to track regressions and improvements over time.
Curate and generate high-quality datasets for evaluation, including synthetic and adversarial data, to strengthen coverage and robustness.
Implement and calibrate LLM-as-a-judge evaluations, aligning automated scoring with human feedback and ensuring fairness, robustness, and representativeness.
Perform deep error analyses and ablations to uncover failure patterns, maintain a taxonomy of failure modes (reasoning, grounding, hallucinations, tool failures), and provide actionable insights to engineers to improve model and system performance.
Partner with ML specialists to create a data flywheel in which evaluation continuously informs new dataset creation and improvements to prompts, tool usage, model training, and system design, and quantify those improvements over time.
Define and monitor operational metrics (latency, cost, reliability) to ensure evaluations align with production and customer expectations.
Maintain high engineering standards, including clear documentation, reproducible experiments, robust version control, and well-structured ML pipelines.
Contribute to team learning and mentorship, guiding junior engineers and sharing expertise in LLM development, evaluation, and deployment best practices.
Participate in code reviews, offering thoughtful, constructive feedback to maintain code quality, readability, and consistency.
Required Minimum Qualifications
Master's or Ph.D. in Computer Science, Machine Learning, Data Science, Physics, Engineering, or related technical fields, with relevant industry experience.
Solid hands-on experience (3+ years for mid-level, 5+ years for senior) working as a Data Scientist, ML Engineer, or Research Scientist in applied AI/ML projects deployed in production environments.
Strong background in evaluation of machine learning systems, ideally with experience in LLMs, RAG pipelines, or multi-agent systems.
Proven ability to design and implement evaluation methodologies that go beyond static benchmarks, capturing real-world task success, reasoning, and robustness.
Hands-on experience with dataset creation and curation (including synthetic data generation) for training and evaluation.
Proven experience with agent-based architectures (task decomposition, tool use, reasoning workflows), RAG architectures (retrievers, vector databases, rerankers), and orchestration frameworks (LangGraph, LlamaIndex).
Strong problem-solving skills, with the ability to navigate ambiguity and design practical solutions to open-ended user or business needs.
Strong software engineering skills, with proficiency in Python, Docker, Git, and experience building robust, modular, and scalable ML codebases.
Familiarity with common ML and data libraries and frameworks (e.g., PyTorch, HuggingFace, LangGraph, LlamaIndex, Pandas, etc.).
Experience with cloud platforms (ideally AWS).
Excellent communication skills, with the ability to work collaboratively in a team environment, document and explain design decisions and experimental results, and communicate complex ideas effectively.
Fluent in English.
Preferred Qualifications
Ph.D. in Computer Science, Machine Learning, Data Science, Physics, Engineering, or related technical fields, with relevant industry experience.
Experience designing and running evaluation frameworks for agentic AI systems, RAG pipelines, or multi-agent orchestration.
Demonstrated experience with synthetic data generation (e.g., using LLMs to bootstrap datasets), data augmentation, and adversarial testing.
Strong background in error analysis of LLMs (hallucinations, grounding issues, tool failures, reasoning gaps) and in translating insights into concrete engineering improvements.
Track record of open-source contributions, publications, or public talks in the area of LLM evaluation, benchmarking, or applied AI systems.
Fluent in Spanish.
About Multiverse Computing
Founded in 2019, we are a well-funded, fast-growing deep-tech company with a team of 180+ employees worldwide.
Recognized by CB Insights (2023 & 2025) as one of the Top 100 most promising AI companies globally, we are also the largest quantum software company in the EU.
Our flagship products address critical industry needs:
CompactifAI → a groundbreaking compression tool for foundational AI models, reducing their size by up to 95% while maintaining accuracy, enabling portability across devices from cloud to mobile and beyond.
Singularity → a quantum and quantum-inspired optimization platform used by blue-chip companies in finance, energy, and manufacturing to solve complex challenges with immediate performance gains.
You’ll be working alongside world-leading experts in quantum computing and AI, developing solutions that deliver real-world impact for global clients.
We are committed to an inclusive, ethics-driven culture that values sustainability, diversity, and collaboration — a place where passionate people can grow and thrive.
Come and join us!
As an equal opportunity employer, Multiverse Computing is committed to building an inclusive workplace.
The company welcomes applications from people of all backgrounds, regardless of age, citizenship, ethnic or racial origin, gender identity, disability, marital status, religion or ideology, or sexual orientation.
Job market trends for AI Evaluation roles in Barcelona, Spain (Expertini chart: bar chart of available jobs with a trend line over time): 9469 jobs in Spain and 1857 jobs in Barcelona.