Requisition Id 16566
Overview:
We are seeking a Research Software Engineer to join the Incident Modeling and Computational Sciences (IMCS) Group in the National Security Sciences Directorate at Oak Ridge National Laboratory (ORNL). IMCS develops and maintains state-of-the-art modeling and simulation tools supporting nuclear forensics, nuclear weapon effects, radiological consequence management, and other needs for DOE, DOW, and DHS sponsors. In this role, you will design, develop, and operate enterprise AI and data infrastructure, helping to build, maintain, and scale Docker-based microservices, large language model (LLM) inference servers on GPU clusters, vector database and retrieval-augmented generation (RAG) pipelines, and observability stacks that advance AI capabilities across the laboratory. The successful candidate will work independently and collaboratively with a multidisciplinary team of scientists, data engineers, and system administrators to deliver reliable, secure, and high-performance AI services to ORNL researchers.
Basic Qualifications:
- A BS degree in computer science, software engineering, or a related technical field and a minimum of five years of relevant experience. A combination of education and experience may also be considered.
- Experience with the software development life cycle, including version control with Git, code review practices, and collaborative development workflows.
- Experience writing and maintaining production-quality code in Python, with exposure to one or more additional languages (e.g., JavaScript, Bash, C++).
- Experience deploying and debugging containerized applications using Docker and Docker Compose, including multi-service environments.
- Experience with Linux shell scripting in a command-line environment.
- Experience working in multidisciplinary teams across all phases of the software development life cycle.
Preferred Qualifications:
- Experience deploying or operating AI/ML serving infrastructure, including LLM serving frameworks such as vLLM, Ollama, or similar.
- Familiarity with model routing or proxy tools such as LiteLLM or comparable API gateway solutions.
- Experience with vector databases or retrieval-augmented generation (RAG) pipelines (e.g., Milvus, ChromaDB, Weaviate, or similar).
- Knowledge of reverse proxy and web infrastructure concepts, including Nginx configuration, TLS/mTLS certificate management, WebSocket proxying, and authentication subrequests.
- Experience with relational databases, including PostgreSQL administration and schema management.
- Familiarity with observability tooling such as OpenTelemetry, Prometheus, Grafana, Loki, or Tempo.
- Experience with HPC environments and job schedulers such as SLURM, or general experience deploying services on remote GPU clusters.
- Experience maintaining forks of open-source projects, including upstream merge management, patch backporting, and dependency CVE remediation.
- Familiarity with JavaScript or TypeScript and component-based frontend frameworks such as Svelte or React.
- Excellent written and oral communication skills.
- Motivated self-starter with the ability to work independently and to participate creatively in collaborative teams across the laboratory.
- Ability to function well in a fast-paced research environment, set priorities to accomplish multiple tasks within deadlines, and adapt to ever-changing needs.
Special Requirements:
- This position requires the ability to obtain and maintain a Sensitive Compartmented Information (SCI) clearance from the Department of Energy. As such, this position is a Workplace Substance Abuse Program (WSAP) testing designated position. WSAP positions require passing a pre-placement drug test and participation in an ongoing random drug testing program. In addition, due to the SCI requirement, you may also be subject to random polygraph testing.
About ORNL:
As a U.S. Department of Energy (DOE) Office of Science national laboratory, ORNL has an impressive 80-year legacy of addressing the nation’s most pressing challenges. Our team is made up of over 7,000 dedicated and innovative individuals! Our goal is to create an environment where a variety of perspectives and backgrounds are valued, ensuring ORNL is known as a top choice for employment. These principles are essential for supporting our broader mission to drive scientific breakthroughs and translate them into solutions for energy, environmental, and security challenges facing the nation.
ORNL offers competitive pay and benefits programs to attract and retain individuals who demonstrate exceptional work behaviors. The laboratory provides a range of employee benefits, including medical and retirement plans and flexible work hours, to support the well-being of you and your family. Employee amenities such as on-site fitness, banking, and cafeteria facilities are also available for added convenience.
Other benefits include the following: Prescription Drug Plan, Dental Plan, Vision Plan, 401(k) Retirement Plan, Contributory Pension Plan, Life Insurance, Disability Benefits, Generous Vacation and Holidays, Parental Leave, Legal Insurance with Identity Theft Protection, Employee Assistance Plan, Flexible Spending Accounts, Health Savings Accounts, Wellness Programs, Educational Assistance, Relocation Assistance, and Employee Discounts.
If you have difficulty using the online application system or need an accommodation to apply due to a disability, please email: ORNLRecruiting@ornl.gov.
This position will remain open for a minimum of 5 days, after which it will close when a qualified candidate is identified and/or hired.
We accept Word (.doc, .docx), Adobe (unsecured .pdf), Rich Text Format (.rtf), and HTML (.htm, .html) up to 5MB in size. Resumes from third party vendors will not be accepted; these resumes will be deleted and the candidates submitted will not be considered for employment.
ORNL is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. UT-Battelle is an E-Verify employer.