Amazon Omics aims to optimize biological data analysis at scale

At its annual re:Invent conference, Amazon Web Services on Tuesday launched a new service, dubbed Amazon Omics, designed to help bioinformaticians, researchers and scientists store and analyze genomic and other biological data types to accelerate scientific advances for precision medicine.

Omics typically refers to fields of study in biology that end with the suffix “omics,” such as genomics, transcriptomics (the study of RNA in a cell), proteomics (the study of proteomes, or sets of proteins) and metabolomics (the study of molecules within cells). Omics research typically involves large-scale studies with big data sets.

According to the company, scientists can use the new service not only to create a large data store but also to import large raw data files, such as genome sequences and other files used in precision medicine, a medical field that uses genome and protein data to optimize treatment for diseases.

Amazon Omics can also help set up basic bioinformatics workflows and analyze results using existing AWS analytics and machine learning services, AWS said, adding that the service automatically provisions the underlying infrastructure as usage grows.

Data storage optimized for bioinformatics

The new service is built on three primary components: optimized storage, managed compute for workflows and data stores geared for specific types of analytics, Channy Yun, principal developer advocate at Amazon, wrote in a blog post.

To lower costs, Amazon Omics uses bioinformatics-aware storage options for storing raw sequence data. To prepare data for analysis, Amazon Omics imports raw data into a variant store and transforms it into a query-ready schema that is available as an Apache Iceberg table, according to the company.

The service comes with two storage classes—active and archive.

“Auto-archival is on by default, meaning that Amazon Omics will automatically move data to the cheaper storage class if they are not regularly accessed (for more than 30 days), similar to the Amazon Simple Storage Service (Amazon S3) Intelligent-Tiering storage class, leading to cost savings for customers,” Tehsin Syed, general manager of Health AI at AWS, wrote in a blog post.
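The tiering rule Syed describes can be illustrated with a minimal Python sketch. It is not the Amazon Omics API: the function name `storage_class` and the constant `ARCHIVE_AFTER` are hypothetical, and the only detail taken from the quote is the 30-day threshold.

```python
from datetime import datetime, timedelta

# Assumption: the 30-day threshold comes from the quote above; everything
# else here (names, return values) is hypothetical and for illustration only.
ARCHIVE_AFTER = timedelta(days=30)


def storage_class(last_accessed: datetime, now: datetime) -> str:
    """Return 'archive' if the data has been idle for more than 30 days, else 'active'."""
    return "archive" if now - last_accessed > ARCHIVE_AFTER else "active"


now = datetime(2022, 12, 1)
print(storage_class(datetime(2022, 11, 20), now))  # idle 11 days
print(storage_class(datetime(2022, 10, 1), now))   # idle 61 days
```

In this sketch, data last read 11 days ago stays in the active class, while data untouched for 61 days falls into the archive class.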

Amazon Omics also supports the import of raw data into an Annotation Store. Data that is marked or tagged by file types is called annotated data.

Scientists and other users can start importing data into the object storage via the service’s console.

The managed compute component of the service provides resources for scientists to run bioinformatics workflows, which are scripts that chain together a series of coordinated tasks. These workflows distill large amounts of raw sequence data, from Amazon Omics storage or Amazon S3, into small amounts of analytic data, such as genome mutations, the company said, adding that scientists and other users need only specify the compute resources required for each task.

“In turn, this removes all the undifferentiated heavy lifting associated with running and managing these workflows at scale,” Syed wrote, adding that the scripts inside workflows can be written in languages such as Nextflow or Workflow Description Language.

The new service, which can be used in combination with other services such as Amazon HealthLake, is now available in the US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Europe (Frankfurt), Europe (Ireland) and Europe (London) regions.

Support for more regions is expected to follow soon. The service is priced on a consumption model.

Copyright © 2022 IDG Communications, Inc.
