What Is an ODS?

An operational data store (ODS) is a temporary storage location that holds data extracted from multiple sources while it is processed, before the data is sent to its final storage destination. Data can be stored as structured or unstructured, but it must be stored in a way that lets it be extracted and transformed into the format required by its final data warehouse location. ODS architecture is usually built for ETL (extract, transform, and load) and ELT (extract, load, and transform) data pipelines.

An operational data store is a centralized repository for real-time or near real-time data used for operational reporting and analysis. In large data pipelines, an ODS acts as a staging area for data formatting, deduplication, and final processing before data is sent to the data warehouse. For example, a large real estate organization might extract data from several different websites to perform analytics for its customers. During the extraction process, the data pipeline stores the extracted information in an ODS so that automated scripts can format, organize, and deduplicate the data. Once the ETL process has finished with the data, it’s sent to the data warehouse where real estate applications can query it.
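The real estate example above can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite stands in for both the ODS and the data warehouse, and the table names (`ods_listings`, `dw_listings`), the column layout, and the `run_pipeline` function are hypothetical, not part of any particular product.

```python
import sqlite3


def run_pipeline(raw_listings):
    """Stage raw rows in an ODS table, then load cleaned, deduplicated rows
    into a warehouse table. All table and column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    # ODS table: loose types, no constraints, accepts data as collected.
    conn.execute("CREATE TABLE ods_listings (source TEXT, address TEXT, price TEXT)")
    # Warehouse table: strict types and a key that forbids duplicates.
    conn.execute(
        "CREATE TABLE dw_listings (address TEXT PRIMARY KEY, price INTEGER NOT NULL)"
    )

    # Extract: dump the collected rows into the ODS unchanged.
    conn.executemany("INSERT INTO ods_listings VALUES (?, ?, ?)", raw_listings)

    # Transform: normalize addresses and prices; later rows overwrite
    # earlier ones that share a normalized address (deduplication).
    cleaned = {}
    rows = conn.execute(
        "SELECT address, price FROM ods_listings ORDER BY rowid"
    )
    for address, price in rows:
        key = " ".join(address.split()).title()
        cleaned[key] = int(price.replace("$", "").replace(",", ""))

    # Load: only formatted, deduplicated rows reach the warehouse.
    conn.executemany("INSERT INTO dw_listings VALUES (?, ?)", list(cleaned.items()))
    conn.commit()
    return conn.execute(
        "SELECT address, price FROM dw_listings ORDER BY address"
    ).fetchall()
```

Calling `run_pipeline` with rows scraped from two hypothetical sites that describe the same property in different formats yields a single warehouse row for that property.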

An ODS is used for structured and unstructured data, but it’s especially useful for data pipelines working with relational databases. The ODS might store unstructured data from files or scraped web pages, and the ETL process uses it to hold collected data prior to the transformation step. Without the ODS, records that fail formatting could be lost. With it, any records that fail transformation can remain in the ODS for additional processing or human review.
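One way to picture records staying behind in the ODS is the sketch below. It assumes records arrive as dictionaries with `id` and `price` fields; those field names and the `transform_batch` function are hypothetical and chosen only for illustration.

```python
def transform_batch(ods_records):
    """Split staged records into rows ready to load and rows that stay
    in the ODS for reprocessing or human review."""
    loaded, remaining = [], []
    for record in ods_records:
        try:
            # The transformation step: enforce the types the warehouse expects.
            loaded.append({"id": int(record["id"]), "price": float(record["price"])})
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the original record, annotated with why it failed,
            # instead of dropping it.
            remaining.append({**record, "error": type(exc).__name__})
    return loaded, remaining
```

Nothing is discarded in this sketch: a record either moves forward in its cleaned form or remains available in its original form with a note about the failure.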

Purpose of an ODS

For large enterprises and machine learning applications, data is often pulled from multiple locations during ETL processing. The data pipeline might pull files from a network source, data from API endpoints, and data scraped from a web application. Scripts used to collect the data dump it into an ODS where it can be processed. The purpose of an ODS is to allow data extraction scripts to have a place to store collected information before processing.

An ODS is an important part of real-time dashboards and applications, especially when the data collected in an ODS is used in several locations. For example, the ODS holds collected data while an ETL process formats it, and the formatted data is then sent to a data warehouse where analytics applications can use it for financial projections. Think of an ODS as an interim data collection service that holds data before it is available to end-user applications.

Benefits of ODS

Enterprise businesses need an ODS for better data processing and more efficient ETL pipelines. Because ETL scripts have a place to store data, real-time applications also have a location to pull data for quick processing, artificial intelligence calculations, and machine learning ingestion. Without an ODS, your ETL data pipelines could drop data that does not fit database constraints or cannot be processed before being stored in the data warehouse.

A few additional benefits include:

  • Convenient collection of various data sources with disparate formatting and organization
  • A full snapshot of all records collected from various sources that can be used to identify issues or reprocess data if necessary
  • Unstructured data storage capabilities for analytics and machine learning
  • Cloud-hosted availability that can be configured for users, applications, administrators, or third-party vendors regardless of their location
  • Centralized location to collect data for all internal applications, which increases data accuracy and integrity across all your critical reporting

Implementing an ODS

Because an ODS is a part of your data pipeline and ETL processing, it should be included in your designs and data architecture. The type of data collected is a big determining factor in choosing a database platform for the ODS. Unstructured data is usually better suited to a NoSQL database, because a relational database will reject data that does not conform to table constraints.
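The difference between the two storage styles can be illustrated with a small sketch. It uses SQLite for both sides purely for convenience: a constrained table plays the relational role, and a single JSON text column is a stand-in for a schema-flexible document store, not a real NoSQL database. The table names and helper functions are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Relational-style table with constraints on every column.
conn.execute(
    "CREATE TABLE strict_listings ("
    "address TEXT NOT NULL, price INTEGER NOT NULL CHECK (price > 0))"
)
# Stand-in for a document store: one free-form JSON payload per row.
conn.execute("CREATE TABLE raw_documents (payload TEXT NOT NULL)")

# A scraped record with a missing price and an extra field.
record = {"address": "9 Elm Ave", "price": None, "notes": ["scraped", "price missing"]}


def try_strict_insert(rec):
    """Return True if the constrained table accepts the record, else False."""
    try:
        conn.execute(
            "INSERT INTO strict_listings VALUES (?, ?)",
            (rec.get("address"), rec.get("price")),
        )
        return True
    except sqlite3.IntegrityError:
        # The NOT NULL or CHECK constraint rejected the row.
        return False


def store_raw(rec):
    """Store the record as-is; no schema is enforced on its contents."""
    conn.execute("INSERT INTO raw_documents VALUES (?)", (json.dumps(rec),))
```

In this sketch the constrained table refuses the incomplete record, while the free-form table keeps it, extra field included, for later transformation.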

After you choose the database platform, you’ll need to decide if you want to host the ODS on premises or in the cloud. An on-premises database might be better suited for internal applications unavailable to the public, but your ETL scripts must be able to reach the database and any internal data warehouses. Cloud databases are beneficial for public cloud applications where they can be configured to connect to production cloud application databases.

Real-time applications require speed and compute power, so ensure that your database architecture has the bandwidth, compute power, memory, and storage capacity to handle large loads of data. It might make sense to do a trial run on data collection to identify the amount of storage capacity necessary, but don’t forget to allow additional storage for scalability. Snapshots might eventually be moved to another backup database or removed after the data ages and isn’t relevant anymore.
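A trial run can be turned into a rough capacity figure with simple arithmetic. The function below is a back-of-the-envelope sketch, not a sizing tool: the parameter names, the 30 percent default headroom, and the assumption that the collection rate stays constant are all illustrative.

```python
def estimate_storage_gb(trial_bytes, trial_hours, retention_days, headroom=0.3):
    """Extrapolate ODS storage needs (in GB) from a trial collection run.

    Assumes a constant collection rate and adds a fractional headroom
    on top for growth.
    """
    bytes_per_day = trial_bytes / trial_hours * 24
    return bytes_per_day * retention_days * (1 + headroom) / 1e9
```

For example, a 12-hour trial that collects 5 GB, with data kept for 30 days and 30 percent headroom, works out to roughly 390 GB.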

ODS vs. Data Warehouse

A data warehouse is the final destination for sanitized and formatted data. The ODS in your ETL procedures is where raw data is stored until it’s structured, deduplicated, and verified. The way you organize data and where it’s stored depends on your individual business rules. Relational databases in your data warehouse require structured data and enforce strict rules on how it must be formatted before it is stored.

ODS tables are continuously updated with new data, and they can be used for real-time data processing and user applications. Structured and unstructured data can be stored in ODS tables, but many systems use unstructured data so that data collection has fewer constraints. Constraints and filtering can be applied during the import process into your data warehouse.

Queries should run from the data warehouse tables, where data is much more permanent. You might archive data from a data warehouse, but completely removing it is unusual. ODS data is much more volatile. Duplicate data might be removed, and any stale or corrupted data could be deleted.
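The volatility of ODS data can be sketched as a periodic cleanup job. The example assumes a hypothetical `ods_events` table in SQLite with an `event_key` column identifying duplicates and a `collected_at` column holding ISO 8601 date strings; real ODS schemas and retention rules will differ.

```python
import sqlite3


def make_ods():
    """Create an in-memory stand-in for an ODS with one staging table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ods_events (event_key TEXT, collected_at TEXT)")
    return conn


def prune_ods(conn, cutoff):
    """Delete stale rows, then keep only the newest copy of each event_key."""
    # Stale data: ISO 8601 date strings sort chronologically as text.
    conn.execute("DELETE FROM ods_events WHERE collected_at < ?", (cutoff,))
    # Duplicates: keep the most recently inserted row per key.
    conn.execute(
        "DELETE FROM ods_events WHERE rowid NOT IN "
        "(SELECT MAX(rowid) FROM ods_events GROUP BY event_key)"
    )
    conn.commit()
```

A cleanup like this would normally run only against the ODS; the warehouse tables keep their history and are archived rather than pruned.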

Conclusion

If you plan to collect data from various sources for your data warehouse, an ODS interim architecture is beneficial for data pipelines supporting multiple applications with different business rules. You can keep your data in structured and unstructured formats to support machine learning, querying, reporting, analytic dashboards, and any other front-end application that uses the data warehouse.

To allow for a growing database, Pure Storage cloud solutions offer support for AWS, Azure, and any other provider to connect your ODS. Your ETL procedures have fast access to scalable database services to support real-time processing and fast queries.
