
What Is an ODS?

An operational data store (ODS) is a temporary storage location that holds data extracted from multiple sources while it is processed, before the data is sent to its final storage destination. The data can be structured or unstructured, but it must be stored in a way that lets it be extracted and transformed into the format required by the data warehouse. ODS architecture is usually built to support ETL (extract, transform, and load) and ELT (extract, load, and transform) data pipelines.

What Is an ODS?

An operational data store is a centralized repository for real-time or near real-time data used for operational reporting and analysis. In large data pipelines, an ODS acts as a staging area for data formatting, deduplication, and final processing before data is sent to the data warehouse. For example, a large real estate organization might extract data from several different websites to perform analytics for its customers. During the extraction process, the data pipeline stores the extracted information in an ODS so that automated scripts can format, organize, and deduplicate the data. Once the ETL process has finished with the data, it’s sent to the data warehouse, where real estate applications can query it.

An ODS is used for structured and unstructured data, but it’s especially useful for data pipelines working with relational databases. The ODS might store unstructured data from files or scraped web pages, and the ETL process reads the collected data from it for the transformation step. Without the ODS, any record that failed formatting could be lost. With it, records that fail transformation can remain in the ODS for additional processing or human review.
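The staging behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `OperationalDataStore` class, the `transform` and `run_etl` functions, and the listing fields are all hypothetical names chosen for the real estate example, and the "warehouse" is just a list.

```python
from dataclasses import dataclass, field


@dataclass
class OperationalDataStore:
    """Hypothetical in-memory stand-in for an ODS staging area."""
    staged: list = field(default_factory=list)

    def stage(self, record: dict) -> None:
        self.staged.append(record)


def transform(record: dict) -> dict:
    """Normalize one raw listing; raises on records that cannot be formatted."""
    price_text = str(record["price"]).replace("$", "").replace(",", "")
    return {
        "address": record["address"].strip().title(),
        "price": int(price_text),
    }


def run_etl(ods: OperationalDataStore, warehouse: list) -> None:
    """Load clean, deduplicated rows; leave failed records in the ODS."""
    seen = {row["address"] for row in warehouse}
    failed = []
    for record in ods.staged:
        try:
            row = transform(record)
        except (KeyError, ValueError, AttributeError):
            failed.append(record)  # kept for reprocessing or human review
            continue
        if row["address"] not in seen:  # deduplicate on address
            seen.add(row["address"])
            warehouse.append(row)
    ods.staged = failed


ods = OperationalDataStore()
ods.stage({"address": " 12 oak st ", "price": "$350,000"})
ods.stage({"address": "12 Oak St", "price": 350000})          # duplicate listing
ods.stage({"address": "9 Elm Rd", "price": "call for price"})  # cannot be parsed

warehouse = []
run_etl(ods, warehouse)
```

After the run, the warehouse holds one cleaned listing, and the record that could not be parsed is still staged in the ODS rather than dropped.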

Purpose of an ODS

For large enterprises and machine learning applications, data is often pulled from multiple locations during ETL processing. The data pipeline might pull files from a network source, data from API endpoints, and data scraped from a web application. Scripts used to collect the data dump it into an ODS where it can be processed. The purpose of an ODS is to allow data extraction scripts to have a place to store collected information before processing.
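As a rough sketch of that collection step, the Python below wraps whatever each source returns in a common envelope before staging it. The three collector functions are hypothetical stubs that return hard-coded sample data; in a real pipeline they would read files, call API endpoints, or scrape pages.

```python
import json
from datetime import datetime, timezone


# Hypothetical collectors: each returns raw payloads in its own shape.
def collect_from_files():
    return [{"addr": "12 Oak St", "price": "350000"}]


def collect_from_api():
    return [{"address": "9 Elm Rd", "listPrice": 420000}]


def collect_from_scrape():
    return ["<li>77 Pine Ave - $515,000</li>"]


def stage_all(collectors: dict) -> list:
    """Store every payload as-is, tagged with its source and collection time."""
    staged = []
    for source, collect in collectors.items():
        for payload in collect():
            staged.append({
                "source": source,
                "collected_at": datetime.now(timezone.utc).isoformat(),
                "payload": json.dumps(payload),  # raw, untransformed
            })
    return staged


staged = stage_all({
    "files": collect_from_files,
    "api": collect_from_api,
    "scrape": collect_from_scrape,
})
```

Keeping the payload untouched and recording the source alongside it means the later transformation step can apply source-specific parsing without re-collecting anything.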

An ODS is an important part of real-time dashboards and applications, especially when the data collected in an ODS is used in several locations. For example, an ETL process formats the data collected in the ODS and then sends it to a data warehouse, where analytics can use it for financial projections. Think of an ODS as an interim data collection service that holds data before it becomes available to end-user applications.

Benefits of ODS

Enterprise businesses need an ODS for better data processing and more efficient ETL pipelines. Because ETL scripts have a place to store data, real-time applications also have a location to pull data for quick processing, artificial intelligence calculations, and machine learning ingestion. Without an ODS, your ETL data pipelines could drop data that does not fit database constraints or cannot be processed before being stored in the data warehouse.

A few additional benefits include:

  • Convenient collection of various data sources with disparate formatting and organization
  • A full snapshot of all records collected from various sources that can be used to identify issues or reprocess data if necessary
  • Unstructured data storage capabilities for analytics and machine learning
  • Cloud ODS systems can be configured to be available to users, applications, administrators, or third-party vendors regardless of their location
  • Centralized location to collect data for all internal applications, which increases data accuracy and integrity across all your critical reporting

Implementing an ODS

Because an ODS is a part of your data pipeline and ETL processing, it should be included in your designs and data architecture. The type of data collected is a big determining factor in choosing the database platform for an ODS. Unstructured data is generally a better fit for a NoSQL database, because a relational database will reject data that does not conform to its table constraints.
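The constraint behavior is easy to see with Python's built-in `sqlite3` module. In this sketch, SQLite stands in for any relational database, and a plain text column holding JSON stands in for a schemaless store. The table names, columns, and sample records are assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Strict table: every row needs an address and a positive price.
conn.execute(
    "CREATE TABLE listings ("
    " address TEXT NOT NULL,"
    " price INTEGER NOT NULL CHECK (price > 0))"
)
# Staging table: accepts any payload as serialized JSON.
conn.execute("CREATE TABLE ods_raw (source TEXT NOT NULL, payload TEXT NOT NULL)")

raw_records = [
    ("api", {"address": "9 Elm Rd", "price": 420000}),
    ("files", {"address": "12 Oak St"}),                # price missing
    ("scrape", {"address": "77 Pine Ave", "price": -1}),  # fails the CHECK
]

rejected = 0
for source, record in raw_records:
    # The raw copy always lands in the staging table.
    conn.execute(
        "INSERT INTO ods_raw (source, payload) VALUES (?, ?)",
        (source, json.dumps(record)),
    )
    try:
        conn.execute(
            "INSERT INTO listings (address, price) VALUES (?, ?)",
            (record.get("address"), record.get("price")),
        )
    except sqlite3.IntegrityError:
        rejected += 1  # the constrained table refuses the row

raw_count = conn.execute("SELECT COUNT(*) FROM ods_raw").fetchone()[0]
clean_count = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
```

All three records survive in the staging table, while only the one that satisfies the constraints reaches the strict table.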

After you choose the database platform, you’ll need to decide if you want to host the ODS on premises or in the cloud. An on-premises database might be better suited for internal applications unavailable to the public, but your ETL scripts must be able to reach the database and any internal data warehouses. Cloud databases are beneficial for public cloud applications where they can be configured to connect to production cloud application databases.

Real-time applications require speed and compute power, so ensure that your database architecture has the bandwidth, compute power, memory, and storage capacity to handle large loads of data. It might make sense to do a trial run on data collection to identify the amount of storage capacity necessary, but don’t forget to allow additional storage for scalability. Snapshots might eventually be moved to another backup database or removed after the data ages and isn’t relevant anymore.
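A trial run can be turned into a first capacity estimate with simple arithmetic. The function below is a back-of-the-envelope sketch: the parameter names, the retention window, and the 50% headroom default are assumptions, not recommendations, and it ignores compression, indexes, and replication.

```python
import math


def estimate_storage_bytes(trial_bytes: float, trial_days: float,
                           retention_days: float,
                           growth_headroom: float = 0.5) -> float:
    """Project ODS storage from a trial collection run.

    daily volume   = trial_bytes / trial_days
    retained data  = daily volume * retention_days
    provisioned    = retained data * (1 + growth_headroom)
    """
    daily_bytes = trial_bytes / trial_days
    return daily_bytes * retention_days * (1 + growth_headroom)


GB = 10**9
# Example: a 7-day trial collected 14 GB; snapshots are kept for 30 days.
needed = estimate_storage_bytes(14 * GB, trial_days=7, retention_days=30)
needed_gb = needed / GB
```

With these example numbers, the trial implies 2 GB per day, 60 GB over the retention window, and 90 GB once the headroom is added.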

ODS vs. Data Warehouse

A data warehouse is the final destination for sanitized and formatted data. The ODS in your ETL procedures is where raw data is stored until it’s structured, deduplicated, and verified. The way you organize data and where it’s stored depends on your individual business rules. Relational databases in your data warehouse require structured data and enforce strict rules on how it must be formatted before it can be stored.

ODS tables are consistently updated with new data, and they can be used for real-time data processing and user applications. Structured and unstructured data can be stored in ODS tables, but many systems use unstructured data so that data collection has fewer constraints. Constraints and filtering can be applied during the import process into your data warehouse.

Queries should run from the data warehouse tables, where data is much more permanent. It’s unusual to delete data from a data warehouse; you might archive it, but you rarely remove it completely. ODS data is much more volatile. Duplicate data might be removed, and any stale or corrupted data could be deleted.
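The volatility of ODS data can be illustrated with a small cleanup routine. This is a sketch under assumed conventions: each staged record is a dictionary with a hypothetical `id` key and a timezone-aware `collected_at` timestamp, and "stale" simply means older than a chosen maximum age.

```python
from datetime import datetime, timedelta, timezone


def prune(records: list, now: datetime, max_age: timedelta) -> list:
    """Drop stale records and keep only the newest copy of each id."""
    newest = {}
    for rec in records:
        if now - rec["collected_at"] > max_age:
            continue  # stale: older than the retention window
        current = newest.get(rec["id"])
        if current is None or rec["collected_at"] > current["collected_at"]:
            newest[rec["id"]] = rec  # newer duplicate replaces the older one
    return list(newest.values())


utc = timezone.utc
now = datetime(2024, 1, 10, tzinfo=utc)
records = [
    {"id": 1, "collected_at": datetime(2024, 1, 9, 0, 0, tzinfo=utc)},
    {"id": 1, "collected_at": datetime(2024, 1, 9, 12, 0, tzinfo=utc)},  # duplicate
    {"id": 2, "collected_at": datetime(2024, 1, 1, 0, 0, tzinfo=utc)},   # stale
]
kept = prune(records, now, max_age=timedelta(days=7))
```

A routine like this would run against the ODS only. Warehouse tables would normally be archived rather than pruned this way.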

Conclusion

If you plan to collect data from various sources for your data warehouse, an interim ODS architecture is beneficial for data pipelines supporting multiple applications with different business rules. You can store your data in structured or unstructured formats to support machine learning, querying, reporting, analytic dashboards, and any other front-end application that uses the data warehouse.

To allow for a growing database, Pure Storage cloud solutions offer support for AWS, Azure, and any other provider to connect your ODS. Your ETL procedures have fast access to scalable database services to support real-time processing and fast queries.
