Big Data Fundamentals

Structured Data vs. Unstructured Data

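The definition and interpretation of data have changed significantly over the past decade. One reason is the emergence of new tools for reading, storing, and analyzing unstructured data.

Traditionally, unstructured data was underused because it was difficult to interpret. New technologies have made unstructured data easier to understand and have made it possible to extract valuable insights from this trove of information.

According to IDC, the total amount of data created, captured, copied, and consumed worldwide is projected to exceed 149 zettabytes per year by 2024, much of it unstructured. Any organization can benefit from building the capability to analyze unstructured data. The first step is to understand the difference between structured and unstructured data.

The table below briefly summarizes the differences; a more detailed explanation follows.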

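Characteristic	Structured data	Unstructured data
Nature of data	Usually quantitative	Usually qualitative
Data model	Predefined; once the model is defined and data has been stored, it is hard to change	No specific schema; the data model is highly flexible
Data format	Limited number of usable data formats	A huge variety of usable data formats
Database	SQL-based relational databases	NoSQL databases with no specific schema
Search	Data in a database or data set is very easy to search and find	Not structured, so searching for specific data is very difficult
Analysis	Quantitative data, so easy to analyze	Extremely difficult to analyze, even with software tools
Storage location	Data warehouse	Data lake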

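What Is Structured Data?

Structured data has a well-defined schema for the information it holds. Put very simply, any data that can be presented in a spreadsheet program such as Google Sheets or Microsoft Excel is structured data.

In this case, data is represented as rows and columns. Each column represents a different attribute, and each row holds the values of those attributes for a single instance. Together, rows and columns form a table that can be referenced easily.

Different tables can also be connected; that is, they are related through a common column present in both tables.

When multiple tables are related to one another in this way, the result is a relational database. For instance, the customer, sales, and inventory data of a department store can be considered structured data stored as a relational database.

Each customer has a customer ID, as well as fields for name, contact number, credit card information, address, and so on.
The customer table can be connected to the sales table, whose attributes include the time of purchase, the item codes purchased, the total amount spent, and the customer ID. The two tables are related through the common attribute of customer ID.
Finally, the sales table can be connected to the inventory table through the common attribute of item code, effectively interconnecting all three tables in one relational database.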

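Structured data like this is generally stored in a relational database management system (RDBMS). These databases can be written, read, and manipulated using Structured Query Language (SQL), a language developed by IBM in the 1970s to support its mainframe databases. It was originally called SEQUEL (Structured English Query Language), a name that reflects how closely its statements read like English. SQL in its current form was popularized by Relational Software, Inc. (now Oracle).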
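
To make the department store example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are illustrative assumptions, not taken from any real system; the point is only to show how customer ID and item code link the three tables and how a SQL query reads across them.

```python
import sqlite3


def build_store_db():
    """Create an in-memory database with three related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name TEXT,
            contact_number TEXT
        );
        CREATE TABLE inventory (
            item_code TEXT PRIMARY KEY,
            description TEXT,
            units_in_stock INTEGER
        );
        CREATE TABLE sales (
            sale_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            item_code TEXT REFERENCES inventory(item_code),
            purchased_at TEXT,
            amount REAL
        );
    """)
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "555-0100"), (2, "Grace", "555-0101")],
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("A100", "Kettle", 12), ("B200", "Toaster", 5)],
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "A100", "2024-01-05T10:30:00", 29.99),
            (2, 1, "B200", "2024-01-06T14:00:00", 45.00),
            (3, 2, "A100", "2024-01-07T09:15:00", 29.99),
        ],
    )
    return conn


def purchases_by_customer(conn):
    """Join sales to customers (via customer_id) and to inventory (via item_code)."""
    return conn.execute("""
        SELECT c.name, i.description, s.amount
        FROM sales AS s
        JOIN customers AS c ON c.customer_id = s.customer_id
        JOIN inventory AS i ON i.item_code = s.item_code
        ORDER BY s.sale_id
    """).fetchall()
```

Calling `purchases_by_customer(build_store_db())` returns one row per sale, combining the customer name, the item description, and the amount. The query is possible only because every sales row carries both shared attributes.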

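What Is Unstructured Data?

Any data that is not structured data can be classified as unstructured data. It is estimated that by 2025, 80% of the data we encounter will be unstructured data in the form of text, audio, images, or video.¹

In short, unstructured data is modern data. It is often:

Born digital and unpredictable
Always being created and on the move
Blended, multimodal, and interoperable
Geo-distributed for better protection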

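Unstructured data can have associated metadata that does have a structure. For example, a video can have metadata such as resolution, bit rate, frames per second (FPS), and owner. The video itself, however, is unstructured. Unstructured data that has structured metadata associated with it is sometimes referred to as semi-structured data.

Take a YouTube video as an example. Some metadata is present, such as the date and time of upload, the number of views (partial or full), and the number of likes and dislikes. But the video title, the video description, and the video itself are unstructured. They have a qualitative aspect that cannot be captured by numbers alone.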
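
The split between structured metadata and an unstructured payload can be sketched in a few lines of Python. The class names, field names, and sample values below are hypothetical, chosen only to mirror the video example above.

```python
from dataclasses import dataclass


@dataclass
class VideoMetadata:
    # Structured part: a fixed set of typed fields (names are illustrative).
    resolution: str
    bit_rate_kbps: int
    fps: float
    owner: str


@dataclass
class VideoAsset:
    metadata: VideoMetadata  # structured, easy to filter and sort
    content: bytes           # unstructured: an opaque payload with no schema


def filter_by_min_fps(assets, min_fps):
    """Select assets using only the structured metadata, never the payload."""
    return [a for a in assets if a.metadata.fps >= min_fps]


# Placeholder bytes stand in for real video data.
assets = [
    VideoAsset(VideoMetadata("1920x1080", 8000, 60.0, "alice"), b"\x00\x01\x02"),
    VideoAsset(VideoMetadata("1280x720", 2500, 30.0, "bob"), b"\x03\x04"),
]
smooth = filter_by_min_fps(assets, 50.0)
```

Filtering on `fps` works like a column lookup in a table, while answering a question about what the video actually shows would require analyzing the bytes in `content`.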

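The databases most commonly used for unstructured data are NoSQL databases. NoSQL stands for "not only SQL," indicating that these databases can handle a wider range of data than SQL databases can. NoSQL databases do not require a fixed schema or tabular structure; data is simply stored as grouped collections.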
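
As a rough illustration of what "no fixed schema" means, the sketch below uses a plain Python list of dictionaries as a stand-in for a document collection. It is a toy model, not the API of any real NoSQL product, and the field names and records are made up.

```python
# Each document carries its own set of fields; no shared schema is enforced.
collection = [
    {"_id": 1, "type": "video", "title": "Launch recap", "duration_s": 312},
    {"_id": 2, "type": "review", "text": "Arrived quickly, works well.", "rating": 5},
    {"_id": 3, "type": "image", "caption": "Storefront", "width": 1920, "height": 1080},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion.

    A document that lacks a requested field simply does not match;
    a missing field is not an error as it would be for a missing column.
    """
    return [
        d for d in docs
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]


reviews = find(collection, type="review")
wide_images = find(collection, type="image", width=1920)
```

Adding a document with entirely new fields requires no migration step, which is the flexibility the comparison table attributes to unstructured data. The trade-off is that queries must tolerate fields that may be absent.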

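Storing Unstructured Data with UFFO

Harnessing unstructured data may yield important insights with great transformative potential, but doing so comes with a range of challenges. FlashBlade®, Pure Storage's advanced unified fast file and object (UFFO) storage solution, delivers the speed of flash storage technology along with the ability to scale any architecture with agility. If you are interested, a test drive is available that lets you try FlashBlade for free.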