
Tuesday, August 20, 2019

Structured Data, Semi Structured Data, Unstructured Data

Structured data (Relational Data)

Structured data is data that adheres to a schema, so all of the data has the same fields or properties. Structured data can be stored in a database table with rows and columns. Structured data relies on keys to indicate how one row in a table relates to data in another row of another table. Structured data is also referred to as relational data, as the data's schema defines the table of data, the fields in the table, and the clear relationship between the two.
Structured data is straightforward in that it's easy to enter, query, and analyze. All of the data follows the same format. However, forcing a consistent structure also means evolution of the data is more difficult as each record has to be updated to conform to the new structure.
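
As a minimal sketch of these ideas, the following uses Python's built-in sqlite3 module rather than any Azure service. The table and column names (customers, orders, currency) are hypothetical and chosen only for illustration. It shows a key relating rows across two tables, and a schema change that applies to every existing row.

```python
import sqlite3

# In-memory database; all table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           amount REAL NOT NULL
       )"""
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, amount) VALUES (10, 1, 25.0)")

# The customer_id key relates a row in orders to a row in customers.
rows = conn.execute(
    "SELECT c.name, o.amount FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# Evolving the schema affects every record: all existing orders gain the new column.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'")
evolved = conn.execute("SELECT id, currency FROM orders").fetchall()
```

Here rows holds the joined customer name and order amount, and evolved shows that the existing order picked up the new currency column.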
Examples of structured data include:
  • Sensor data
  • Financial data
  • Business data
Example of Database:
Business data will most likely be queried by business analysts, who are more likely to know SQL than any other query language. Azure SQL Database could be used as the solution by itself, but pairing it with Azure Analysis Services enables data analysts to create a semantic model over the data in SQL Database. The data analysts can then share it with business users, so that they only need to connect to the model from any business intelligence (BI) tool to immediately explore the data and gain insights.

Semi-structured data (NoSQL/Non-relational data)

Semi-structured data is less organized than structured data, and is not stored in a relational format, as the fields do not neatly fit into tables, rows, and columns. Semi-structured data contains tags that make the organization and hierarchy of the data apparent. The schema can be easily extended.
Examples of semi-structured data include:
  • Key/value pairs (fast to query using the simple commands Get, Set, and Delete)
  • Graph data
  • JSON files (used with document databases)
  • XML files
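
A short Python sketch can make two of these forms concrete. It uses a plain dict as a stand-in for a key/value store and two made-up JSON product documents. The field names (tags, dimensions) are assumptions for illustration, not part of any real catalog.

```python
import json

# Two hypothetical catalog documents; the second adds fields the first lacks.
docs = [
    json.loads('{"id": "p1", "name": "Kettle", "price": 30}'),
    json.loads(
        '{"id": "p2", "name": "Lamp", "price": 45,'
        ' "tags": ["home"], "dimensions": {"height_cm": 40}}'
    ),
]

# A dict standing in for a key/value store: Set, Get, and Delete by key.
store = {}
for doc in docs:
    store[doc["id"]] = doc   # Set
lamp = store.get("p2")       # Get
store.pop("p1", None)        # Delete

# Fields are optional per document, so readers supply a default when one is absent.
tags_per_doc = [doc.get("tags", []) for doc in docs]
```

The second document extends the schema without any change to the first, which is the flexibility that semi-structured data trades for the uniformity of a relational table.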
Example of Database:

Azure Cosmos DB
It supports SQL for queries and every property is indexed by default. You can create queries so that your customers can filter on any property in the catalog. 
Azure Cosmos DB is also ACID-compliant, so you can be assured that your transactions are completed according to those strict requirements.
As an added plus, Azure Cosmos DB also enables you to replicate your data anywhere in the world with the click of a button. So, if your e-commerce site has users concentrated in the US, France, and England, you can replicate your data to those data centers to reduce latency, as you've physically moved the data closer to your users.
Even with data replicated around the world, you can choose from one of five consistency levels. By choosing the right consistency level, you can determine the tradeoffs to make between consistency, availability, latency, and throughput. You can scale up to handle higher customer demand during peak shopping times, or scale down during slower times to conserve cost.

Azure Table storage, Azure HBase as a part of HDInsight, and Azure Cache for Redis can also store NoSQL data. In a scenario where users want to query on multiple fields, Azure Cosmos DB is a better fit. Azure Cosmos DB indexes every field by default, whereas the other services are limited in the data they index, and querying on non-indexed fields results in reduced performance.
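
To see why an index on every field matters, here is a toy Python sketch. It is not how Azure Cosmos DB or the other services are implemented. The helpers build_index and query and the sample items are hypothetical, and they only contrast an index lookup with a scan over every record.

```python
from collections import defaultdict

# Hypothetical catalog items.
items = [
    {"id": 1, "color": "red", "brand": "Acme"},
    {"id": 2, "color": "blue", "brand": "Acme"},
    {"id": 3, "color": "red", "brand": "Zenith"},
]

def build_index(records, fields):
    """Map (field, value) to the ids of the records holding that value."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            if field in record:
                index[(field, record[field])].add(record["id"])
    return index

def query(records, index, indexed_fields, field, value):
    """Use the index when the field is covered; otherwise scan every record."""
    if field in indexed_fields:
        return sorted(index.get((field, value), set())), "index lookup"
    return sorted(r["id"] for r in records if r.get(field) == value), "full scan"

all_fields = {"color", "brand"}  # every property indexed
no_fields = set()                # no properties indexed

indexed_result = query(items, build_index(items, all_fields), all_fields, "color", "red")
scanned_result = query(items, build_index(items, no_fields), no_fields, "color", "red")
```

Both calls return the same ids, but the second has to inspect every record, which is the cost that grows as the catalog grows.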

Unstructured data

The organization of unstructured data is generally ambiguous. Unstructured data is often delivered in files, such as photos or videos. The video file itself may have an overall structure and come with semi-structured metadata, but the data that comprises the video itself is unstructured. Therefore, photos, videos, and other similar files are classified as unstructured data.
Examples of unstructured data include:
  • Media files, such as photos, videos, and audio files
  • Office files, such as Word documents
  • Text files
  • Log files
Example of Database:

Azure Blob storage 
It supports storing files such as photos and videos. It also works with Azure Content Delivery Network (CDN) by caching the most frequently used content and storing it on edge servers. Azure CDN reduces latency in serving up those images to your users.
By using Azure Blob storage, you can also move images from the hot storage tier to the cool or archive storage tier, to reduce costs and focus throughput on the most frequently viewed images and videos.


When deciding what storage solution to use, think about how your data will be used. How often will your data be accessed? Is your data read-only? Does query time matter?

What is a transaction?

A transaction is a logical group of database operations that execute together.
Here's the question to ask yourself regarding whether you need to use transactions in your application: Will a change to one piece of data in your dataset impact another? If the answer is yes, then you'll need support for transactions in your database service.
Transactions are often defined by a set of four requirements, referred to as ACID guarantees. ACID stands for Atomicity, Consistency, Isolation, and Durability:
  • Atomicity means that either all the operations in the transaction are executed, or none of them are.
  • Consistency ensures that a transaction takes the data from one valid state to another. If something happens partway through the transaction, a portion of the data isn't left out of the updates: across the board, the transaction is applied consistently or not at all.
  • Isolation ensures that one transaction is not impacted by another transaction.
  • Durability means that the changes made due to the transaction are permanently saved in the system. Committed data is saved by the system so that even in the event of a failure and system restart, the data is available in its correct state.
When a database offers ACID guarantees, these principles are applied to any transactions in a consistent manner.
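
The atomicity guarantee can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not an Azure example. The accounts table, the transfer helper, and the balances are hypothetical. The credit is applied first on purpose, so the failed transfer shows an already-applied change being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # the debit violates the CHECK; the credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, balances reflects only the first transfer: the second one changed nothing, even though its credit had already run before the debit failed.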

OLTP vs OLAP

Transactional databases are often called OLTP (Online Transaction Processing) systems. OLTP systems commonly support lots of users, have quick response times, and handle large volumes of data. They are also highly available (meaning they have very minimal downtime), and typically handle small or relatively simple transactions.
In contrast, OLAP (Online Analytical Processing) systems commonly support fewer users, have longer response times, can be less available, and typically handle large and complex queries.
The terms OLTP and OLAP aren't used as frequently as they used to be, but understanding them makes it easier to categorize the needs of your application.
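
The difference in workload shape can be sketched with Python's built-in sqlite3 module. The sales table and its values are made up for illustration, and a real OLAP system would run against far more data than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("US", 20), ("FR", 35), ("US", 15), ("UK", 40)],
)

# OLTP-style work: a small, quick write that touches one row by its key.
with conn:
    conn.execute("UPDATE sales SET amount = 25 WHERE id = 1")

# OLAP-style work: a read that scans and aggregates the whole table.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The update is the kind of operation an OLTP system runs many times per second for many users, while the grouped sum is the kind of query an analyst runs occasionally over the full dataset.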