DAY 12 - AZURE DP-900 (Microsoft Azure Data Fundamentals: Explore non-relational data in Azure)
Microsoft Azure Data Fundamentals: Explore non-relational data in Azure
Explore non-relational data offerings in Azure
Explore provisioning and deploying non-relational data services in Azure
Manage non-relational data stores in Azure
Explore non-relational data offerings in Azure----------
Introduction
Data comes in all shapes and sizes, and can be used for a large number of purposes. Many organizations use relational databases to store this data. However, the relational model might not be the most appropriate schema. The structure of the data might be too varied to easily model as a set of relational tables. For example, the data might contain items such as video, audio, images, temporal information, large volumes of free text, encrypted information, or other types of data that aren't inherently relational. Additionally, the data processing requirements might not be best served by attempting to convert this data into the relational format. In these situations, it may be better to use non-relational repositories that can store data in its original format, but that allow fast storage and retrieval of this data.
Suppose you're a data engineer working at Contoso, an organization with a large manufacturing operation. The organization has to gather and store information from a range of sources, such as real-time data monitoring the status of production line machinery, product quality control data, historical production logs, product volumes in stock, and raw materials inventory data. This information is critical to the operation of the organization. You've been asked to determine how best to store this information, so that it can be stored quickly, and queried easily.
Learning objectives
In this module, you will:
- Explore use-cases and management benefits of using Azure Table storage
- Explore use-cases and management benefits of using Azure Blob storage
- Explore use-cases and management benefits of using Azure File storage
- Explore use-cases and management benefits of using Azure Cosmos DB
Explore Azure Table storage
Azure Table Storage implements the NoSQL key-value model. In this model, the data for an item is stored as a set of fields, and the item is identified by a unique key.
What is Azure Table Storage?
Azure Table Storage is a scalable key-value store held in the cloud. You create a table using an Azure storage account.
In an Azure Table Storage table, items are referred to as rows, and fields are known as columns. However, don't let this terminology confuse you by thinking that an Azure Table Storage table is like a table in a relational database. An Azure table enables you to store semi-structured data. All rows in a table must have a key, but apart from that the columns in each row can vary. Unlike traditional relational databases, Azure Table Storage tables have no concept of relationships, stored procedures, secondary indexes, or foreign keys. Data will usually be denormalized, with each row holding the entire data for a logical entity. For example, a table holding customer information might store the first name, last name, one or more telephone numbers, and one or more addresses for each customer. The number of fields in each row can be different, depending on the number of telephone numbers and addresses for each customer, and the details recorded for each address. In a relational database, this information would be split across multiple rows in several tables. In this example, using Azure Table Storage provides much faster access to the details of a customer because the data is available in a single row, without requiring that you perform joins across relationships.
To help ensure fast access, Azure Table Storage splits a table into partitions. Partitioning is a mechanism for grouping related rows, based on a common property or partition key. Rows that share the same partition key will be stored together. Partitioning not only helps to organize data, it can also improve scalability and performance:
Partitions are independent from each other, and can grow or shrink as rows are added to, or removed from, a partition. A table can contain any number of partitions.
When you search for data, you can include the partition key in the search criteria. This helps to narrow down the volume of data to be examined, and improves performance by reducing the amount of I/O (reads and writes) needed to locate the data.
The key in an Azure Table Storage table comprises two elements: the partition key that identifies the partition containing the row (as described above), and a row key that is unique to each row in the same partition. Items in the same partition are stored in row key order. If an application adds a new row to a table, Azure ensures that the row is placed in the correct position in the table. In the example below, taken from an IoT scenario, the row key is a date and time value.
This scheme enables an application to quickly perform Point queries that identify a single row, and Range queries that fetch a contiguous block of rows in a partition.
In a point query, when an application retrieves a single row, the partition key enables Azure to quickly home in on the correct partition, and the row key lets Azure identify the row in that partition. You might have hundreds of millions of rows, but if you've defined the partition and row keys carefully when you designed your application, data retrieval can be very quick. The partition key and row key effectively define a clustered index over the data.
In a range query, the application searches for a set of rows in a partition, specifying the start and end point of the set as row keys. This type of query is also very quick, as long as you have designed your row keys according to the requirements of the queries performed by your application.
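To make the two query types concrete, here's a minimal, hedged Azure CLI sketch that inserts a row and then runs a point query and a range query. The account name, table name, and field values are hypothetical, and authentication (an account key or signed-in identity) is assumed to be configured.

## Insert a row; the partition key groups rows per device, the row key is a date/time value
az storage entity insert \
--account-name mystorageaccount \
--table-name devicetelemetry \
--entity PartitionKey=device001 RowKey=2021-03-15T10:30:00Z Temperature=21.5

## Point query: partition key plus row key identify exactly one row
az storage entity query \
--account-name mystorageaccount \
--table-name devicetelemetry \
--filter "PartitionKey eq 'device001' and RowKey eq '2021-03-15T10:30:00Z'"

## Range query: a contiguous block of row keys within a single partition
az storage entity query \
--account-name mystorageaccount \
--table-name devicetelemetry \
--filter "PartitionKey eq 'device001' and RowKey ge '2021-03-15' and RowKey lt '2021-03-16'"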
The columns in a table can hold numeric, string, or binary data up to 64 KB in size. A table can have up to 252 columns, apart from the partition and row keys. The maximum row size is 1 MB. For more information, read Understanding the Table service data model.
Use cases and management benefits of using Azure Table Storage
Azure Table Storage tables are schemaless. It's easy to adapt your data as the needs of your application evolve. You can use tables to hold flexible datasets such as user data for web applications, address books, device information, or other types of metadata your service requires. The important part is to choose the partition and row keys carefully.
The primary advantages of using Azure Table Storage tables over other ways of storing data include:
- It's simpler to scale. It takes the same time to insert data in an empty table, or a table with billions of entries. An Azure storage account can hold up to 5 PB of data.
- A table can hold semi-structured data.
- There's no need to map and maintain the complex relationships typically required by a normalized relational database.
- Row insertion is fast.
- Data retrieval is fast, if you specify the partition and row keys as query criteria.
There are disadvantages to storing data this way though, including:
- Consistency needs to be considered, as transactional updates across multiple entities aren't guaranteed.
- There's no referential integrity; any relationships between rows need to be maintained externally to the table.
- It's difficult to filter and sort on non-key data. Queries that search based on non-key fields could result in full table scans.
Azure Table Storage is an excellent mechanism for:
- Storing TBs of structured data capable of serving web scale applications. Examples include product catalogs for eCommerce applications, and customer information, where the data can be quickly identified and ordered by a composite key. In the case of a product catalog, the partition key could be the product category (such as footwear), and the row key identifies the specific product in that category (such as climbing boots).
- Storing datasets that don't require complex joins, foreign keys, or stored procedures, and that can be denormalized for fast access. In an IoT system, you might use Azure Table Storage to capture device sensor data. Each device could have its own partition, and the data could be ordered by the date and time each measurement was captured.
- Capturing event logging and performance monitoring data. Event log and performance information typically contain data that is structured according to the type of event or performance measure being recorded. The data could be partitioned by event or performance measurement type, and ordered by the date and time it was recorded. Alternatively, you could partition data by date, if you need to analyze an ordered series of events and performance measures chronologically. If you want to analyze data by type and date/time, then consider storing the data twice, partitioned by type, and again by date. Writing data is fast, and the data is static once it has been recorded.
Azure Table Storage is intended to support very large volumes of data, up to several hundred TBs in size. As you add rows to a table, Azure Table Storage automatically manages the partitions in a table and allocates storage as necessary. You don't need to take any additional steps yourself.
Azure Table Storage provides high-availability guarantees in a single region. The data for each table is replicated three times within an Azure region. For increased availability, but at additional cost, you can create tables in geo-redundant storage. In this case, the data for each table is replicated a further three times in another region several hundred miles away. If a replica in the local region becomes unavailable, Azure will transparently switch to a working replica while the failed replica is recovered. If an entire region is hit by an outage, your tables are safe in a remote region, and you can quickly switch your application to connect to that remote region.
Azure Table Storage helps to protect your data. You can configure security and role-based access control to ensure that only the people or applications that need to see your data can actually retrieve it.
Create and view a table using the Azure portal
The simplest way to create a table in Azure Table Storage is to use the Azure portal. Follow these steps:
Sign into the Azure portal using your Azure account.
On the home page of the Azure portal, select +Create a resource.
On the New page, select Storage account - blob, file, table, queue.
On the Create storage account page, enter the following details, and then select Review + create.
On the validation page, click Create, and wait while the new storage account is configured.
When the Your deployment is complete page appears, select Go to resource.
On the Overview page for the new storage account, select Tables.
On the Tables page, select + Table.
In the Add table dialog box, enter testtable for the name of the table, and then select OK.
When the new table has been created, select Storage Explorer.
On the Storage Explorer page, expand Tables, and then select testtable. Select Add to insert a new entity into the table.
Note
In Storage Explorer, rows are also called entities.
In the Add Entity dialog box, enter your own values for the PartitionKey and RowKey properties, and then select Add Property. Add a String property called Name and set the value to your name. Select Add Property again, and add a Double property (this is numeric) named Age, and set the value to your age. Select Insert to save the entity.
Verify that the new entity has been created. The entity should contain the values you specified, together with a timestamp that contains the date and time that the entity was created.
If time allows, experiment with creating additional entities. Not all entities must have the same properties. You can use the Edit function to modify the values in an entity, and add or remove properties. The Query function enables you to find entities that have properties with a specified set of values.
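If you prefer scripting, the same table and entity can be created from the command line. The sketch below is a hedged Azure CLI equivalent, assuming the storage account created above; property values default to strings unless an explicit OData type is supplied.

## Create the table in an existing storage account
az storage table create \
--account-name <storage-account-name> \
--name testtable

## Insert an entity with Name and Age properties (values are stored as strings by default)
az storage entity insert \
--account-name <storage-account-name> \
--table-name testtable \
--entity PartitionKey=partition1 RowKey=row1 Name=YourName Age=25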
Explore Azure Blob storage
Many applications need to store large, binary data objects, such as images and video streams. Microsoft Azure virtual machines use blob storage for holding virtual machine disk images. These objects can be several hundred GB in size.
Note
The term blob is an acronym for Binary Large OBject.
What is Azure Blob storage?
Azure Blob storage is a service that enables you to store massive amounts of unstructured data, or blobs, in the cloud. Like Azure Table storage, you create blobs using an Azure storage account.
Azure currently supports three different types of blob:
Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.
Page blobs. A page blob is organized as a collection of fixed size 512-byte pages. A page blob is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines.
Append blobs. An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.
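As a hedged illustration of the three blob types, the Azure CLI lets you choose the type at upload time with the --type parameter. The container and file names below are hypothetical, and authentication is assumed to be configured.

## Block blob (the default): discrete binary objects that change infrequently
az storage blob upload \
--account-name <storage-account-name> \
--container-name media \
--name video1.mp4 \
--file ./video1.mp4 \
--type block

## Page blob: optimized for random read/write; the file size must be a multiple of 512 bytes
az storage blob upload \
--account-name <storage-account-name> \
--container-name disks \
--name disk1.vhd \
--file ./disk1.vhd \
--type page

## Append blob: blocks can only be added at the end (for example, log files)
az storage blob upload \
--account-name <storage-account-name> \
--container-name logs \
--name app.log \
--file ./app.log \
--type append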
Inside an Azure storage account, you create blobs inside containers. A container provides a convenient way of grouping related blobs together, and you can organize blobs in a hierarchy of folders, similar to files in a file system on disk. You control who can read and write blobs inside a container at the container level.
Blob storage provides three access tiers, which help to balance access latency and storage cost:
The Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data is stored on high-performance media.
The Cool tier. This tier has lower performance and incurs reduced storage charges compared to the Hot tier. Use the Cool tier for data that is accessed infrequently. It's common for newly created blobs to be accessed frequently initially, but less so as time passes. In these situations, you can create the blob in the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the Cool tier back to the Hot tier.
The Archive tier. This tier provides the lowest storage cost, but with increased latency. The Archive tier is intended for historical data that mustn't be lost, but is required only rarely. Blobs in the Archive tier are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is a few milliseconds, but for the Archive tier, it can take hours for the data to become available. To retrieve a blob from the Archive tier, you must change the access tier to Hot or Cool. The blob will then be rehydrated. You can read the blob only when the rehydration process is complete.
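For example, moving a blob between tiers, including triggering rehydration from the Archive tier, can be done by setting its tier with the Azure CLI. This is a sketch with placeholder names; remember that rehydration itself can take hours to complete.

## Move a blob to the Archive tier
az storage blob set-tier \
--account-name <storage-account-name> \
--container-name <container-name> \
--name <blob-name> \
--tier Archive

## Rehydrate it later by setting the tier back to Hot (or Cool)
az storage blob set-tier \
--account-name <storage-account-name> \
--container-name <container-name> \
--name <blob-name> \
--tier Hot \
--rehydrate-priority Standard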
You can create lifecycle management policies for blobs in a storage account. A lifecycle management policy can automatically move a blob from Hot to Cool, and then to the Archive tier, as it ages and is used less frequently (policy is based on the number of days since modification). A lifecycle management policy can also arrange to delete outdated blobs.
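A lifecycle management policy is defined as a JSON document and applied to the storage account. The sketch below is illustrative, with assumed day thresholds: it moves block blobs to Cool after 30 days, to Archive after 90, and deletes them after a year.

## policy.json
{
  "rules": [
    {
      "enabled": true,
      "name": "age-out-blobs",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "tierToCool": { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 90 },
            "delete": { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}

## Apply the policy to the storage account
az storage account management-policy create \
--account-name <storage-account-name> \
--resource-group <resource-group-name> \
--policy @policy.json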
Use cases and management benefits of using Azure Blob Storage
Common uses of Azure Blob Storage include:
- Serving images or documents directly to a browser, in the form of a static website. Visit Static website hosting in Azure storage for detailed information.
- Storing files for distributed access
- Streaming video and audio
- Storing data for backup and restore, disaster recovery, and archiving
- Storing data for analysis by an on-premises or Azure-hosted service
Note
Azure Blob storage is also used as the basis for Azure Data Lake storage. You can use Azure Data Lake storage for performing big data analytics. For more information, visit Introduction to Azure Data Lake Storage Gen2.
To ensure availability, Azure Blob storage provides redundancy. Blobs are always replicated three times in the region in which you created your account, but you can also select geo-redundancy, which replicates your data in a second region (at additional cost).
Other features available with Azure Blob storage include:
Versioning. You can maintain and restore earlier versions of a blob.
Soft delete. This feature enables you to recover a blob that has been removed or overwritten, by accident or otherwise.
Snapshots. A snapshot is a read-only version of a blob at a particular point in time.
Change Feed. The change feed for a blob provides an ordered, read-only record of the updates made to a blob. You can use the change feed to monitor these changes, and perform operations such as:
- Update a secondary index, synchronize with a cache, search engine, or support other content-management scenarios.
- Extract business analytics insights and metrics, based on changes that occur to your objects, either in a streaming manner or batched mode.
- Store, audit, and analyze changes to your objects, over any period of time, for security, compliance or intelligence for enterprise data management.
- Build solutions to back up, mirror, or replicate object state in your account for disaster management or compliance.
- Build connected application pipelines that react to change events or schedule executions based on created or changed objects.
Create and view a block blob using the Azure portal
You can create block blobs using the Azure portal. Remember that blobs are stored in containers, and you create a container using a storage account. The following steps assume you've created the storage account described in the previous unit.
In the Azure portal, on the left-hand navigation menu, select Home.
On the home page, select Storage accounts.
On the Storage accounts page, select the storage account you created in the previous unit.
On the Overview page for your storage account, select Storage Explorer.
On the Storage Explorer page, right-click BLOB CONTAINERS, and then select Create blob container.
In the New Container dialog box, give your container a name, accept the default public access level, and then select Create.
In the Storage Explorer window, expand BLOB CONTAINERS, and then select your new blob container.
In the blobs window, select Upload.
In the Upload blob dialog box, use the files button to pick a file of your choice on your computer, and then select Upload.
When the upload has completed, close the Upload blob dialog box. Verify that the block blob appears in your container.
If you have time, you can experiment uploading other files as block blobs. You can also download blobs back to your computer using the Download button.
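The same workflow can be scripted. Here's a minimal, hedged Azure CLI equivalent; all names are placeholders, and authentication is assumed to be configured.

## Create a container and upload a file as a block blob
az storage container create \
--account-name <storage-account-name> \
--name <container-name>

az storage blob upload \
--account-name <storage-account-name> \
--container-name <container-name> \
--name <blob-name> \
--file <local-file-path>

## Download the blob back to your computer
az storage blob download \
--account-name <storage-account-name> \
--container-name <container-name> \
--name <blob-name> \
--file <local-destination-path>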
Explore Azure File storage
Many on-premises systems comprising a network of in-house computers make use of file shares. A file share enables you to store a file on one computer, and grant access to that file to users and applications running on other computers. This strategy can work well for computers in the same local area network, but doesn't scale well as the number of users increases, or if users are located at different sites.
What is Azure File Storage?
Azure File Storage enables you to create file shares in the cloud, and access these file shares from anywhere with an internet connection. Azure File Storage exposes file shares using the Server Message Block 3.0 (SMB) protocol. This is the same file sharing protocol used by many existing on-premises applications. These applications should continue to work unchanged if you migrate your file shares to the cloud. The applications can be running on-premises, or in the cloud. You can control access to shares in Azure File Storage using authentication and authorization services available through Azure Active Directory Domain Services.
You create Azure File storage in a storage account. Azure File Storage enables you to share up to 100 TB of data in a single storage account. This data can be distributed across any number of file shares in the account. The maximum size of a single file is 1 TB, but you can set quotas to limit the size of each share below this figure. Currently, Azure File Storage supports up to 2000 concurrent connections per shared file.
Once you've created a storage account, you can upload files to Azure File Storage using the Azure portal, or tools such as the AzCopy utility. You can also use the Azure File Sync service to synchronize locally cached copies of shared files with the data in Azure File Storage.
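For example, a single file can be uploaded to an existing file share with the Azure CLI. This is a hedged sketch: the share is assumed to exist already, the names are placeholders, and authentication is assumed to be configured.

## Upload a local file to a file share
az storage file upload \
--account-name <storage-account-name> \
--share-name <share-name> \
--source <local-file-path>

## List the files in the share to confirm the upload
az storage file list \
--account-name <storage-account-name> \
--share-name <share-name> \
--output table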
Azure File Storage offers two performance tiers. The Standard tier uses hard disk-based hardware in a datacenter, and the Premium tier uses solid-state disks. The Premium tier offers greater throughput, but is charged at a higher rate.
Use cases and management benefits of using Azure File Storage
Azure File Storage is designed to support many scenarios, including the following:
Migrate existing applications to the cloud.
Many existing applications access data using file-based APIs, and are designed to share data using SMB file shares. Azure File Storage enables you to migrate your on-premises file or file share-based applications to Azure without having to provision or manage highly available file server virtual machines.
Share server data across on-premises and cloud.
Customers can now store server data such as log files, event data, and backups in the cloud to leverage the availability, durability, scalability, and geo redundancy built into the Azure storage platform. With encryption in SMB 3.0, you can securely mount Azure File Storage shares from anywhere. Applications running in the cloud can share data with on-premises applications using the same consistency guarantees implemented by on-premises SMB servers.
Integrate modern applications with Azure File Storage.
By leveraging the modern REST API that Azure File Storage implements in addition to SMB 3.0, you can integrate legacy applications with modern cloud applications, or develop new file or file share-based applications.
Simplify hosting High Availability (HA) workload data.
Azure File Storage delivers continuous availability so it simplifies the effort to host HA workload data in the cloud. The persistent handles enabled in SMB 3.0 increase availability of the file share, which makes it possible to host applications such as SQL Server and IIS in Azure with data stored in shared file storage.
Note
Don't use Azure File Storage for files that can be written by multiple concurrent processes simultaneously. Multiple writers require careful synchronization, otherwise the changes made by one process can be overwritten by another. The alternative solution is to lock the file as it is written, and then release the lock when the write operation is complete. However, this approach can severely impact concurrency and limit performance.
Azure File Storage is a fully managed service. Your shared data is replicated locally within a region, but can also be geo-replicated to a second region.
Azure aims to provide up to 300 MB/second of throughput for a single Standard file share, but you can increase throughput capacity by creating a Premium file share, for additional cost.
All data is encrypted at rest, and you can enable encryption for data in-transit between Azure File Storage and your applications.
For additional information on managing and planning to use Azure File Storage, read Planning for an Azure Files deployment.
Create an Azure storage file share using the Azure portal
You can create Azure storage file shares using the Azure portal. The following steps assume you've created the storage account described in unit 2.
In the Azure portal, on the hamburger menu, select Home.
On the home page, select Storage accounts.
On the Storage accounts page, select the storage account you created in unit 2.
On the Overview page for your storage account, select Storage Explorer.
On the Storage Explorer page, right-click FILE SHARES, and then select Create file share.
In the New file share dialog box, enter a name for your file share, leave Quota empty, and then select Create.
In the Storage Explorer window, expand FILE SHARES, select your new file share, and then select Upload.
Tip
If your new file share doesn't appear, right-click FILE SHARES, and then select Refresh.
In the Upload files dialog box, use the files button to pick a file of your choice on your computer, and then select Upload.
When the upload has completed, close the Upload files dialog box. Verify that the file appears in the file share.
Tip
If the file doesn't appear, right-click FILE SHARES, and then select Refresh.
Explore Azure Cosmos DB
Tables, blobs, and files are all specialized types of storage, aimed at helping to solve specific problems. Reading and writing a table is a significantly different task from storing data in a blob, or processing a file. Sometimes you require a more generalized solution that enables you to store and query data more easily, without having to worry about the exact mechanism for performing these operations. This is where a database management system proves useful.
Relational databases store data in relational tables, but sometimes the structure imposed by this model can be too rigid, and often leads to poor performance unless you spend time implementing detailed tuning. Other models, collectively known as NoSQL databases, exist. These models store data in other structures, such as documents, graphs, key-value stores, and column family stores.
What is Azure Cosmos DB?
Azure Cosmos DB is a multi-model NoSQL database management system. Cosmos DB manages data as a partitioned set of documents. A document is a collection of fields, identified by a key. The fields in each document can vary, and a field can contain child documents. Many document databases use JSON (JavaScript Object Notation) to represent the document structure. In this format, the fields in a document are enclosed between braces, { and }, and each field is prefixed with its name. The example below shows a pair of documents representing customer information. In both cases, each customer document includes child documents containing the name and address, but the fields in these child documents vary between customers.
## Document 1 ##
{
"customerID": "103248",
"name":
{
"first": "AAA",
"last": "BBB"
},
"address":
{
"street": "Main Street",
"number": "101",
"city": "Acity",
"state": "NY"
},
"ccOnFile": "yes",
"firstOrder": "02/28/2003"
}
## Document 2 ##
{
"customerID": "103249",
"name":
{
"title": "Mr",
"forename": "AAA",
"lastname": "BBB"
},
"address":
{
"street": "Another Street",
"number": "202",
"city": "Bcity",
"county": "Gloucestershire",
"country-region": "UK"
},
"ccOnFile": "yes"
}
A document can hold up to 2 MB of data, including small binary objects. If you need to store larger blobs as part of a document, use Azure Blob storage, and add a reference to the blob in the document.
Cosmos DB provides APIs that enable you to access these documents using a set of well-known interfaces.
Note
An API is an Application Programming Interface. Database management systems (and other software frameworks) provide a set of APIs that developers can use to write programs that need to access data. The APIs will often be different for different database management systems.
The APIs that Cosmos DB currently supports include:
SQL API. This interface provides a SQL-like query language over documents, enabling you to identify and retrieve documents using SELECT statements. The example below finds the address for customer 103248 in the documents shown above:
SELECT a.address FROM customers a WHERE a.customerID = "103248"
Table API. This interface enables you to use the Azure Table Storage API to store and retrieve documents. The purpose of this interface is to enable you to switch from Table Storage to Cosmos DB without requiring that you modify your existing applications.
MongoDB API. MongoDB is another well-known document database, with its own programmatic interface. Many organizations run MongoDB on-premises. You can use the MongoDB API for Cosmos DB to enable a MongoDB application to run unchanged against a Cosmos DB database. You can migrate the data in the MongoDB database to Cosmos DB running in the cloud, but continue to run your existing applications to access this data.
Cassandra API. Cassandra is a column family database management system. This is another database management system that many organizations run on-premises. The Cassandra API for Cosmos DB provides a Cassandra-like programmatic interface for Cosmos DB. Cassandra API requests are mapped to Cosmos DB document requests. As with the MongoDB API, the primary purpose of the Cassandra API is to enable you to quickly migrate Cassandra databases and applications to Cosmos DB.
Gremlin API. The Gremlin API implements a graph database interface to Cosmos DB. A graph is a collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos DB, but the Gremlin API enables you to perform graph queries over data. Using the Gremlin API you can walk through the objects and relationships in the graph to discover all manner of complex relationships, such as "What is the name of the pet of Sam's landlord?" in the graph shown below.
Note
The primary purpose of the Table, MongoDB, Cassandra, and Gremlin APIs is to support existing applications. If you are building a new application and database, you should use the SQL API.
Documents in a Cosmos DB database are organized into containers. The documents in a container are grouped together into partitions. A partition holds a set of documents that share a common partition key. You designate one of the fields in your documents as the partition key. You should select a partition key that collects all related documents together. This approach helps to reduce the amount of I/O (disk reads) that queries might need to perform when retrieving a set of documents for a given entity. For example, in a document database for an ecommerce system recording the details of customers and the orders they've placed, you could partition the data by customer ID, and store the customer and order details for each customer in the same partition. To find all the information and orders for a customer, you simply need to query that single partition:
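For instance, if the container is partitioned by customerID, a query such as the hedged example below only needs to read a single partition. The container name orders is an assumption for illustration:

SELECT * FROM orders o WHERE o.customerID = "103248"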
There's a superficial similarity between a Cosmos DB container and a table in Azure Table storage: in both cases, data is partitioned and documents (rows in a table) are identified by a unique ID within a partition. However, the similarity ends there. Unlike Azure Table storage, documents in a Cosmos DB partition aren't sorted by ID. Instead, Cosmos DB maintains a separate index. This index contains not only the document IDs, but also tracks the value of every other field in each document. This index is created and maintained automatically. This index enables you to perform queries that specify criteria referencing any fields in a container, without incurring the need to scan the entire partition to find that data. For a detailed description of how Cosmos DB indexing works, read Indexing in Azure Cosmos DB - Overview.
Use cases and management benefits of using Azure Cosmos DB
Cosmos DB is a highly scalable database management system. Cosmos DB automatically allocates space in a container for your partitions, and each partition can grow up to 10 GB in size. Indexes are created and maintained automatically. There's virtually no administrative overhead.
To ensure availability, all databases are replicated within a single region. This replication is transparent, and failover from a failed replica is automatic. Cosmos DB guarantees 99.99% high availability.
Additionally, you can choose to replicate data across regions, at additional cost. This feature enables you to place copies of data anywhere in the world, and enable applications to connect to the copy of the data that happens to be the closest, reducing query latency. All replicas are synchronized, although there may be a small window while updates are transmitted and applied. The multi-master replication protocol supports five well-defined consistency choices - strong, bounded staleness, session, consistent prefix, and eventual. For more information, see Consistency levels in Azure Cosmos DB.
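You choose a default consistency level when you provision the account, and you can change it later. As a hedged Azure CLI sketch, with placeholder names:

## Set the default consistency level for an existing Cosmos DB account
az cosmosdb update \
--name <cosmosdb-account-name> \
--resource-group <resource-group-name> \
--default-consistency-level Session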
Cosmos DB guarantees less than 10-ms latencies for both reads (indexed) and writes at the 99th percentile, all around the world. This capability enables sustained ingestion of data and fast queries for highly responsive apps.
Cosmos DB is certified for a wide array of compliance standards. Additionally, all data in Cosmos DB is encrypted at rest and in motion. Cosmos DB provides row level authorization and adheres to strict security standards.
Cosmos DB is a foundational service in Azure. Cosmos DB has been used by many of Microsoft's products for mission critical applications at global scale, including Skype, Xbox, Microsoft 365, Azure, and many others. Cosmos DB is highly suitable for the following scenarios:
IoT and telematics. These systems typically ingest large amounts of data in frequent bursts of activity. Cosmos DB can accept and store this information very quickly. The data can then be used by analytics services, such as Azure Machine Learning, Azure HDInsight, and Power BI. Additionally, you can process the data in real-time using Azure Functions that are triggered as data arrives in the database.
Retail and marketing. Microsoft uses Cosmos DB for its own e-commerce platforms that run as part of Windows Store and Xbox Live. It's also used in the retail industry for storing catalog data and for event sourcing in order processing pipelines.
Gaming. The database tier is a crucial component of gaming applications. Modern games perform graphical processing on mobile/console clients, but rely on the cloud to deliver customized and personalized content like in-game stats, social media integration, and high-score leaderboards. Games often require single-millisecond latencies for reads and writes to provide an engaging in-game experience. A game database needs to be fast and be able to handle massive spikes in request rates during new game launches and feature updates.
Web and mobile applications. Azure Cosmos DB is commonly used within web and mobile applications, and is well suited for modeling social interactions, integrating with third-party services, and for building rich personalized experiences. The Cosmos DB SDKs can be used to build rich iOS and Android applications using the popular Xamarin framework.
For additional information about uses for Cosmos DB, read Common Azure Cosmos DB use cases.
Explore provisioning and deploying non-relational data services in Azure----------
Introduction
Microsoft Azure supports a number of non-relational data services, including Azure File storage, Azure Blob storage, Azure Data Lake Store, and Azure Cosmos DB. These services support different types of non-relational data. For example, you can use Cosmos DB to store documents, and Blob storage as a repository for large binary objects such as video and audio data.
Before you can use a service, you must provision an instance of that service. You can then configure the service to enable you to store and retrieve data, and to make it accessible to the users and applications that require it.
Suppose you're a data engineer working at Contoso, an organization with a large manufacturing operation. The organization has to gather and store information from a range of sources, such as real-time data monitoring the status of production line machinery, product quality control data, historical production logs, product volumes in stock, and raw materials inventory data. This information is critical to the operation of the organization. Contoso has decided to store this information in various non-relational databases, according to the different data processing requirements for each dataset. You've been asked to provision a range of Azure data services to enable applications to store and process the information.
Learning objectives
In this module, you will:
- Provision non-relational data services
- Configure non-relational data services
- Explore basic connectivity issues
- Explore data security components
Describe provisioning non-relational data services
In the sample scenario, Contoso has decided that the organization will require a number of different non-relational stores. As the data engineer, you're asked to set up data stores using Azure Cosmos DB, Azure Blob storage, Azure Data Lake store, and Azure File storage.
In this unit, you'll learn more about what the provisioning process entails, and what actually happens when you provision a service.
What is provisioning?
Provisioning is the act of running a series of tasks that a service provider, such as Azure Cosmos DB, performs to create and configure a service. Behind the scenes, the service provider will set up the various resources (disks, memory, CPUs, networks, and so on) required to run the service. You'll be assigned these resources, and they remain allocated to you (and charged to you), until you delete the service.
How the service provider provisions resources is opaque, and you don't need to be concerned with how this process works. All you do is specify parameters that determine the size of the resources required (how much disk space, memory, computing power, and network bandwidth). These parameters are determined by estimating the size of the workload that you intend to run using the service. In many cases, you can modify these parameters after the service has been created, perhaps increasing the amount of storage space or memory if the workload is greater than you initially anticipated. The act of increasing (or decreasing) the resources used by a service is called scaling.
This video summarizes the process that Azure performs when you provision a service:
Azure provides several tools you can use to provision services:
The Azure portal. This is the most convenient way to provision a service for most users. The Azure portal displays a series of service-specific pages that prompt you for the settings required, and validates these settings, before actually provisioning the service.
The Azure command-line interface (CLI). The CLI provides a set of commands that you can run from the operating system command prompt or the Cloud Shell in the Azure portal. You can use these commands to create and manage Azure resources. The CLI is suitable if you need to automate service creation; you can store CLI commands in scripts, and you can run these scripts programmatically. The CLI can run on Windows, macOS, and Linux computers. For detailed information about the Azure CLI, read What is Azure CLI.
Azure PowerShell. Many administrators are familiar with using PowerShell commands to script and automate administrative tasks. Azure provides a series of cmdlets (Azure-specific commands) that you can use in PowerShell to create and manage Azure resources. You can find further information about Azure PowerShell online, at Azure PowerShell documentation. Like the CLI, PowerShell is available for Windows, macOS, and Linux.
Azure Resource Manager templates. An Azure Resource Manager template describes the service (or services) that you want to deploy in a text file, in a format known as JSON (JavaScript Object Notation). The example below shows a template that you can use to provision an Azure Storage account.
JSON"resources": [ { "type": "Microsoft.Storage/storageAccounts", "apiVersion": "2016-01-01", "name": "mystorageaccount", "location": "westus", "sku": { "name": "Standard_LRS" }, "kind": "Storage", "properties": {} } ]
You send the template to Azure using the az deployment group create command in the Azure CLI, or the New-AzResourceGroupDeployment command in Azure PowerShell. For more information about creating and using Azure Resource Manager templates to provision Azure resources, see What are Azure Resource Manager templates?
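If the template above were saved locally as template.json, a minimal, hedged deployment would look like this (the resource group name is a placeholder):

## Azure CLI - deploy an Azure Resource Manager template
az deployment group create \
--resource-group <resource-group-name> \
--template-file template.json

## Azure PowerShell equivalent
New-AzResourceGroupDeployment `
-ResourceGroupName "<resource-group-name>" `
-TemplateFile "template.json"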
Provision Azure Cosmos DB
Azure Cosmos DB is a document database, suitable for a range of applications. In the sample scenario, Contoso decided to use Cosmos DB for at least part of their data storage and processing.
In Cosmos DB, you organize your data as a collection of documents stored in containers. Containers are held in a database. A database runs in the context of a Cosmos DB account. You must create the account before you can set up any databases.
This unit describes how to provision a Cosmos DB account, and then create a database and a container in this account.
How to provision a Cosmos DB account
You can provision a Cosmos DB account interactively using the Azure portal, or you can perform this task programmatically through the Azure CLI, Azure PowerShell, or an Azure Resource Manager template. The following video describes how to use the Azure portal.
If you prefer to use the Azure CLI or Azure PowerShell, you can run the following commands to create a Cosmos DB account. The parameters to these commands correspond to many of the options you can select using the Azure portal. The examples shown below create an account for the Core (SQL) API, with geo-redundancy between the EastUS and WestUS regions, and support for multi-region writes. For more information about these commands, see the az cosmosdb create page for the Azure CLI, or the New-AzCosmosDBAccount page for PowerShell.
## Azure CLI
az cosmosdb create \
--subscription <your-subscription> \
--resource-group <resource-group-name> \
--name <cosmosdb-account-name> \
--locations regionName=eastus failoverPriority=0 \
--locations regionName=westus failoverPriority=1 \
--enable-multiple-write-locations
## Azure PowerShell
New-AzCosmosDBAccount `
-ResourceGroupName "<resource-group-name>" `
-Name "<cosmosbd-account-name>" `
-Location @("West US", "East US") `
-EnableMultipleWriteLocations
Note
To use Azure PowerShell to provision a Cosmos DB account, you must first install the Az.CosmosDB PowerShell module:
Install-Module -Name Az.CosmosDB
The other deployment option is to use an Azure Resource Manager template. The template for Cosmos DB can be rather lengthy, because of the number of parameters. To make life easier, Microsoft has published a number of example templates for handling different configurations. You can download these templates from the Microsoft web site, at Manage Azure Cosmos DB Core (SQL) API resources with Azure Resource Manager templates.
How to create a database and a container
An Azure Cosmos DB account by itself doesn't really provide any resources other than a few pieces of static infrastructure. Databases and containers are the primary resource consumers. Resources are allocated in terms of the storage space required to hold your databases and containers, and the processing power required to store and retrieve data. Azure Cosmos DB uses the concept of Request Units per second (RU/s) to manage the performance and cost of databases. This measure abstracts the underlying physical resources that need to be provisioned to support the required performance.
You can think of a request unit as the amount of computation and I/O resources required to satisfy a simple read request made to the database. Microsoft gives a measure of approximately one RU as the resources required to read a 1-KB document with 10 fields. So a throughput of one RU per second (RU/s) will support an application that reads a single 1-KB document each second. You can specify how many RU/s of throughput you require when you create a database or when you create individual containers in a database. If you specify throughput for a database, all the containers in that database share that throughput. If you specify throughput for a container, the container gets that throughput all to itself.
If you underprovision (by specifying too few RU/s), Cosmos DB will start throttling performance. Once throttling begins, applications are asked to retry their requests later, when resources might be available to satisfy them. If an application makes too many attempts to retry a throttled request, the request could be aborted. The minimum throughput you can allocate to a database or container is 400 RU/s. You can increase and decrease the RU/s for a container at any time. Allocating more RU/s increases the cost. However, once you allocate throughput to a database or container, you'll be charged for the resources provisioned, whether you use them or not.
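As an illustration, the hedged command below changes the provisioned throughput of an existing container; all names are placeholders.

## Azure CLI - scale an existing container to 1000 RU/s
az cosmosdb sql container throughput update \
--account-name <cosmosdb-account-name> \
--database-name <database-name> \
--name <container-name> \
--resource-group <resource-group-name> \
--throughput 1000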
Note
If you applied the Free Tier Discount to your Cosmos DB account, you get the first 400 RU/s for a single database or container for free. 400 RU/s is enough capacity for most small to moderate databases.
The next video shows how to use the Azure portal to create a database and container:
If you prefer to use the Azure CLI or Azure PowerShell, you can run the following commands to create databases and containers. The code below shows some examples:
## Azure CLI - create a database
az cosmosdb sql database create \
--account-name <cosmos-db-account-name> \
--name <database-name> \
--resource-group <resource-group-name> \
--subscription <your-subscription> \
--throughput <number-of-RU/s>
## Azure CLI - create a container
az cosmosdb sql container create \
--account-name <cosmos-db-account-name> \
--database-name <database-name> \
--name <container-name> \
--resource-group <resource-group-name> \
--partition-key-path <key-field-in-documents>
## Azure PowerShell - create a database
New-AzCosmosDBSqlDatabase `
-ResourceGroupName "<resource-group-name>" `
-AccountName "<cosmos-db-account-name>" `
-Name "<database-name>" `
-Throughput <number-of-RU/s>
## Azure PowerShell - create a container
New-AzCosmosDBSqlContainer `
-ResourceGroupName "<resource-group-name>" `
-AccountName "<cosmos-db-account-name>" `
-DatabaseName "<database-name>" `
-Name "<container-name>" `
-PartitionKeyKind Hash `
-PartitionKeyPath "<key-field-in-documents>"
Provision other non-relational data services
Besides Cosmos DB, Azure supports other non-relational data services. These services are optimized for more specific cases than a generalized document database store.
In the sample scenario, Contoso wants to use Azure Blob storage to store video and audio files, Azure Data Lake storage to support large volumes of data, and Azure File storage to create file shares.
This unit describes how to provision Data Lake storage, Blob storage, and File Storage. As with Cosmos DB, you can provision these services using the Azure portal, the Azure CLI, Azure PowerShell, and Azure Resource Manager templates. Data Lake storage, Blob storage, and File Storage, all require that you first create an Azure storage account.
How to create a storage account
Use the Azure portal
Use the Create storage account page to set up a new storage account using the Azure portal.
On the Basics tab, provide the following details:
Subscription. Select your Azure subscription.
Resource Group. Either select an existing resource group, or create a new one, as appropriate.
Storage account name. As with a Cosmos DB account, each storage account must have a unique name that hasn't already been used by someone else.
Location. Select the region that is nearest to you if you're in the process of developing a new application, or the region nearest to your users if you're deploying an existing application.
Performance. This setting has two options:
Standard storage accounts are based on hard disks. They're the lowest cost of the two storage options, but have higher latency. This type of storage account is suitable for applications that require bulk storage that is accessed infrequently, such as archives.
Premium storage uses solid-state drives, and has much lower latency and better read/write performance than standard storage. Solid-state drives are best used for I/O intensive applications, such as databases. You can also use premium storage to hold Azure virtual machine disks. A premium storage account is more expensive than a standard account.
Note
Data Lake storage is only available with a standard storage account, not premium.
Account kind. Azure storage supports several different types of account:
General-purpose v2. You can use this type of storage account for blobs, files, queues, and tables, and it's recommended for most scenarios that require Azure Storage. If you want to provision Azure Data Lake Storage, you should specify this account type.
General-purpose v1. This is a legacy account type for blobs, files, queues, and tables. Use general-purpose v2 accounts when possible.
BlockBlobStorage. This type of storage account is only available for premium accounts. You use this account type for block blobs and append blobs. It's recommended for scenarios with high transaction rates, that use smaller objects, or that require consistently low storage latency.
FileStorage. This type is also only available for premium accounts. You use it to create files-only storage accounts with premium performance characteristics. It's recommended for enterprise or high-performance scale applications. Use this type if you're creating an account to support File Storage.
BlobStorage. This is another legacy account type that can only hold blobs. Use general-purpose v2 accounts instead, when possible. You can use this account type for Azure Data Lake storage, but the General-purpose v2 account type is preferable.
Replication. Data in an Azure Storage account is always replicated three times in the region you specify as the primary location for the account. Azure Storage offers two options for how your data is replicated in the primary region:
Locally redundant storage (LRS) copies your data synchronously three times within a single physical location in the region. LRS is the least expensive replication option, but isn't recommended for applications requiring high availability.
Geo-redundant storage (GRS) copies your data synchronously three times within a single physical location in the primary region using LRS. It then copies your data asynchronously to a single physical location in the secondary region. This form of replication protects you against regional outages.
Read-access geo-redundant storage (RA-GRS) replication is an extension of GRS that provides direct read-only access to the data in the secondary location. In contrast, the GRS option doesn't expose the data in the secondary location, and it's only used to recover from a failure in the primary location. RA-GRS replication enables you to store a read-only copy of the data close to users that are located in a geographically distant location, helping to reduce read latency times.
Zone-redundant storage (ZRS) replicates your Azure Storage data synchronously across three Azure availability zones in the primary region. Each availability zone is a separate physical location with independent power, cooling, and networking. This is useful for applications requiring high availability.
Note
To maintain performance, premium storage accounts only support LRS replication. This is because replication is performed synchronously to maintain data integrity. Replicating data to a distant region can increase latency to the point at which any advantages of using premium storage are lost.
Access tier. This option is only available for standard storage accounts. You can select between Hot and Cool.
The hot access tier has higher storage costs than cool and archive tiers, but the lowest access costs. Example usage scenarios for the hot access tier include:
- Data that's in active use or expected to be accessed (read from and written to) frequently.
- Data that's staged for processing and eventual migration to the cool access tier.
The cool access tier has lower storage costs and higher access costs compared to hot storage. This tier is intended for data that will remain in the cool tier for at least 30 days. Example usage scenarios for the cool access tier include:
- Short-term backup and disaster recovery datasets.
- Older media content not viewed frequently anymore but is expected to be available immediately when accessed.
- Large data sets that need to be stored cost effectively while more data is being gathered for future processing. For example, long-term storage of scientific data, or raw telemetry data from a manufacturing facility.
Use the Azure CLI
If you're using the Azure CLI, run the az storage account create command to create a new storage account. The example below summarizes the options available:
az storage account create \
--name <storage-account-name> \
--resource-group <resource-group> \
--location <your-location> \
--sku <sku> \
--kind <kind> \
--access-tier <tier>
The sku is a combination of the performance tier and replication options. It can be one of Premium_LRS, Premium_ZRS, Standard_GRS, Standard_GZRS, Standard_LRS, Standard_RAGRS, Standard_RAGZRS, or Standard_ZRS.
The kind parameter should be one of BlobStorage, BlockBlobStorage, FileStorage, Storage, or StorageV2.
The access-tier parameter can either be Cool or Hot.
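Putting those options together, a concrete, hedged example might look like the following; the account and resource group names are illustrative, and the account name must be globally unique.

## A general-purpose v2 account with locally redundant storage and the Hot tier
az storage account create \
--name contosodata001 \
--resource-group contoso-rg \
--location eastus \
--sku Standard_LRS \
--kind StorageV2 \
--access-tier Hot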
Use Azure PowerShell
You use the New-AzStorageAccount PowerShell cmdlet to create a new storage account, as follows:
New-AzStorageAccount `
-Name "<storage-account-name>" `
-ResourceGroupName "<resource-group-name>" `
-Location "<your-location>" `
-SkuName "<sku>" `
-Kind "<kind>" `
-AccessTier "<tier>"
The values for SkuName, Kind, and AccessTier are the same as those in the Azure CLI command.
How to provision Data Lake storage in a storage account
Use the Azure portal
Important
If you're provisioning Data Lake storage, you must specify the appropriate configuration settings when you create the storage account. You can't configure Data Lake storage after the storage account has been set up.
In the Azure portal, on the Advanced tab of the Create storage account page, in the Data Lake Storage Gen2 section, select Enabled for the Hierarchical namespace option.
After the storage account has been created, you can add one or more Data Lake Storage containers to the account. Each container supports a directory structure for storing Data Lake files.
Use the Azure CLI
Run the az storage account create command with the --enable-hierarchical-namespace parameter to create a new storage account that supports Data Lake Storage:
az storage account create \
--name <storage-account-name> \
--resource-group <resource-group> \
--location <your-location> \
--sku <sku> \
--kind <kind> \
--access-tier <tier> \
--enable-hierarchical-namespace true
Use Azure PowerShell
Use the New-AzStorageAccount PowerShell cmdlet with the EnableHierarchicalNamespace parameter, as follows:
New-AzStorageAccount `
-Name "<storage-account-name>" `
-ResourceGroupName "<resource-group-name>" `
-Location "<your-location>" `
-SkuName "<sku>" `
-Kind "<kind>" `
-AccessTier "<tier>" `
-EnableHierarchicalNamespace $True
How to provision Blob storage in a storage account
Use the Azure portal
Blobs are stored in containers, and you create containers after you've created a storage account. In the Azure portal, you can add a container using the features on the Overview page for your storage account.
The Containers page enables you to create and manage containers. Each container must have a unique name within the storage account. You can also specify the access level. By default, data held in a container is only accessible by the container owner. You can set the access level to Blob to enable public read access to any blobs created in the container, or Container to allow read access to the entire contents of the container, including the ability to list all blobs. You can also configure role-based access control for a blob if you need a more granular level of security.
Once you've provisioned a container, your applications can upload blobs into the container.
Use the Azure CLI
The az storage container create command establishes a new blob container in a storage account.
az storage container create \
--name <container-name> \
--account-name <storage-account-name> \
--public-access <access>
The public-access parameter can be blob, container, or off (for private access only).
Use Azure PowerShell
Use the New-AzStorageContainer cmdlet to add a container to a storage account. You must first retrieve a storage account object with the Get-AzStorageAccount cmdlet. The code below shows an example:
Get-AzStorageAccount `
-ResourceGroupName "<resource-group>" `
-Name "<storage-account-name>" | New-AzStorageContainer `
-Name "<container-name>" `
-Permission <permission>
The Permission parameter accepts the values Blob, Container, or Off.
How to provision File storage in a storage account
Use the Azure portal
You provision File storage by creating one or more file shares in the storage account. In the Azure portal, select File shares on the Overview page for the account.
Using the File shares page, create a new file share. Give the file share a name, and optionally set a quota to limit the amount of data the share can hold. A single file share can hold up to 5,120 GB (5 TiB) of data.
After you've created the file share, applications can read and write shared files using the file share.
Use the Azure CLI
The Azure CLI provides the az storage share create command to create a new file share in a storage account:
Azure CLI
az storage share create \
--name <share-name> \
--account-name <storage-account-name>
Use Azure PowerShell
The New-AzStorageShare cmdlet creates a new file share in a storage account. You must retrieve the storage account details first.
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "<resource-group>" `
-Name "<storage-account-name>" | New-AzStorageShare `
-Name "<share-name>"
Describe configuring non-relational data services
After you've provisioned a resource, you'll often need to configure it to meet the needs of your applications and environment. For example, you might need to set up network access, or open a firewall port to enable your applications to connect to the resource.
In this unit, you'll learn how to enable network access to your resources, and how you can prevent accidental exposure of your resources to third parties. You'll see how to use authentication and access control to protect the data managed by your resources.
Configure connectivity and firewalls
The default connectivity for Azure Cosmos DB and Azure Storage is to enable access to the world at large. You can connect to these services from an on-premises network, the internet, or from within an Azure virtual network. Although this level of access sounds risky, most Azure services mitigate this risk by requiring authentication before granting access. Authentication is described later in this unit.
Note
An Azure Virtual Network is a representation of your own network in the cloud. A virtual network enables you to connect virtual machines and Azure services together, in much the same way that you might use a physical network on-premises. Azure ensures that each virtual network is isolated from other virtual networks created by other users, and from the Internet. Azure enables you to specify which machines (real and virtual), and services, are allowed to access resources on the virtual network, and which ports they can use.
Configure connectivity to virtual networks and on-premises computers
To restrict connectivity, use the Networking page for a service and choose Selected networks. Three further sections will appear, labeled Virtual networks, Firewall, and Exceptions.
In the Virtual networks section, you can specify which virtual networks are allowed to route traffic to the service. When you create items such as web applications and virtual machines, you can add them to a virtual network. If these applications and virtual machines require access to your resource, add the virtual network containing these items to the list of allowed networks.
If you need to connect to the service from an on-premises computer, in the Firewall section, add the IP address of the computer. This setting creates a firewall rule that allows traffic from that address to reach the service.
The Exceptions setting allows you to enable access for other services created in your Azure subscription.
For detailed information, read Configure Azure Storage firewalls and virtual networks.
The image below shows the Networking page for an Azure Storage account. Other services have the same, or similar, page.
Configure connectivity from private endpoints
Azure Private Endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link. Private Endpoint uses a private IP address from your VNet, effectively bringing the service into your VNet. The service could be an Azure service such as Azure Storage, Azure Cosmos DB, SQL, or your own Private Link Service. For detailed information, read What is Azure Private Endpoint?.
The Private endpoint connections page for a service allows you to specify which private endpoints, if any, are permitted access to your service. You can use the settings on this page, together with the Firewalls and virtual networks page, to completely prevent users and applications from connecting to your service, such as a Cosmos DB account, through public endpoints.
Configure authentication
Many services include an access key that you can specify when you attempt to connect to the service. If you provide an incorrect key, you'll be denied access. The image below shows how to find the access key for an Azure Storage account; you select Access Keys under Settings on the main page for the account. Many other services allow you to view the access key in the same way from the Azure portal. If your key is compromised, you can generate a new access key.
Note
Azure services actually provide two keys, labeled key1 and key2. An application can use either key to connect to the service.
Any user or application that knows the access key for a resource can connect to that resource. However, access keys provide a rather coarse-grained level of authentication. Additionally, if you need to regenerate an access key (after accidental disclosure, for example), you may need to update all applications that connect using that key.
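If you use the Azure CLI, you can list the current keys for a storage account and regenerate one of them. The following is a minimal sketch, assuming the hypothetical contosodata account in the contoso-group resource group; the key parameter accepts primary (key1) or secondary (key2):
Azure CLI
az storage account keys list \
--account-name contosodata \
--resource-group contoso-group

az storage account keys renew \
--account-name contosodata \
--resource-group contoso-group \
--key primary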
Azure Active Directory (Azure AD) provides superior security and ease of use over access key authorization. Microsoft recommends using Azure AD authorization when possible to minimize potential security vulnerabilities inherent in using access keys.
Azure AD is a separate Azure service. You add users and other security principals (such as an application) to a security domain managed by Azure AD. The following video describes how authentication works with Azure.
For detailed information on using Azure AD, visit the page What is Azure Active Directory? on the Microsoft website.
Configure access control
Azure AD enables you to specify who, or what, can access your resources. Access control defines what a user or application can do with your resources after they've been authenticated.
Access management for cloud resources is a critical function for any organization that is using the cloud. Azure role-based access control (Azure RBAC) helps you manage who has access to Azure resources, and what they can do with those resources. For example, using RBAC you could:
- Allow one user to manage virtual machines in a subscription and another user to manage virtual networks.
- Allow a database administrator group to manage SQL databases in a subscription.
- Allow a user to manage all resources in a resource group, such as virtual machines, websites, and subnets.
- Allow an application to access all resources in a resource group.
You control access to resources using Azure RBAC to create role assignments. A role assignment consists of three elements: a security principal, a role definition, and a scope.
A security principal is an object that represents a user, group, service, or managed identity that is requesting access to Azure resources.
A role definition, often abbreviated to role, is a collection of permissions. A role definition lists the operations that can be performed, such as read, write, and delete. Roles can be given high-level names, like owner, or specific names, like virtual machine reader. Azure includes several built-in roles that you can use, including:
Owner - Has full access to all resources including the right to delegate access to others.
Contributor - Can create and manage all types of Azure resources but can't grant access to others.
Reader - Can view existing Azure resources.
User Access Administrator - Lets you manage user access to Azure resources.
You can also create your own custom roles. For detailed information, see Create or update Azure custom roles using the Azure portal on the Microsoft website.
A scope lists the set of resources that the access applies to. When you assign a role, you can further limit the actions allowed by defining a scope. This is helpful if, for example, you want to make someone a Website Contributor, but only for one resource group.
You add role assignments to a resource in the Azure portal using the Access control (IAM) page. The Role assignments tab enables you to associate a role with a security principal, defining the level of access the role has to the resource. For further information, read Add or remove Azure role assignments using the Azure portal.
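You can also create role assignments from the command line. The following sketch grants the built-in Storage Blob Data Reader role to a hypothetical user, scoped to a resource group; the user name, subscription ID, and resource group are placeholders:
Azure CLI
az role assignment create \
--assignee user@contoso.com \
--role "Storage Blob Data Reader" \
--scope "/subscriptions/<subscription-id>/resourceGroups/contoso-group"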
Configure security
Apart from authentication and authorization, many services provide additional protection through a Security feature.
Security implements threat protection and assessment. Threat protection tracks security incidents and alerts across your service. This intelligence monitors the service and detects unusual patterns of activity that could be harmful, or that could compromise the data managed by the service. The Recommendations feature identifies potential security vulnerabilities and recommends actions to mitigate them.
The image below shows the Security page for Azure storage. The corresponding page for other non-relational services, such as Cosmos DB, is similar.
Configure Azure Cosmos DB, and Azure Storage
Apart from the general configuration settings applicable to many services, most services also have specific features that you can set up. For example, in the sample scenario, after you've provisioned a Cosmos DB account, you may need to configure replication, or database consistency settings.
In this unit, you'll look at specific configuration settings for Azure Cosmos DB and Azure Storage accounts.
Configure Cosmos DB
Configure replication
Azure Cosmos DB enables you to replicate the databases and containers in your account across multiple regions. When you initially provision an account, you can specify that you want to copy data to another region; you don't have control over which region is used, because the next nearest region is selected automatically. The Replicate data globally page enables you to configure replication in more detail. You can replicate to multiple regions, and you select the regions to use. In this way, you can pick the regions that are closest to your consumers, to help minimize the latency of requests made by those consumers.
You can also use this page to configure automatic failover to help ensure high availability. If the databases in the primary region (the region in which you created the account) become unavailable, one of the replicated regions will take over processing and become the new primary region.
By default, only the region in which you created the account supports write operations; the replicas are all read-only. However, you can enable multi-region writes. Multi-region writes can cause conflicts though, if applications running in different regions modify the same data. In this case, the most recent write will overwrite changes made earlier when data is replicated, although you can write your own code to apply a different strategy.
Replication is asynchronous, so there's likely to be a lag between a change made in one region, and that change becoming visible in other regions.
Note
Each replica increases the cost of the Cosmos DB service. For example, if you replicate your account to two regions, your costs will be three times that of a non-replicated account.
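You can also manage replication from the command line. The following sketch, assuming a hypothetical Cosmos DB account named contosocosmos, replicates the account across two regions, with East US acting as the write (primary) region:
Azure CLI
az cosmosdb update \
--name contosocosmos \
--resource-group contoso-group \
--locations regionName=eastus failoverPriority=0 isZoneRedundant=False \
--locations regionName=westus failoverPriority=1 isZoneRedundant=False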
Configure consistency
Within a single region, Cosmos DB uses a cluster of servers. This approach helps to improve scalability and availability. A copy of all data is held in each server in the cluster. The following video explains how this works, and the effects it can have on consistency:
Cosmos DB enables you to specify how such inconsistencies should be handled. It provides the following options:
Eventual. This option is the least consistent. It's based on the situation just described. Changes won't be lost, they'll appear eventually, but they might not appear immediately. Additionally, if an application makes several changes, some of those changes might be immediately visible, but others might be delayed; changes could appear out of order.
Consistent Prefix. This option ensures that changes will appear in order, although there may be a delay before they become visible. In this period, applications may see old data.
Session. If an application makes a number of changes, they'll all be visible to that application, and in order. Other applications may see old data, although any changes will appear in order, as they did for the Consistent Prefix option. This form of consistency is sometimes known as read your own writes.
Bounded Staleness. There's a lag between writing and then reading the updated data. You specify this staleness either as a period of time, or number of previous versions the data will be inconsistent for.
Strong. In this case, all writes are only visible to clients after the changes are confirmed as written successfully to all replicas. This option is unavailable if you need to distribute your data across multiple global regions.
Eventual consistency provides the lowest latency and least consistency. Strong consistency results in the highest latency but also the greatest consistency. You should select a default consistency level that balances the performance and requirements of your applications.
You can change the default consistency for a Cosmos DB account using the Default consistency page in the Azure portal. Applications can override the default consistency level for individual read operations. However, they can't increase the consistency above that specified on this page; they can only decrease it.
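You can also change the default consistency level from the command line. The sketch below, using the same hypothetical contosocosmos account, sets the default to Session; the other accepted values are Eventual, ConsistentPrefix, BoundedStaleness, and Strong:
Azure CLI
az cosmosdb update \
--name contosocosmos \
--resource-group contoso-group \
--default-consistency-level Session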
Configure Storage accounts
General configuration
The Configuration page for a storage account enables you to modify some general settings of the account. You can:
Enable or disable secure communications with the service. By default, all requests and responses are encrypted by using the HTTPS protocol as they traverse the Internet. You can disable encryption if required, although this isn't recommended.
Switch the default access tier between Cool and Hot.
Change the way in which the account is replicated.
Enable or disable integration with Azure Active Directory Domain Services (Azure AD DS) for requests that access file shares.
Other options, such as the account kind and performance tier, are displayed on this page for information only; you can't change them.
Configure encryption
All data held in an Azure Storage account is automatically encrypted. By default, encryption is performed using keys managed and owned by Microsoft. If you prefer, you can provide your own encryption keys.
To use your own keys, add them to Azure Key Vault. You then provide the details of the vault and key, or the URI of the key in the vault. All new data will be encrypted as it's written. Existing data will be encrypted using a process running in the background; this process may take a little time.
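As a minimal sketch, the following command switches a hypothetical contosodata account to customer-managed keys held in Azure Key Vault. The vault URI and key name are placeholders, and the storage account must already have permission to access the vault:
Azure CLI
az storage account update \
--name contosodata \
--resource-group contoso-group \
--encryption-key-source Microsoft.Keyvault \
--encryption-key-vault "https://<vault-name>.vault.azure.net" \
--encryption-key-name "<key-name>"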
Configure shared access signatures
You can use shared access signatures (SAS) to grant limited rights to resources in an Azure storage account for a specified time period. This feature enables applications to access resources such as blobs and files, without requiring that they're authenticated first. You should only use SAS for data that you intend to make public.
A SAS is a token that an application can use to connect to the resource. The application appends the token to the URL of the resource. The application can then send requests to read or write data using this URL and token.
You can create a token that grants temporary access to the entire service, containers in the service, or individual objects such as blobs and files.
Use the Shared access signature page in the Azure portal to generate SAS tokens. You specify the permissions (you could provide read-only access to a blob, for example), the period for which the SAS token is valid, and the IP address range of computers allowed to use the SAS token. The SAS token is encrypted using one of the access keys; you specify which key to use (key1 or key2).
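You can also generate SAS tokens from the command line. The following sketch creates a token granting read-only access to blobs in a hypothetical images container until the specified expiry time; the account name and date are placeholders:
Azure CLI
az storage container generate-sas \
--account-name contosodata \
--name images \
--permissions r \
--expiry 2024-12-31T23:59Z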
Exercise: Provision non-relational Azure data services
In the sample scenario, you've decided to create the following data stores:
- A Cosmos DB database for holding information about the volume of items in stock. You need to store current and historic information about volume levels, so you can track how levels vary over time. The data is recorded daily.
- A Data Lake store for holding production and quality data.
- A blob container for holding images of the products the company manufactures.
- File storage for sharing reports.
In this exercise, you'll provision and configure the Cosmos DB account, and test it by creating a database, a container, and a sample document. You'll also provision an Azure Storage account that can provide blob, file, and Data Lake storage.
You'll perform this exercise using the Azure portal.
Note
Azure can take as little as 5 minutes or as long as 20 minutes to create the Azure Cosmos DB account.
Provision and configure a Cosmos DB database and container
Create a Cosmos DB account
Sign in to the Azure portal.
From the left-hand navigation menu in the Azure portal, select Create a resource.
On the New page, select Azure Cosmos DB.
On the Select API option page, select Core (SQL) - Recommended.
On the Create Azure Cosmos DB Account page, on the Basics tab, enter the details of the account using the values in the following table, and then select Review + create:
Wait while your settings are validated. If there's a problem, it will be reported at this stage, and you can go back and correct the issue.
Select Create. It can take 10 or 15 minutes to create the account.
Create a database and a container
In the Azure portal, in the left-hand navigation menu, select All resources, and then select your Cosmos DB account.
On the page for your Cosmos DB account, select Data Explorer.
On the Data Explorer page, select New Container.
In the Add Container dialog box, create a new container with the following values, and then select OK:
In the Data Explorer window, expand contosodb, expand productvolumes, and then select Items. The container should currently be empty.
Select New Item to create a new document.
Replace the text that appears in the document window with the following JSON document. This is an example document showing the amount of product 99 in stock on 01/01/2020.
JSON
{
"productid": 99,
"date": "01/01/2020",
"in-stock": 500
}
Select Save. The document will be added to the container. The new document will have some additional fields that Cosmos DB uses to track and manage the document. You can ignore these fields for now.
You've now provisioned a new Cosmos DB account, and created a database and container.
Provision Azure Storage
Create an Azure Storage account for Data Lake Storage
On the left-hand navigation menu in the Azure portal, select Create a resource.
On the New page, select Storage account.
On the Create storage account page, on the Basics tabs, enter the details of the account using the values in the following table:
Select Advanced. On the Advanced page, in the Data Lake Storage Gen2 section, select Enabled, and then select Review + create.
If your settings are validated correctly, select Create.
It takes approximately 15-20 seconds for the storage account to be provisioned.
Create a container for Data Lake storage
In the Azure portal, on the left-hand navigation menu, select All resources, and then select your storage account.
On the page for your storage account, under Data Storage, select Containers.
On the Containers page, select + Container, and create a new container named productqualitydata. Leave the Public access level set to Private (no anonymous access), and then click Create.
When the container has been created, double-click the productqualitydata container.
On the productqualitydata page, select + Add Directory, create a directory named plantA, and then select Save.
Add a second directory named plantB, and select Save.
Contoso has two manufacturing plants named Plant A and Plant B. Other applications will upload manufacturing data from each of these plants to the appropriate directory for later analysis.
Create a container for Blob storage
In the Azure portal, on the left-hand navigation menu, select All resources, and then select your storage account.
On the Overview page, select Containers.
On the Containers page, select + Container, and create a new container named images. Set the Public access level to Blob (anonymous read access for blobs only).
Contoso will use this container to hold product images.
Note
The container created for Data Lake Storage will also appear on the Containers page. You could store image data in a Data Lake Storage container, but Contoso wants to keep the images separate from the product quality data.
Create a file share
On the storage account page, under Data storage select File shares.
On the File shares page, select + File share.
Create a new file share named reports. Leave the Tier as Transaction optimized.
On the File shares page, double-click the reports file share.
On the reports page, select + Add directory, and add a directory named manufacturing.
Add a second directory named complaints.
Contoso will use these directories to hold documents relating to the manufacturing process and customers' complaints. A user that has been granted access to the reports file share can upload and download files from these directories.
Manage non-relational data stores in Azure----------------
Introduction
Non-relational data stores can take many forms. Azure enables you to create non-relational databases using Azure Cosmos DB. Cosmos DB supports several NoSQL models, including document stores, graph databases, key-value stores, and column family databases. Other non-relational stores available in Azure include Azure Storage, which you can use to store blobs and files. In this module, you'll learn how to use these various storage services to store and retrieve data.
Suppose you're a data engineer working at Contoso, an organization with a large manufacturing operation. The organization has to gather and store information from a range of sources, such as real-time data monitoring the status of production line machinery, product quality control data, historical production logs, product volumes in stock, and raw materials inventory data. This information is critical to the operation of the organization. Contoso has created stores for holding this information. You've been asked to upload data to these stores, and investigate how to query this data using the features provided by Azure.
Learning objectives
In this module, you will:
- Upload data to a Cosmos DB database, and learn how to query this data.
- Upload and download data in an Azure Storage account.
Manage Azure Cosmos DB
Azure Cosmos DB is a NoSQL database management system. It's compatible with some existing NoSQL systems, including MongoDB and Cassandra. In the Contoso scenario, you've created a Cosmos DB database for holding information about the quantity of items in stock. You now need to understand how to populate this database, and how to query it.
In this unit, you'll review how Cosmos DB stores data. Then you'll learn how to upload data to a Cosmos DB database, and configure Cosmos DB to support bulk loading.
What is Azure Cosmos DB?
Cosmos DB manages data as a set of documents. A document is a collection of fields, identified by a key. The fields in each document can vary, and a field can contain child documents. Cosmos DB uses JSON (JavaScript Object Notation) to represent the document structure. In this format, the fields in a document are enclosed between braces, { and }, and each field is prefixed with its name. The example below shows a pair of documents representing customer information. In both cases, each customer document includes child documents containing the name and address, but the fields in these child documents vary between customers.
JSON
## Document 1 ##
{
"customerID": "103248",
"name":
{
"first": "AAA",
"last": "BBB"
},
"address":
{
"street": "Main Street",
"number": "101",
"city": "Acity",
"state": "NY"
},
"ccOnFile": "yes",
"firstOrder": "02/28/2003"
}
## Document 2 ##
{
"customerID": "55151",
"name":
{
"title": "Mr",
"forename": "DDD",
"lastname": "EEE"
},
"address":
{
"street": "Another Street",
"number": "202",
"city": "Bcity",
"county": "Gloucestershire",
"country-region": "UK"
},
"ccOnFile": "yes"
}
Documents in a Cosmos DB database are organized into containers. The documents in a container are grouped together into partitions. A partition holds a set of documents that share a common partition key. You designate one of the fields in your documents as the partition key. Select a partition key that collects all related documents together. This approach helps to reduce the amount of disk read operations that queries use when retrieving a set of documents for a given entity. For example, in a document database for an ecommerce system recording the details of customers and the orders they've placed, you could partition the data by customer ID, and store the customer and order details for each customer in the same partition. To find all the information and orders for a customer, you simply need to query that single partition:
Cosmos DB is a foundational service in Azure. Cosmos DB is used by many of Microsoft's products for mission critical applications running at global scale, including Skype, Xbox, Microsoft 365, and Azure. Cosmos DB is highly suitable for IoT and telematics, Retail and marketing, Gaming, and Web and mobile applications. For additional information about uses for Cosmos DB, read Common Azure Cosmos DB use cases.
What are Cosmos DB APIs?
You access the data in a Cosmos DB database through a set of commands and operations, collectively known as an API, or Application Programming Interface. Cosmos DB provides its own native API, called the SQL API. This API provides a SQL-like query language over documents, that enables you to retrieve documents using SELECT statements. The example below finds the address for customer 103248 in the documents shown above:
SQL
SELECT c.address
FROM customers c
WHERE c.customerID = "103248"
Cosmos DB also provides other APIs that enable you to access these documents using the command sets of other NoSQL database management systems. These APIs are:
Table API. This interface enables you to use the Azure Table Storage API to store and retrieve documents. The purpose of this interface is to enable you to switch from Table Storage to Cosmos DB without requiring that you modify your existing applications.
MongoDB API. MongoDB is another well-known document database, with its own programmatic interface. Many organizations use MongoDB on-premises. You can use the MongoDB API for Cosmos DB to enable a MongoDB application to run unchanged against a Cosmos DB database. You can migrate the data in the MongoDB database to Cosmos DB running in the cloud, but continue to run your existing applications to access this data.
Cassandra API. Cassandra is a column family database management system. This is another database management system that many organizations run on-premises. The Cassandra API for Cosmos DB provides a Cassandra-like programmatic interface for Cosmos DB. Cassandra API requests are mapped to Cosmos DB document requests. As with the MongoDB API, the primary purpose of the Cassandra API is to enable you to quickly migrate Cassandra databases and applications to Cosmos DB.
Gremlin API. The Gremlin API implements a graph database interface to Cosmos DB. A graph is a collection of data objects and directed relationships. Data is still held as a set of documents in Cosmos DB, but the Gremlin API enables you to perform graph queries over the data. Using the Gremlin API you can walk through the objects and relationships in the graph to discover all manner of complex relationships, such as "What is the name of the pet of Sam's landlord?" in the graph shown below.
The principal use of the Table, MongoDB, and Cassandra APIs is to support existing applications written using these data stores. If you're building a new application and database, you should use the SQL API or Gremlin API.
Perform data operations in Cosmos DB
Cosmos DB provides several options for uploading data to a Cosmos DB database, and querying that data. You can:
- Use Data Explorer in the Azure portal to run ad-hoc queries. You can also use this tool to load data, but you can only load one document at a time. The data load functionality is primarily aimed at uploading a small number of documents (up to 2 MB in total size) for test purposes, rather than importing large quantities of data.
- Use the Cosmos DB Data Migration tool to perform a bulk-load or transfer of data from another data source.
- Use Azure Data Factory to import data from another source.
- Write a custom application that imports data using the Cosmos DB BulkExecutor library. This strategy is beyond the scope of this module.
- Create your own application that uses the functions available through the Cosmos DB SQL API client library to store data. This approach is also beyond the scope of this module.
Load data using the Cosmos DB Data Migration tool
You can use the Data Migration tool to import data to Azure Cosmos DB from a variety of sources, including:
- JSON files
- MongoDB
- SQL Server
- CSV files
- Azure Table storage
- Amazon DynamoDB
- HBase
- Azure Cosmos containers
The Data Migration tool is available as a download from GitHub. The tool guides you through the process of migrating data into a Cosmos DB database. You're prompted for the source of the data (one of the items listed above), and the destination (the Cosmos DB database and container). The tool can either populate an existing container, or create a new one if the specified container doesn't already exist.
Note
You can also use the Data Migration tool to export data from a Cosmos DB container to a JSON file, either held locally or in Azure Blob storage.
Configure Cosmos DB to support bulk loading
If you have a large amount of data, the Data Migration Tool can make use of multiple concurrent threads to batch your data into chunks and load the chunks in parallel. Each thread acts as a separate client connection to the database. Bulk loading can become a write-intensive task.
When you upload data to a container, if you have insufficient throughput capacity configured to support the volume of write operations occurring concurrently, some of the upload requests will fail. Cosmos DB reports an HTTP 429 error (Request rate is large). Therefore, if you're planning on performing a large data import, you should increase the throughput resources available to the target Cosmos container. If you're using the Data Migration Tool to create the container as well as populate it, the Target information page enables you to specify the throughput resources to allocate.
If you've already created the container, use the Scale settings of the database in the Data Explorer page for your database in the Azure portal to specify the maximum throughput, or set the throughput to Autoscale.
Once the data has been loaded, you may be able to reduce the throughput resources to lower the costs of the database.
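For example, assuming the contosodb database and productvolumes container created in the earlier exercise, a sketch of adjusting the container's provisioned throughput from the command line might look like this (the account name is a placeholder):
Azure CLI
az cosmosdb sql container throughput update \
--account-name <cosmos-account-name> \
--resource-group contoso-group \
--database-name contosodb \
--name productvolumes \
--throughput 1000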
Query Azure Cosmos DB
Although Azure Cosmos DB is described as a NoSQL database management system, the SQL API enables you to run SQL-like queries against Cosmos DB databases. These queries use a syntax similar to that of SQL, but there are some differences, because the data in a Cosmos DB database is structured as documents rather than tables.
In this lesson, you'll learn about the dialect of SQL implemented by the SQL API. You'll see how to use the Data Explorer in the Azure portal to run queries.
Use the SQL API to query documents
The Cosmos DB SQL API supports a dialect of SQL for querying documents using SELECT statements that will be familiar if you have written SELECT statements in a relational database using an ANSI SQL compliant database engine. The SQL API returns results in the form of JSON documents. All queries are executed in the context of a single container.
Understand a SQL API query
A SQL API SELECT query includes the following clauses:
SELECT clause. The clause starts with the keyword SELECT followed by a comma-separated list of properties to return. The keyword "*" means all the properties in the document.
FROM clause. This clause starts with the keyword FROM followed by an identifier, representing the source of the records, and an alias that you can use for this identifier in other clauses (the alias is optional). In a relational database query, the FROM clause would contain a table name. In the SQL API, all queries are limited to the scope of a container, so the identifier represents the name of the container.
WHERE clause. This clause is optional. It starts with the keyword WHERE followed by one or more logical conditions that must be satisfied by a document returned by the query. You use the WHERE clause to filter the results of a query.
ORDER BY clause. This clause is also optional. It starts with the phrase ORDER BY followed by one or more properties used to order the output result set.
Note
A query can also contain a JOIN clause. In a relational database management system, such as Azure SQL Database, JOIN clauses are used to connect data from different tables. In the SQL API, you use JOIN clauses to connect fields in a document with fields in a subdocument that is part of the same document. You can't perform joins across different documents.
The examples below show some simple queries:
SQL
// Simple SELECT. The identifier "c" is an alias for the container being queried
SELECT c.*
FROM customers c
// Projection - limit the output to specified fields
SELECT c.Title, c.Name
FROM customers c
// Projection - Address is a subdocument that contains fields named "state" and "city", amongst others
SELECT c.Name, c.Address.State, c.Address.City
FROM customers c
// Filter that limits documents to customers living in California
SELECT c.Name, c.Address.City
FROM customers c
WHERE c.Address.State = "CA"
// Retrieve customers living in California in Name order
SELECT c.Name, c.Address.City
FROM customers c
WHERE c.Address.State = "CA"
ORDER BY c.Name
Understand supported operators
The SQL API includes many common mathematical and string operations, in addition to functions for working with arrays and for checking data types. The operators supported in SQL API queries include:
The SQL API also supports:
The DISTINCT operator that you use as part of the SELECT clause to eliminate duplicates in the result data.
The TOP operator that you can use to retrieve only the first few rows returned by a query that might otherwise generate a large result set.
The BETWEEN operation that you use as part of the WHERE clause to define an inclusive range of values. The condition field BETWEEN a AND b is equivalent to the condition field >= a AND field <= b.
The IS_DEFINED operator that you can use for detecting whether a specified field exists in a document.
The query below shows some examples using these operators.
SQL
// List all customer cities (remove duplicates) for customers living in states with codes between AK (Alaska) and MD (Maryland)
SELECT DISTINCT c.Address.City
FROM c
WHERE c.Address.State BETWEEN "AK" AND "MD"
// Find the 3 most common customer names
SELECT TOP 3 *
FROM c
ORDER BY c.Name
// Display the details of every customer for which the date of birth is recorded
SELECT * FROM p
WHERE IS_DEFINED(p.DateOfBirth)
Understand aggregate functions
You can use aggregate functions to summarize data in SELECT queries; you place aggregate functions in the SELECT clause. The SQL API query language supports the following aggregate functions:
COUNT(p). This function returns a count of the number of instances of field p in the result set. To count all the items in the result set, set p to a scalar value, such as 1.
SUM(p). This function returns the sum of all the instances of field p in the result set. The values of p must be numeric.
AVG(p). This function returns the mathematical mean of all the instances of field p in the result set. The values of p must be numeric.
MAX(p). This function returns the maximum value of field p in the result set.
MIN(p). This function returns the minimum value of field p in the result set.
Although the syntax of aggregate functions is similar to ANSI SQL, unlike ANSI SQL the SQL API query language doesn't support the GROUP BY clause; you can't generate subtotals for different values of the same field in a single query. You're able to include more than one aggregate function in the SELECT clause of your queries.
In the following example, the query returns the average, maximum, and sum of the age field of the documents in a collection, in addition to a count of all the documents in the collection:
SQL
SELECT AVG(c.age) AS avg,
MAX(c.age) AS max,
SUM(c.age) AS sum,
COUNT(1) AS count
FROM c
The SQL API also supports a large number of mathematical, trigonometric, string, array, and spatial functions. For detailed information on the syntax of queries, and the functions and operators supported by the Cosmos DB SQL API, visit the page Getting started with SQL queries in Azure Cosmos DB on the Microsoft website.
Query documents with the SQL API using Data Explorer
You can use Data Explorer in the Azure portal to create and run queries against a Cosmos DB container. The Items page for a container provides the New SQL Query command in the toolbar:
In the query pane that appears, you can enter a SQL query. Select Execute Query to run it. The results will be displayed as a list of JSON documents.
You can save the query text if you need to repeat it in the future. The query is saved in a separate container. You can retrieve it later using the Open Query command in the toolbar.
Note
The Items page also lets you modify and delete documents. Select a document from the list to display it in the main pane. You can modify any of the fields, and select Update to save the changes. Select Delete to remove the document from the collection. The New Item command enables you to manually add a new document to the collection. You can use the Upload Item command to create new documents from a file containing JSON data.
Manage Azure Blob storage
Azure Blob storage is a suitable repository for holding large binary objects, such as images, video, and audio files. In the Contoso scenario, you've created a blob container for holding images of the products the company manufactures.
Azure currently supports three different types of blobs: Block blobs, Page blobs, and Append blobs. You typically use page blobs to implement virtual disk storage for Azure virtual machines; they're optimized to support random read and write operations. Append blobs are suitable for storing data that grows in chunks, such as logs or other archive data. Block blobs are best for static data, and are the most appropriate type of storage for holding the image data held by Contoso.
In this unit, you'll learn how to create and manage blobs, and the containers that hold them.
Note
This unit concentrates on using the Azure portal, the Azure CLI, and Azure PowerShell for managing blobs and blob storage. You can also use the AzCopy utility to upload and download files, including blobs. The next unit describes how to use AzCopy.
Create an Azure Storage container
In an Azure storage account, you store blobs in containers. A container provides a convenient way of grouping related blobs together, and you can organize blobs in a hierarchy of folders inside a container, similar to files in a file system on disk.
You create a container in an Azure Storage account. You can do this using the Azure portal, or using the Azure CLI or Azure PowerShell from the command line.
Use the Azure portal
In the Azure portal, go to the Overview page for your Azure Storage account, and select Containers.
On the Containers page, select + Container, and provide a name for the new container. You can also specify the public access level. For a container that will be used to hold blobs, the most appropriate access level is Blob. This setting supports anonymous read-only access for blobs. However, unauthenticated clients can't list the blobs in the container. This means they can only download a blob if they know its name and location within the container.
Use the Azure CLI
If you prefer to use the Azure CLI, the az storage container create command creates a new container. This command takes a number of optional parameters, and you can find the full details on the az storage container create page on the Microsoft website. The example below creates a container named images for storing blobs. The container is created in a storage account named contosodata. The container provides anonymous blob access.
Azure CLI
az storage container create \
--name images \
--account-name contosodata \
--resource-group contoso-group \
--public-access blob
Use Azure PowerShell
You can use the New-AzStorageContainer PowerShell cmdlet to create a new storage container. The details are available on the New-AzStorageContainer page on the Microsoft website. You must first obtain a reference to the storage account using the Get-AzStorageAccount command. The code below shows an example:
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | New-AzStorageContainer `
-Name "images" `
-Permission Blob
Upload a blob to Azure Storage
After you've created a container, you can upload blobs. Depending on how you want to organize your blobs, you can also create folders in the container.
Use the Azure portal
If you're using the Azure portal, go to the page for your storage account and select Containers under Blob service. On the Containers page, select the container you want to use.
Note
If you created the storage account with support for hierarchical namespaces (for Data Lake Storage), the Blob service section doesn't appear in the Azure portal. Instead, select Containers under Data Lake Storage.
On the page for the container, in the toolbar, select Upload. In the Upload blob dialog box, browse to the file containing the data to upload. The Advanced drop-down section provides options you can use to modify the default settings. For example, you can specify the name of a folder in the container (the folder will be created if it doesn't exist), the type of blob, and the access tier. The blob that is created is named after the file you uploaded.
Note
You can select multiple files. They will each be uploaded into separate blobs.
Use the Azure CLI
Use the az storage blob upload command to upload a file to a blob in a container. The details describing the parameters for this command are available on the az storage blob upload page on the Microsoft website. The following example uploads a local file named racer_green_large.gif in the data folder to a blob called racer_green in the bikes folder in the images container in the contosodata storage account.
Azure CLI
az storage blob upload \
--container-name images \
--account-name contosodata \
--file "\data\racer_green_large.gif" \
--name "bikes\racer_green"
If you need to upload several files, use the az storage blob upload-batch command. This command takes the name of a local folder rather than a file name, and uploads the files in that folder to separate blobs. The example below uploads all gif files in the data folder to the bikes folder in the images container.
Azure CLI
az storage blob upload-batch \
--account-name contosodata \
--source "\data" \
--pattern "*.gif" \
--destination "images\bikes"
Use Azure PowerShell
Azure PowerShell provides the Set-AzStorageBlobContent cmdlet to upload blob data to Azure storage, as follows:
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | Set-AzStorageBlobContent `
-Container "images" `
-File "\data\racer_green_large.gif" `
-Blob "bikes\racer_green"
Azure PowerShell doesn't currently include a batch blob upload command. If you need to upload multiple files, you can write your own PowerShell script (use the Get-ChildItem cmdlet) to iterate through the files and upload each one individually, as shown in the sketch below.
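The following is a minimal sketch of such a script, based on the upload example above. It iterates over the gif files in a local data folder and uploads each one to the bikes folder in the images container; the account, container, and folder names are the same illustrative placeholders used earlier:
PowerShell
# Get the storage context once, then reuse it for each upload
$context = (Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata").Context

# Upload each matching file to a blob named after the file
Get-ChildItem -Path "\data" -Filter "*.gif" | ForEach-Object {
Set-AzStorageBlobContent `
-Container "images" `
-File $_.FullName `
-Blob "bikes\$($_.Name)" `
-Context $context
}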
List the blobs in a container
If you've been granted the appropriate access rights, you can view the blobs in a container.
Use the Azure portal
If you're using the Azure portal, go to the page for your storage account and select Containers under Blob service. On the Containers page, select the container holding your blobs. If the container has a folder structure, move to the folder containing the blobs you want to see. The blobs in that folder should be displayed.
Use the Azure CLI
In the Azure CLI, you can use the az storage blob list command to view the blobs in a container. This command iterates recursively through any folders in the container. The example below lists the blobs previously uploaded to the images container:
Azure CLI
az storage blob list \
--account-name contosodata \
--container-name "images"
Use Azure PowerShell
From Azure PowerShell, run the Get-AzStorageBlob cmdlet, as illustrated in the following example:
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | Get-AzStorageBlob `
-Container "images"
Download a blob from a container
You can retrieve a blob from Azure Storage and save it in a local file on your computer.
Use the Azure portal
If you're using the Azure portal, go to the page for your storage account and select Containers under Blob service. On the Containers page, select the container holding your blobs. If the container has a folder structure, move to the folder containing the blobs you want to download. Select the blob to view its details. On the details page, select Download.
Use the Azure CLI
The Azure CLI provides the az storage blob download and az storage blob download-batch commands. These commands are analogous to those available for uploading blobs. The example below retrieves the racer_green blob from the bikes folder in the images container.
Azure CLI
az storage blob download \
--container-name images \
--account-name contosodata \
--file "racer_green_large.gif" \
--name "bikes\racer_green"
Use Azure PowerShell
In Azure PowerShell, use the Get-AzStorageBlobContent cmdlet.
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | Get-AzStorageBlobContent `
-Container "images" `
-Blob "bikes\racer_green_large.gif" `
-Destination "racer_green_large.gif"
Delete a blob from a container
Deleting a blob can reclaim the resources used in the storage container. However, if you've enabled the soft delete option for the storage account, the blob is hidden rather than removed, and you can restore it later. You can enable or disable soft delete in the Azure portal, and specify the time for which the blob is retained. Select the Data protection page under Blob service. If the blob isn't restored by the end of the retention period, it will be removed from storage.
Warning
If you created the storage account with support for hierarchical namespaces (for Data Lake Storage), the soft delete option isn't available. All blob delete operations will be final.
Use the Azure portal
If you're using the Azure portal, go to the page for your storage account and select Containers under Blob service. On the Containers page, select the container holding your blobs. If the container has a folder structure, move to the folder containing the blob you want to delete. Select the blob to view its details. On the details page, select Delete. You'll be prompted to confirm the operation.
If you've enabled soft delete for the storage account, the blobs page listing the blobs in a container includes the option Show deleted blobs. If you select this option, you can view and undelete a deleted blob.
Use the Azure CLI
You can delete a single blob with the az storage blob delete command, or a set of blobs with the az storage blob delete-batch command. The command below removes the racer_green blob from the bikes folder in the images container:
Azure CLI
az storage blob delete \
--account-name contosodata \
--container-name "images" \
--name "bikes\racer_green"
Use Azure PowerShell
Use the Remove-AzStorageBlob cmdlet to delete a storage blob from Azure PowerShell. By default, deletion is silent. You can add the -Confirm flag to prompt the user to confirm that they really want to delete the blob:
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | Remove-AzStorageBlob `
-Container "images" `
-Blob "bikes\racer_green" `
-Confirm
Delete an Azure Storage container
Removing a container automatically deletes all blobs held in that container. If you aren't careful, you can lose a great deal of data.
Use the Azure portal
In the Azure portal, select Containers under Blob service, select the container to delete, and then select Delete in the toolbar.
Use the Azure CLI
In the Azure CLI, use the az storage container delete command. The following example deletes the images container referenced in previous examples.
Azure CLI
az storage container delete \
--account-name contosodata \
--name "images"
Use Azure PowerShell
The Remove-AzStorageContainer cmdlet deletes a storage container. The -Confirm flag prompts the user to confirm the delete operation. The code below shows an example:
PowerShell
Get-AzStorageAccount `
-ResourceGroupName "contoso-group" `
-Name "contosodata" | Remove-AzStorageContainer `
-Name "images" `
-Confirm
Manage Azure File storage
You can use Azure File storage to store shared files. Users can connect to a shared folder (also known as a file share) and read and write files (if they have the appropriate privileges) in much the same way as they would use a folder on a local machine. In the Contoso scenario, Azure File storage is used to hold reports and product documentation that users across the company need to be able to read.
In this unit, you'll learn how to create and manage file shares, and upload and download files in Azure File storage.
Note
Files in a file share tend to be handled in a different manner from blobs. In many cases, users simply read and write files as though they were local objects. For this reason, although the Azure CLI and Azure PowerShell both provide programmatic access to Azure File storage, this unit concentrates on the tools available in the Azure portal, and the AzCopy command.
Create a file share
Microsoft provides two graphical tools you can use to create and manage file shares in Azure Storage: the Azure portal, and Azure Storage Explorer.
Use the Azure portal
The File shares command is available in the main pane of the Overview page for an Azure Storage account, and also in the File service section of the command bar:
On the File shares page, select + File share. Give the file share a name, and optionally specify a quota. Azure allows you to store up to 5 PiB of files across all file shares in a storage account. A quota enables you to limit the amount of space an individual file share consumes, to prevent it from starving other file shares of file storage. If you have only one file share, you can leave the quota empty.
After you've created a share, you can use the Azure portal to add directories to the share, upload files to the share, and delete the share. The Connect command generates a PowerShell script that you can run to attach to the share from your local computer. You can then use the share as though it was a local disk drive.
Use Azure Storage Explorer
Azure Storage Explorer is a utility that enables you to manage Azure Storage accounts from your desktop computer. You can download it from the Azure Storage Explorer page on the Microsoft website. You can use Storage Explorer to create blob containers and file shares, as well as upload and download files.
A version of this utility is also available in the Azure portal, on the Overview page for an Azure Storage account.
To create a new file share, right-click File Shares, and then select Create file share. In the Azure portal, Storage Explorer displays the same dialog box that you saw earlier. In the desktop version, you simply enter a name for the new file share; you don't get the option to set a quota at this point.
As with the Azure portal, once you have created a new share, you can use Storage Explorer to create folders, and upload and download files.
Upload and download files
You can upload and download individual files to and from Azure File storage manually, by using Storage Explorer or the Azure portal, or by connecting the file share to your desktop computer and dragging and dropping files in File Explorer.
However, if you need to transfer a significant number of files in and out of Azure File storage, you should use the AzCopy utility. AzCopy is a command-line utility optimized for transferring large files (and blobs) between your local computer and Azure File storage. It can detect transfer failures, and restart a failed transfer at the point an error occurred - you don't have to repeat the entire operation.
Generate an SAS token
Before you can use AzCopy, you generate a Shared access signature (SAS) token. A SAS token provides controlled, time-limited, anonymous access to services and resources in a storage account; users don't have to provide any additional credentials. SAS tokens are useful in situations where you don't know in advance which users will require access to your resources.
Note
The AzCopy command also supports authentication using Azure Active Directory, but this approach requires adding all of your users to Azure Active Directory first.
You can create an SAS token for connecting to Azure File storage using the Azure portal. On the page for your storage account, under Settings, select Shared access signature. On the Shared access signature page, under Allowed services, select File. Under Allowed resource types, select Container and Object. Under Permissions, select the privileges that you want to grant to users. Set the start and end time for the SAS token, and specify the IP address range of the computers your users will be using. Select Generate SAS and connection string to create the SAS token. Copy the value in the SAS token field somewhere safe.
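You can also generate an account-level SAS token from the command line. The sketch below creates a token for the File service (services f), covering containers and objects (resource-types co), with read and list permissions; the account name and expiry date are placeholders:
Azure CLI
az storage account generate-sas \
--account-name contosodata \
--services f \
--resource-types co \
--permissions rl \
--expiry 2024-12-31T23:59Z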
Upload files
To transfer a single file into File Storage using AzCopy, use the form of the command shown in the following example. Run this command from the command line. In this example, replace <storage-account-name> with the name of the storage account, replace <file-share> with the name of a file share in this account, and replace <SAS-token> with the token you created using the Azure portal. You must include the quotes where shown.
Note
Don't forget to include the copy keyword after the azcopy command. AzCopy supports other operations, such as deleting files and blobs, listing files and blobs, and creating new file shares. Each of these operations has its own keyword.
Bashazcopy copy "myfile.txt" "https://<storage-account-name>.file.core.windows.net/<file-share-name>/myfile.txt<SAS-token>"
You can transfer the entire contents of a local folder to Azure File storage using a similar command. You replace the file name ("myfile.txt") with the name of the folder. If the folder contains subfolders that you want to copy, add the --recursive flag.
Bashazcopy copy "myfolder" "https://<storage-account-name>.file.core.windows.net/<file-share-name>/myfolder<SAS-token>" --recursive
As the process runs, AzCopy displays a progress report:
text
INFO: Scanning...
INFO: Any empty folders will be processed, because source and destination both support folders
Job b86eeb8b-1f24-614e-6302-de066908d4a2 has started
Log file is located at: C:\Users\User\.azcopy\b86eeb8b-1f24-614e-6302-de066908d4a2.log
11.5 %, 126 Done, 0 Failed, 48 Pending, 0 Skipped, 174 Total, 2-sec Throughput (Mb/s): 8.2553
When the transfer is complete, you'll see a summary of the work performed.
text
Job b86eeb8b-1f24-614e-6302-de066908d4a2 summary
Elapsed Time (Minutes): 0.6002
Number of File Transfers: 161
Number of Folder Property Transfers: 13
Total Number of Transfers: 174
Number of Transfers Completed: 174
Number of Transfers Failed: 0
Number of Transfers Skipped: 0
TotalBytesTransferred: 43686370
Final Job Status: Completed
The AzCopy copy command has other options as well. For more information, see the page Upload files on the Microsoft website.
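As the earlier note mentioned, copy is just one of several AzCopy keywords. The following sketch shows a few of the other operations, using the same placeholder conventions as before (each URL still needs your SAS token appended):
Bash
# List the files in a file share.
azcopy list "https://<storage-account-name>.file.core.windows.net/<file-share-name><SAS-token>"

# Create a new file share.
azcopy make "https://<storage-account-name>.file.core.windows.net/<new-share-name><SAS-token>"

# Delete a single file from a file share.
azcopy remove "https://<storage-account-name>.file.core.windows.net/<file-share-name>/myfile.txt<SAS-token>"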
Download files
You can also use the AzCopy copy command to transfer files and folders from Azure File storage to your local computer. The command is similar to the one for uploading files, except that you switch the order of the arguments: specify the files and folders in the file share first, and the local files and folders second. For example, to download the files from a folder named myfolder in a file share named myshare to a local folder called localfolder, use the following command:
Bashazcopy copy "https://<storage-account-name>.file.core.windows.net/myshare/myfolder<SAS-token>" "localfolder" --recursive
For full details on downloading files using AzCopy, see Download files.
Exercise: Upload, download, and query data in a non-relational data store
In the sample scenario, suppose that you've created the following data stores:
- A Cosmos DB database for holding information about the products that Contoso manufactures.
- A blob container in Azure Storage for holding the images of products.
In this exercise, you'll run a script to upload data to these data stores. You'll perform queries against the data in the Cosmos DB database. Then, you'll download and view the images held in Azure Storage.
You'll perform this exercise using the Azure portal and the Azure Cloud Shell.
Setup
In the Cloud Shell window on the right, run the following command:
Bash
git clone https://github.com/MicrosoftLearning/DP-900T00A-Azure-Data-Fundamentals dp-900
This command copies the scripts and data required to set up the sample Cosmos DB database and Azure Storage account used by this exercise.
Move to the dp-900/nosql folder.
Bash
cd dp-900/nosql
Run the following command.
Bash
bash setup.sh
This command creates the Cosmos DB database and Azure Storage account, and populates them with sample data. It can take up to 10 minutes to run. When the script has finished, make a note of the values for the Cosmos DB account, database, container, and storage account names.
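If you lose track of these names, you can list them again from the Cloud Shell. A minimal sketch using the Azure CLI, which is preinstalled in Cloud Shell:
Bash
# List the names of the Cosmos DB accounts in your subscription.
az cosmosdb list --query "[].name" --output tsv

# List the names of the storage accounts in your subscription.
az storage account list --query "[].name" --output tsv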
Query product data in Cosmos DB
Sign in to the Azure portal.
On the Azure Home page, select the drop-down menu at the top of the left-hand pane, and then select All resources.
On the All resources page, select the Cosmos DB account that was created by the setup script. The account name is cosmos followed by a random number.
On the Cosmos DB account page, select Data Explorer. On the Data Explorer page, expand the ProductData database, expand the ProductCatalog container, and then select Items. Verify that the Items pane contains a list of products.
Select the item with ID 316. A JSON document containing the details for product 316 should appear in the right-hand pane.
In the toolbar, select New SQL Query.
In the Query 1 pane, enter the following query, and then select Execute Query. This query returns the name, color, list price, description, and thumbnail image file name for each model of mountain bike that Contoso makes. The query should return 32 documents.
SQL
SELECT p.productname, p.color, p.listprice, p.description, p.images.thumbnail
FROM products p
WHERE p.productcategory.subcategory = "Mountain Bikes"
Modify the query to return information about Road Bikes, and then select Execute Query.
SQL
SELECT p.productname, p.color, p.listprice, p.description, p.images.thumbnail
FROM products p
WHERE p.productcategory.subcategory = "Road Bikes"
The query should return 43 documents.
Replace the query with the following text. This query counts the number of Touring Bikes.
SQL
SELECT COUNT(p.productname)
FROM products p
WHERE p.productcategory.subcategory = "Touring Bikes"
The data is returned as a document with a field named "$1" that has the value 22.
[
{
"$1": 22
}
]
Modify the query, and add the VALUE keyword as shown below.
SQL
SELECT VALUE COUNT(p.productname)
FROM products p
WHERE p.productcategory.subcategory = "Touring Bikes"
This time the query just returns the value 22, and doesn't generate a field name.
[
22
]
Run the following query:
SQL
SELECT VALUE SUM(p.quantityinstock)
FROM products p
WHERE p.productcategory.subcategory = "Touring Bikes"
This query returns the total number of touring bikes currently in stock. It should return the value 3477.
If you have time, experiment with some queries of your own.
View uploaded images in Azure Blob storage
In the Azure portal, in the left-hand navigation menu, select Home.
On the Home page, select All resources, and then select the storage account created by the setup script.
On the storage account page, select Storage browser.
In the Storage browser pane, expand BLOB CONTAINERS, and then select images. The images container holds the image files uploaded by the setup script.
Select any image, and then select Open in the toolbar.
In the File download window, select Click here to begin download.
The file should be downloaded by the browser. Select the file and open it to display the contents.
The image should be displayed. By default, Windows uses the Photo Viewer app, but if your computer is configured differently, another application might open the image instead.
If time allows, try downloading and displaying other images.
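If you'd rather script the download instead of using the portal, the AzCopy approach described earlier in this module works for blob containers too. A minimal sketch, assuming you've generated a SAS token that covers the Blob service for this storage account:
Bash
# Download every blob in the images container to a local folder named localimages.
azcopy copy "https://<storage-account-name>.blob.core.windows.net/images<SAS-token>" "localimages" --recursive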
Summary
In this exercise, you investigated using Cosmos DB and Azure Storage to store and retrieve data. You ran a script that created a Cosmos DB database and a storage account, and uploaded sample data. You used Data Explorer in Cosmos DB to run simple queries against the data. You used Storage browser for the storage account to browse blob storage and download files.