Tuesday, October 13, 2015

Microsoft Azure: Azure Blob Storage: Benefits and Estimating Blob Capacity

Azure Storage

  • Microsoft Azure Storage is a massively scalable, highly available, and elastic cloud storage solution that empowers developers and IT professionals to build large-scale modern applications.
  •  Azure Storage is accessible from anywhere in the world, from any type of application, whether it’s running in the cloud, on the desktop, on an on-premises server, or on a mobile or tablet device.
  •  Azure Storage uses an auto-partitioning system that automatically load-balances your data based on traffic.
  •  Azure Storage also exposes data resources via simple REST APIs, which are available to any client capable of sending and receiving data via HTTP/HTTPS.


Azure Storage Services

The Azure Storage services are Blob Storage, Table Storage, Queue Storage, and File Storage:

·         Blob Storage: stores file data. A blob can be any type of text or binary data, such as a document, media file, or application installer.

·         Table Storage: stores structured datasets. Table storage is a NoSQL key-attribute data store, which allows for rapid development and fast access to large quantities of data.

·         Queue Storage: provides reliable messaging for workflow processing and for communication between components of cloud services.


·         File Storage (Preview): offers shared storage for legacy applications using the standard SMB 2.1 protocol. Azure virtual machines and cloud services can share file data across application components via mounted shares, and on-premises applications can access file data in a share via the File service REST API. File storage is available by request via the Azure Preview page.

Blob Storage
  •         Azure Blob storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS.
  •         We can use Blob storage to expose data publicly to the world, or to store application data privately.
  •         Common uses of Blob storage include: 
✓  Serving images or documents directly to a browser
✓  Storing files for distributed access
✓  Streaming video and audio
✓  Performing secure backup and disaster recovery
✓  Storing data for analysis by an on-premises or Azure-hosted service.

Blob Service Concepts

The Blob service contains the following components:

·         Storage Account: All access to Azure Storage is done through a storage account.
·         Container: A container provides a grouping of a set of blobs. All blobs must be in a container. An account can contain an unlimited number of containers. A container can store an unlimited number of blobs.
·         Blob: A file of any type and size. Two types of blobs can be stored in Azure Storage: block blobs and page blobs. Most files are block blobs. A single block blob can be up to 200 GB in size. Page blobs can be up to 1 TB in size and are more efficient when ranges of bytes in a file are modified frequently.
·         URL format: Blobs are addressable using the following URL format:
http://<storage account>.blob.core.windows.net/<container>/<blob>
For example, a blob named MOV1.AVI in a container named movies under a storage account named sally would be addressed as http://sally.blob.core.windows.net/movies/MOV1.AVI (the names here are purely illustrative).
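
The following minimal Python sketch shows how such a URL can be assembled and a publicly readable blob fetched over HTTPS. The account, container, and blob names are hypothetical placeholders, and an anonymous request only succeeds when the container's access level allows public reads.

# Minimal sketch: building a blob URL and downloading a publicly readable blob.
# The account, container, and blob names below are hypothetical placeholders.
from urllib.request import urlopen

account = "mystorageaccount"   # hypothetical storage account
container = "images"           # hypothetical container with public read access
blob = "photo.jpg"             # hypothetical blob name

url = "https://{0}.blob.core.windows.net/{1}/{2}".format(account, container, blob)

# An anonymous GET only succeeds when the container allows public reads;
# private blobs require an authenticated request (for example, a SAS token).
with urlopen(url) as response:
    data = response.read()
print("Downloaded {0} bytes from {1}".format(len(data), url))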

Estimate Capacity in Blob Storage

1. How to estimate the amount of storage consumed per blob container (a sketch of this calculation follows the breakdown below):
48 bytes + Len(ContainerName) * 2 bytes +
For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] +
For-Each Signed Identifier[512 bytes]
The following is the breakdown:
·         48 bytes of overhead for each container includes the Last Modified Time, Permissions, Public Settings, and some system metadata.
·         The container name is stored as Unicode, so take the number of characters and multiply by 2.
·         For each piece of container metadata stored, add 3 bytes plus the length of the name (stored as ASCII) plus the length of the string value.
·         The 512 bytes per Signed Identifier includes signed identifier name, start time, expiry time and permissions.
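
The following minimal Python sketch implements the per-container formula above; the function and argument names are illustrative and not part of any Azure SDK.

# Minimal sketch of the per-container estimate; names here are illustrative.
def estimate_container_bytes(container_name, metadata=None, signed_identifier_count=0):
    """Estimate the bytes consumed by a blob container's metadata and policies."""
    metadata = metadata or {}
    total = 48                                  # fixed per-container overhead
    total += len(container_name) * 2            # container name stored as Unicode (2 bytes per character)
    for name, value in metadata.items():        # 3 bytes + name (ASCII) + value per metadata entry
        total += 3 + len(name) + len(value)
    total += signed_identifier_count * 512      # 512 bytes per signed identifier (stored access policy)
    return total

# Example: a container named "images" with one metadata pair and one stored access policy.
print(estimate_container_bytes("images", {"owner": "teamA"}, signed_identifier_count=1))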

      2. How to estimate the amount of storage consumed per blob:
For a block blob (base blob or snapshot):
124 bytes + Len(BlobName) * 2 bytes +
For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] +
8 bytes + number of committed and uncommitted blocks * Block ID Size in bytes +
SizeInBytes(data in unique committed data blocks stored) +
SizeInBytes(data in uncommitted data blocks)
For a page blob (base blob or snapshot):
124 bytes + Len(BlobName) * 2 bytes +
For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] +
number of nonconsecutive page ranges with data * 12 bytes +
SizeInBytes(data in unique pages stored)

     3. The block blob and page blob formulas above break down as follows (a minimal sketch implementing them follows this list):

·         124 bytes of overhead per blob, which includes the Last Modified Time, Size, Cache-Control, Content-Type, Content-Language, Content-Encoding, Content-MD5, Permissions, Snapshot information, Lease, and some system metadata.
·         The blob name is stored as Unicode, so take the number of characters and multiply by 2.
·         Then for each piece of metadata stored, add 3 bytes plus the length of the name (stored as ASCII) plus the length of the string value.
·         Then for Block Blobs
✓  8 bytes for the block list
✓  Number of committed and uncommitted blocks times the block ID size in bytes
✓  Plus the size of the data in all of the committed and uncommitted blocks. Note that when snapshots are used, this size only includes the unique data for this base or snapshot blob. If uncommitted blocks are not used within a week, they are garbage collected and no longer count toward billing.
·         Then for Page Blobs
✓  Number of nonconsecutive page ranges with data times 12 bytes. This is the number of unique page ranges you see when calling the GetPageRanges API.
✓  Plus the size of the data in bytes of all of the stored pages. Note that when snapshots are used, this size only includes the unique pages for the base blob or snapshot blob being counted.
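
The following minimal Python sketch implements the block blob and page blob formulas from item 2, following the breakdown above. The function and parameter names are illustrative and not part of any Azure SDK; the block ID size depends on the block IDs your application chose when uploading.

# Minimal sketch of the per-blob estimates; names here are illustrative.
def estimate_block_blob_bytes(blob_name, metadata, block_count, block_id_size,
                              committed_data_bytes, uncommitted_data_bytes):
    """Estimate storage for a block blob (base blob or snapshot)."""
    total = 124 + len(blob_name) * 2            # fixed overhead + blob name as Unicode
    for name, value in metadata.items():        # 3 bytes + name (ASCII) + value per metadata entry
        total += 3 + len(name) + len(value)
    total += 8                                  # block list overhead
    total += block_count * block_id_size        # committed + uncommitted blocks * block ID size
    total += committed_data_bytes               # unique committed block data
    total += uncommitted_data_bytes             # uncommitted block data (until garbage collected)
    return total

def estimate_page_blob_bytes(blob_name, metadata, nonconsecutive_page_ranges, page_data_bytes):
    """Estimate storage for a page blob (base blob or snapshot)."""
    total = 124 + len(blob_name) * 2            # fixed overhead + blob name as Unicode
    for name, value in metadata.items():
        total += 3 + len(name) + len(value)
    total += nonconsecutive_page_ranges * 12    # 12 bytes per nonconsecutive page range with data
    total += page_data_bytes                    # unique stored pages
    return total

# Example: a 4 MB block blob uploaded as a single committed block with a 64-byte block ID.
print(estimate_block_blob_bytes("movie.mp4", {}, block_count=1, block_id_size=64,
                                committed_data_bytes=4 * 1024 * 1024,
                                uncommitted_data_bytes=0))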

Benefits of Blob storage

  • Within a container, blobs are listed in order by name, so with a suitable naming convention you can easily bump specific blobs to the top of the list (see the sketch after this list)
  • You can browse and read all blobs at any time
  • You can add and delete any blob at any time without having to purge the container
  • Blob storage offers a cost-effective and scalable solution for large amounts of data
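
The following minimal Python sketch illustrates how a naming convention interacts with that lexicographical listing order. The reverse-timestamp prefix used here is just one possible convention, not something prescribed by the service.

# Minimal sketch: blob listings come back in name order, so a reverse-timestamp
# prefix pushes the newest blobs to the top. The prefix scheme is illustrative.
from datetime import datetime

def prefixed_name(original_name, when):
    # Subtract from a far-future tick count so newer blobs sort first lexicographically.
    reverse_ticks = 9999999999 - int(when.timestamp())
    return "{0:010d}-{1}".format(reverse_ticks, original_name)

names = [
    prefixed_name("report.csv", datetime(2015, 10, 1)),
    prefixed_name("report.csv", datetime(2015, 10, 13)),
    prefixed_name("report.csv", datetime(2015, 9, 20)),
]

# Sorting by name (what a blob listing effectively does) lists the newest report first.
for name in sorted(names):
    print(name)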


There are several benefits associated with storing data in Azure Blob storage, particularly when it is used as the storage layer for HDInsight clusters:

  • Data reuse and sharing: The data in Azure Blob storage can be accessed either through the HDFS APIs or through the Blob Storage REST APIs. Thus, a larger set of applications and tools can be used to produce and consume the data.
  • Data archiving: Storing data in Azure Blob storage enables the HDInsight clusters used for computation to be safely deleted without losing user data.
  • Data storage cost: Storing data in the cluster's DFS for the long term is more costly than storing it in Azure Blob storage, because the cost of a compute cluster is higher than the cost of an Azure Blob storage container. In addition, because the data does not have to be reloaded for every compute cluster generation, you also save on data-loading costs.
  • Elastic scale-out: With HDFS, the scale of the file system is determined by the number of nodes in the cluster, so changing the scale is a more complicated process than relying on the elastic scaling capabilities that you get automatically with Azure Blob storage.
  •  Geo-replication: Your Azure Blob storage containers can be geo-replicated through the Azure portal. Although this gives you geographic recovery and data redundancy, a failover to the geo-replicated location severely impacts performance and may incur additional costs. The recommendation is to choose geo-replication wisely, and only if the value of the data is worth the additional cost.



