Tuesday, October 13, 2015

Microsoft Azure: Microsoft Azure Table Storage: Benefits and Estimating Table Capacity

Table Storage

  • The Azure Table storage service stores large amounts of structured data.
  •  The service is a NoSQL data store which accepts authenticated calls from inside and outside the Azure cloud.
  • Azure tables are ideal for storing structured, non-relational data.
  • Common uses of the Table service include:
✓  Storing TBs of structured data capable of serving web-scale applications
✓  Storing datasets that don't require complex joins, foreign keys, or stored procedures and can be denormalized for fast access
✓  Quickly querying data using a clustered index
✓  Accessing data using the OData protocol and LINQ queries with the WCF Data Services .NET libraries
✓  Storing and querying huge sets of structured, non-relational data in tables that scale as demand increases

Table Service Concepts

The Table service contains the following components:

  •  URL format: Code addresses tables in an account using this address format:
     http://<storage account>.table.core.windows.net/<table>
     We can address Azure tables directly using this address with the OData protocol.
  •  Storage Account: All access to Azure Storage is done through a storage account.
  •  Table: A table is a collection of entities. Tables don't enforce a schema on entities, which means a single table can contain entities that have different sets of properties. The number of tables that a storage account can contain is limited only by the storage account capacity limit.
  •  Entity: An entity is a set of properties, similar to a database row. An entity can be up to 1 MB in size.
  •  Properties: A property is a name-value pair. Each entity can include up to 252 properties to store data. Each entity also has 3 system properties that specify a partition key, a row key, and a timestamp. Entities with the same partition key can be queried more quickly, and inserted/updated in atomic operations. An entity's row key is its unique identifier within a partition.
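Because an entity is identified by its PartitionKey and RowKey, a single entity can be addressed by appending both keys to the table URL. The Python sketch below only builds URLs; it does not authenticate or send requests. It assumes the OData key syntax `<table>(PartitionKey='<pk>',RowKey='<rk>')`, and the helper names, account, table, and key values are hypothetical.

```python
from urllib.parse import quote


def table_url(account: str, table: str) -> str:
    """Build http://<storage account>.table.core.windows.net/<table>."""
    return f"http://{account}.table.core.windows.net/{table}"


def odata_string_literal(value: str) -> str:
    """Wrap a key in OData string-literal quotes and percent-encode it for a URL.

    A single quote inside the value is escaped by doubling it.
    """
    literal = "'" + value.replace("'", "''") + "'"
    # Keep the quote characters as-is; percent-encode anything else that needs it.
    return quote(literal, safe="'")


def entity_url(account: str, table: str, partition_key: str, row_key: str) -> str:
    """Address one entity by its PartitionKey and RowKey."""
    return (
        f"{table_url(account, table)}"
        f"(PartitionKey={odata_string_literal(partition_key)},"
        f"RowKey={odata_string_literal(row_key)})"
    )


if __name__ == "__main__":
    # Hypothetical account "myaccount", table "Customers", keys "Smith"/"Ben".
    print(entity_url("myaccount", "Customers", "Smith", "Ben"))
    # http://myaccount.table.core.windows.net/Customers(PartitionKey='Smith',RowKey='Ben')
```

Requests that carry both keys resolve to at most one entity, which is why an exact PartitionKey and RowKey match is the cheapest lookup the service offers.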

Benefits of Azure Table Storage

1.     Table storage offers highly available, massively scalable storage, so that your application can automatically scale to meet user demand.

2.     Table storage is Microsoft’s NoSQL key/attribute store; it has a schemaless design, which makes it different from traditional relational databases.

3.     Simple semantics: an entity is retrieved with a single primary key lookup (its PartitionKey and RowKey together).

4.     Automatic massive scale through partitioning, with consistent performance even at large scale.

5.     Developer experience: entities are serialized directly, so no ORM is necessary, and dropping the relational model simplifies the design.

6.     Type variability: a single table can store entities with different sets of properties.

7.     Cost: there is no charge for reserved or unused space; you pay only for what you use.

Estimate Capacity in Table Storage

1. How to estimate the amount of storage consumed per Table:

12 bytes + Len(TableName) * 2 bytes

The following is the breakdown:
  •  12 bytes of overhead for each table, which includes the Last Modified Time and some system metadata.
  •  The table name is stored as Unicode, so take the number of characters and multiply by 2.
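The per-table breakdown amounts to 12 bytes + Len(TableName) * 2 bytes. For example, a table named Customers (a hypothetical 9-character name) costs 12 + 9 × 2 = 30 bytes. A minimal Python version of the same arithmetic, with a hypothetical function name:

```python
def table_storage_bytes(table_name: str) -> int:
    """Estimated bytes for a table itself: 12 bytes + Len(TableName) * 2 bytes."""
    overhead = 12                     # Last Modified Time and system metadata
    name_bytes = len(table_name) * 2  # name stored as Unicode, 2 bytes per character
    return overhead + name_bytes


print(table_storage_bytes("Customers"))  # 30
```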

2. How to estimate the amount of storage consumed per entity:

4 bytes + Len(PartitionKey + RowKey) * 2 bytes +
For-Each Property(8 bytes + Len(Property Name) * 2 bytes + Sizeof(.NET Property Type))

The following is the breakdown:
  •  4 bytes of overhead for each entity, which includes the Timestamp along with some system metadata.
  •  The number of characters in the PartitionKey and RowKey values, which are stored as Unicode (times 2 bytes).
  •  Then, for each property, an 8-byte overhead, plus the number of characters in the property name times 2 bytes, plus the size of the property value's .NET type.
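The per-entity formula can be turned into a small estimator. This is a sketch under stated assumptions: the per-type value sizes are assumed (standard .NET sizes for bool, Int64, Double, DateTime, and Guid, plus an assumed 4-byte length prefix for strings and binary), Python `int` is assumed to map to Int64, and all names and sample values are hypothetical.

```python
import datetime
import uuid


def value_size(value) -> int:
    """Assumed size in bytes of one property value, based on its type."""
    if isinstance(value, bool):               # check bool first: bool is a subclass of int
        return 1
    if isinstance(value, int):                # assumption: stored as Int64
        return 8
    if isinstance(value, float):              # Double
        return 8
    if isinstance(value, datetime.datetime):  # DateTime
        return 8
    if isinstance(value, uuid.UUID):          # Guid
        return 16
    if isinstance(value, str):                # 2 bytes per character + assumed 4-byte length
        return len(value) * 2 + 4
    if isinstance(value, (bytes, bytearray)): # raw bytes + assumed 4-byte length
        return len(value) + 4
    raise TypeError(f"unsupported property type: {type(value).__name__}")


def entity_storage_bytes(partition_key: str, row_key: str, properties: dict) -> int:
    """4 bytes + Len(PartitionKey + RowKey) * 2 bytes
    + For-Each Property(8 bytes + Len(Property Name) * 2 bytes + value size)."""
    total = 4 + len(partition_key + row_key) * 2
    for name, value in properties.items():
        total += 8 + len(name) * 2 + value_size(value)
    return total


# Hypothetical entity: keys "Smith"/"Ben" plus two custom properties.
print(entity_storage_bytes("Smith", "Ben", {"Email": "ben@example.com", "Active": True}))  # 93
```

Here the keys cost 4 + 8 × 2 = 20 bytes, Email costs 8 + 10 + 34 = 52, and Active costs 8 + 12 + 1 = 21.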

  
Limitations of Table Storage

Table names must conform to these rules:

✓  Table names must be unique within an account.
✓  Table names may contain only alphanumeric characters.
✓  Table names cannot begin with a numeric character.
✓  Table names are case-insensitive.
✓  Table names must be from 3 to 63 characters long.
✓  Some table names are reserved, including "tables". Attempting to create a table with a reserved table name returns HTTP status code 400 (Bad Request).
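The naming rules above can be checked on the client before calling the service. A minimal sketch in Python; the function and constant names are hypothetical, and the reserved-name set holds only "tables" because that is the only reserved name listed here, so it is not a complete list.

```python
import re

# Only "tables" is named as reserved above; the real reserved list may be longer.
RESERVED_TABLE_NAMES = {"tables"}

# 3 to 63 characters, alphanumeric only, first character not a digit.
TABLE_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")


def is_valid_table_name(name: str, existing_names=()) -> bool:
    """Check a proposed table name against the naming rules listed above."""
    if not TABLE_NAME_PATTERN.fullmatch(name):
        return False
    lowered = name.lower()  # table names are case-insensitive
    if lowered in RESERVED_TABLE_NAMES:
        return False
    # Names must be unique within the account, compared case-insensitively.
    return lowered not in {n.lower() for n in existing_names}
```

For example, `is_valid_table_name("Customers")` passes, while `"1Customers"`, `"ab"`, and `"Tables"` fail, and `"customers"` fails if `"Customers"` already exists in the account.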

  •  A single table can hold up to 500 TB of data.
  •  A single entity can store at most 1 MB.
  •  Table transactions are counted as follows:

✓  A single-entity AddObject request = 1 transaction
✓  Table SaveChanges (without SaveChangesOptions.Batch) with 100 entities = 100 transactions
✓  Table SaveChanges (with SaveChangesOptions.Batch) with 100 entities = 1 transaction
✓  A table query specifying an exact PartitionKey and RowKey match (getting a single entity) = 1 transaction
✓  A table query doing a single storage request to return 500 entities (with no continuation tokens encountered) = 1 transaction
✓  A table query resulting in 5 requests to table storage (due to 4 continuation tokens) = 5 transactions
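The transaction counts above follow two simple rules: each storage request is one transaction, and a batch counts as a single request. A Python sketch of that arithmetic; it assumes the entity group transaction limit of 100 entities per batch (all in the same partition), and the function names are hypothetical.

```python
import math

# Assumption: a batch (entity group transaction) holds at most 100 entities
# that share one PartitionKey.
MAX_ENTITIES_PER_BATCH = 100


def save_changes_transactions(entity_count: int, batch: bool) -> int:
    """Transactions billed for saving entity_count entities."""
    if entity_count <= 0:
        return 0
    if not batch:
        return entity_count  # one request, and one transaction, per entity
    # With SaveChangesOptions.Batch, each batch of up to 100 entities is one transaction.
    return math.ceil(entity_count / MAX_ENTITIES_PER_BATCH)


def query_transactions(continuation_tokens: int) -> int:
    """One transaction for the first request plus one per continuation token followed."""
    return 1 + continuation_tokens


print(save_changes_transactions(100, batch=False))  # 100
print(save_changes_transactions(100, batch=True))   # 1
print(query_transactions(4))                        # 5
```

The sketch makes the cost trade-off visible: batching 100 entities in one partition cuts the count from 100 transactions to 1.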

Microsoft Azure: Microsoft Azure Blob Storage: Benefits and Estimating Blob Capacity

Azure Storage

  • Microsoft Azure Storage is a massively scalable, highly available, and elastic cloud storage solution that empowers developers and IT professionals to build large-scale modern applications.
  •  Azure Storage is accessible from anywhere in the world, from any type of application, whether it’s running in the cloud, on the desktop, on an on-premises server, or on a mobile or tablet device.
  •  Azure Storage uses an auto-partitioning system that automatically load-balances your data based on traffic.
  •  Azure Storage also exposes data resources via simple REST APIs, which are available to any client capable of sending and receiving data via HTTP/HTTPS.


Azure Storage Services

The Azure Storage services are Blob storage, Table Storage, Queue Storage, and File Storage:

·         Blob Storage: stores file data. A blob can be any type of text or binary data, such as a document, media file, or application installer.

·         Table Storage: stores structured datasets. Table storage is a NoSQL key-attribute data store, which allows for rapid development and fast access to large quantities of data.

·         Queue Storage: provides reliable messaging for workflow processing and for communication between components of cloud services.


·         File Storage (Preview): offers shared storage for legacy applications using the standard SMB 2.1 protocol. Azure virtual machines and cloud services can share file data across application components via mounted shares, and on-premises applications can access file data in a share via the File service REST API. File storage is available by request via the Azure Preview page.

Blob Storage
  •  Azure Blob storage is a service for storing large amounts of unstructured data, such as text or binary data, that can be accessed from anywhere in the world via HTTP or HTTPS.
  •  We can use Blob storage to expose data publicly to the world, or to store application data privately.
  •  Common uses of Blob storage include:
✓  Serving images or documents directly to a browser
✓  Storing files for distributed access
✓  Streaming video and audio
✓  Performing secure backup and disaster recovery
✓  Storing data for analysis by an on-premises or Azure-hosted service

Blob Service Concepts

The Blob service contains the following components:

·         Storage Account: All access to Azure Storage is done through a storage account.
·         Container: A container provides a grouping of a set of blobs. All blobs must be in a container. An account can contain an unlimited number of containers. A container can store an unlimited number of blobs.
·         Blob: A file of any type and size. There are two types of blobs that can be stored in Azure Storage: block and page blobs. Most files are block blobs. A single block blob can be up to 200 GB in size. Page blobs, another blob type, can be up to 1 TB in size, and are more efficient when ranges of bytes in a file are modified frequently.
·         URL format: Blobs are addressable using the following URL format:
http://<storage account>.blob.core.windows.net/<container>/<blob>
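A blob address can be assembled from the URL format above. A minimal Python sketch; the helper name and the account, container, and blob names are hypothetical, and it only builds the string (no request is made).

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob: str) -> str:
    """Build http://<storage account>.blob.core.windows.net/<container>/<blob>."""
    # quote() leaves "/" unescaped by default, so "/"-separated blob names stay readable.
    return f"http://{account}.blob.core.windows.net/{container}/{quote(blob)}"


print(blob_url("myaccount", "movies", "MOV1.AVI"))
# http://myaccount.blob.core.windows.net/movies/MOV1.AVI
print(blob_url("myaccount", "docs", "2015/annual report.pdf"))
# http://myaccount.blob.core.windows.net/docs/2015/annual%20report.pdf
```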

Estimate Capacity in Blob Storage

1. How to estimate the amount of storage consumed per blob container:
48 bytes + Len(ContainerName) * 2 bytes +
For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] +
For-Each Signed Identifier[512 bytes]
The following is the breakdown:
·         48 bytes of overhead for each container, which includes the Last Modified Time, Permissions, Public Settings, and some system metadata.
·         The container name is stored as Unicode, so take the number of characters and multiply by 2.
·         For each metadata item stored on the container, 3 bytes of overhead plus the length of the name (stored as ASCII) plus the length of the string value.
·         The 512 bytes per Signed Identifier includes signed identifier name, start time, expiry time and permissions.
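The container formula, 48 bytes + Len(ContainerName) * 2 bytes + For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] + For-Each Signed Identifier[512 bytes], as a Python sketch. The function name and sample values are hypothetical; metadata lengths are counted in characters, assuming ASCII names and values.

```python
def container_storage_bytes(container_name: str, metadata: dict,
                            signed_identifier_count: int) -> int:
    """Estimated bytes for a blob container (excluding the blobs it holds)."""
    total = 48 + len(container_name) * 2        # overhead + Unicode name
    for name, value in metadata.items():
        total += 3 + len(name) + len(value)     # per-item overhead + ASCII name + value
    total += 512 * signed_identifier_count      # name, start/expiry time, permissions
    return total


# Hypothetical container "movies" with one metadata item and one signed identifier.
print(container_storage_bytes("movies", {"owner": "sally"}, 1))  # 585
```

That is 48 + 12 for the name, 3 + 5 + 5 = 13 for the metadata item, and 512 for the signed identifier.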

2. How to estimate the amount of storage consumed per blob:
For a block blob (base blob or snapshot):
124 bytes + Len(BlobName) * 2 bytes +
For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] +
8 bytes + number of committed and uncommitted blocks * Block ID Size in bytes +
SizeInBytes(data in unique committed data blocks stored) +
SizeInBytes(data in uncommitted data blocks)

For a page blob (base blob or snapshot):
124 bytes + Len(BlobName) * 2 bytes +
For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)] +
number of nonconsecutive page ranges with data * 12 bytes +
SizeInBytes(data in unique pages stored)
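Both blob formulas share the name and metadata terms and differ only in how the data is tracked. A Python sketch of the two formulas; the function names and sample values are hypothetical, and the byte counts for committed, uncommitted, and page data are inputs you would measure, not values the code discovers.

```python
def metadata_bytes(metadata: dict) -> int:
    """For-Each Metadata[3 bytes + Len(MetadataName) + Len(Value)]."""
    return sum(3 + len(name) + len(value) for name, value in metadata.items())


def block_blob_storage_bytes(blob_name: str, metadata: dict, block_count: int,
                             block_id_size: int, unique_committed_bytes: int,
                             uncommitted_bytes: int) -> int:
    """Block blob: block_count covers both committed and uncommitted blocks."""
    return (124 + len(blob_name) * 2 + metadata_bytes(metadata)
            + 8 + block_count * block_id_size
            + unique_committed_bytes + uncommitted_bytes)


def page_blob_storage_bytes(blob_name: str, metadata: dict,
                            page_range_count: int, unique_page_bytes: int) -> int:
    """Page blob: 12 bytes per nonconsecutive page range plus the stored page data."""
    return (124 + len(blob_name) * 2 + metadata_bytes(metadata)
            + page_range_count * 12 + unique_page_bytes)


# Hypothetical block blob: 2 committed blocks with 64-byte IDs holding 1,000 bytes.
print(block_blob_storage_bytes("MOV1.AVI", {}, 2, 64, 1000, 0))  # 1276
# Hypothetical page blob: 2 page ranges holding 5 pages (2,560 bytes) of data.
print(page_blob_storage_bytes("disk.vhd", {}, 2, 2560))          # 2724
```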

3. The following is the breakdown for block blobs and page blobs:

·         124 bytes of overhead for blob, which includes the Last Modified Time, Size, Cache-Control, Content-Type, Content-Language, Content-Encoding, Content-MD5, Permissions, Snapshot information, Lease, and some system metadata.
·         The blob name is stored as Unicode, so take the number of characters and multiply by 2.
·         Then, for each metadata item stored, 3 bytes of overhead plus the length of the name (stored as ASCII) plus the length of the string value.
·         Then for Block Blobs
✓  8 bytes for the block list
✓  Number of blocks times the block ID size in bytes
✓  Plus the size of the data in all of the committed and uncommitted blocks. Note that when snapshots are used, this size includes only the unique data for this base or snapshot blob. Uncommitted blocks that are not committed within a week are garbage collected, after which they no longer count toward billing.
·         Then for Page Blobs
✓  Number of nonconsecutive page ranges with data times 12 bytes. This is the number of unique page ranges you see when calling the GetPageRanges API.
✓  Plus the size in bytes of the data in all of the stored pages. Note that when snapshots are used, this size includes only the unique pages for the base blob or snapshot blob being counted.
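The two page blob terms depend on which 512-byte pages hold data: consecutive pages merge into one range, and each separate range adds 12 bytes. The Python sketch below is a simplified model under that assumption; the service may report ranges differently (GetPageRanges is the authority), and the function names and page indices are hypothetical.

```python
PAGE_SIZE = 512  # page blobs are written in 512-byte pages


def page_ranges(written_pages):
    """Collapse written page indices into inclusive (start, end) byte ranges."""
    ranges = []
    for index in sorted(set(written_pages)):
        start = index * PAGE_SIZE
        end = start + PAGE_SIZE - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # page is adjacent: extend the previous range
        else:
            ranges.append((start, end))        # gap before this page: start a new range
    return ranges


def page_blob_data_terms(written_pages):
    """Return (page range overhead bytes, unique page data bytes)."""
    unique_pages = set(written_pages)
    return len(page_ranges(unique_pages)) * 12, len(unique_pages) * PAGE_SIZE


print(page_ranges([0, 1, 2, 5, 6]))           # [(0, 1535), (2560, 3583)]
print(page_blob_data_terms([0, 1, 2, 5, 6]))  # (24, 2560)
```

Writing pages sparsely therefore costs a little more than writing the same amount of data contiguously, because every gap starts a new 12-byte range.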

Benefits of Blob storage

  • Blobs are listed in order by name, so a naming convention can easily bring selected blobs to the top of a listing.
  • You can browse and read all blobs at any time.
  • You can add and delete any blob at any time without having to purge the container.
  • Blob storage offers a cost-effective and scalable solution for large amounts of data.


When Blob storage is used with HDInsight, storing the data in Azure Blob storage instead of the cluster's HDFS has several benefits:

  • Data reuse and sharing: The data in Azure Blob storage can be accessed either through the HDFS APIs or through the Blob Storage REST APIs. Thus, a larger set of applications and tools can be used to produce and consume the data.
  • Data archiving: Storing data in Azure Blob storage enables the HDInsight clusters used for computation to be safely deleted without losing user data.
  • Data storage cost: Storing data in HDFS for the long term is more costly than storing it in Azure Blob storage, because a compute cluster costs more than an Azure Blob storage container. In addition, because the data does not have to be reloaded for every compute cluster generation, you also save on data loading costs.
  • Elastic scale-out: HDFS capacity is tied to the number of nodes in the cluster, so changing its scale can be more complicated than relying on the elastic scaling you get automatically in Azure Blob storage.
  • Geo-replication: Your Azure Blob storage containers can be geo-replicated through the Azure portal. Although this gives you geographic recovery and data redundancy, a failover to the geo-replicated location severely impacts performance and may incur additional costs. We therefore recommend choosing geo-replication carefully, and only if the value of the data justifies the additional cost.