Introduction to Google Cloud Datastore
Why invest in physical drives when you can get more virtual storage for much less? The answer is Google’s cloud, your “on the go” backup and storage solution from the information technology giant Google, a subsidiary of Alphabet Inc.
Google Cloud Datastore is a managed NoSQL storage service that gives Google Cloud Platform (GCP) users an easy way to store and move data as needed. Today, let’s talk through some of the features and advantages of this storage service and where it is headed.
Cloud Datastore offers the following advantages over traditional physical storage systems.
Availability of the data
In the information era, data availability is vital to keeping operations running continuously, so infrastructure is built to serve data to processes without failure.
To keep data available, the service uses dual-region storage, in which data is stored simultaneously as a primary copy and as backups. If a read of the primary copy fails, the request is served from the nearest backup, so both data processing and availability are maintained. This lets services reach the data with 99% to 99.9% reliability.
Atomic transactions
Cloud Datastore does not apply operations on data partially. A transaction either completes successfully or, on failure, rolls the data set back to its previous state. This property is referred to as atomicity.
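The all-or-nothing behaviour described above can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the Cloud Datastore client: `InMemoryStore`, `run_atomically`, `debit` and `credit` are hypothetical names made up for the illustration.

```python
import copy


class InMemoryStore:
    """Toy key-value store used only to illustrate all-or-nothing commits."""

    def __init__(self):
        self.data = {}

    def run_atomically(self, mutations):
        """Apply every mutation, or restore the previous state if one fails."""
        snapshot = copy.deepcopy(self.data)  # state to roll back to
        try:
            for mutate in mutations:
                mutate(self.data)
        except Exception:
            self.data = snapshot  # rollback: no partial result survives
            return False
        return True


def debit(account, amount):
    """Build a mutation that fails when the balance is too low."""
    def mutate(data):
        if data[account] < amount:
            raise ValueError("insufficient funds")
        data[account] -= amount
    return mutate


def credit(account, amount):
    """Build a mutation that adds to a balance."""
    def mutate(data):
        data[account] += amount
    return mutate


store = InMemoryStore()
store.data = {"alice": 100, "bob": 20}

# The credit is applied first, then the debit fails, so both are undone.
first_ok = store.run_atomically([credit("bob", 150), debit("alice", 150)])
state_after_failure = dict(store.data)

# Both mutations succeed, so both are kept.
second_ok = store.run_atomically([debit("alice", 30), credit("bob", 30)])
```

After the failed attempt the balances are unchanged, even though the credit had already been applied when the debit raised an error.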
Enhanced scalability to provide high performance
Cloud Datastore runs on a distributed architecture that manages scaling automatically. Query performance depends on the size of the result set rather than on the size of the entire data set.
Flexible storage and querying of data
Cloud Datastore maps naturally to object-oriented and scripting languages, and it serves queries to multiple clients independently of one another.
The balance of strong and eventual consistency
Cloud Datastore looks up entities by their keys with strong consistency, which keeps the data accurate and consistent throughout operations and lets the application deliver a great user experience. Queries that are not restricted to an ancestor are eventually consistent.
Data Protection
Data is encrypted before it is written to disk and decrypted when it is read back, which protects it from malicious activity that could lead to data leaks.
No backup or downtime planning required
Since the cloud infrastructure takes care of the data, developers can spend their valuable time building great user experiences rather than worrying about backup plans.
The infrastructure lets users keep working with the data even during upgrades and failures.
The future is cloud!
Nowadays, developers are increasingly concerned about the cost of owning and maintaining infrastructure, which gives cloud platforms an edge when they choose.
The cloud platform is robust and low cost.
Data stored in the cloud is highly available and secure, which lets users focus more on the experience and less on disaster management.
Cloud Firestore is the newest version of Cloud Datastore. It is a NoSQL document database built for automatic scaling, high performance and ease of application development. Cloud Datastore shares many concepts with traditional databases, under different names: a category of objects is a kind, a single object is an entity, an individual piece of data for an object is a property, and the unique ID of an object is its key.
Cloud Datastore differs from a relational database in the following aspects:
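The kind, entity, property and key vocabulary can be pictured with a couple of plain Python data classes. This is a minimal sketch for illustration only: `Key` and `Entity` here are hypothetical stand-ins, not the classes of the Cloud Datastore client library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Key:
    """Unique ID of an object, comparable to a primary key."""
    kind: str           # category of object, comparable to a table
    identifier: object  # a custom name (str) or a numeric ID (int)


@dataclass
class Entity:
    """One object, comparable to a row."""
    key: Key
    properties: dict = field(default_factory=dict)  # comparable to columns


task = Entity(Key("Task", "sample-task"),
              {"description": "Learn Cloud Datastore", "done": False})
errand = Entity(Key("Task", 42),
                {"description": "Buy milk", "priority": 4})

# Both entities share a kind, but each carries its own set of properties.
same_kind = task.key.kind == errand.key.kind
```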
- Automatic scaling: Cloud Datastore allows applications to maintain high performance as they receive more traffic by automatically scaling to large data sets and distributing data as necessary.
- Restricted queries: All queries are served by previously built indexes, so the queries allowed are more restrictive than those of a relational database. Join operations and inequality filters on multiple properties are not supported.
- Schema-less: Entities of the same kind are not required to have the same set of properties.
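To see why queries served from previously built indexes cost roughly as much as their result set, consider the following sketch. It models an index as a sorted list of (property value, key) pairs and answers an equality filter with a range scan. The data and the `query_equal` helper are hypothetical, and the real index layout of Cloud Datastore is not shown here.

```python
import bisect

entities = {
    "t1": {"priority": 4},
    "t2": {"priority": 1},
    "t3": {"priority": 4},
    "t4": {"priority": 2},
}

# Index built ahead of query time: sorted (property value, key) pairs.
priority_index = sorted(
    (props["priority"], key) for key, props in entities.items()
)


def query_equal(index, value):
    """Return the keys whose indexed value equals `value`."""
    # Binary search finds the first candidate without scanning the data set.
    position = bisect.bisect_left(index, (value, ""))
    keys = []
    while position < len(index) and index[position][0] == value:
        keys.append(index[position][1])
        position += 1  # the scan stops at the first non-matching row
    return keys


high_priority = query_equal(priority_index, 4)
no_match = query_equal(priority_index, 3)
```

The scan touches only the matching index rows, which is also why a filter with no supporting index cannot be answered in this model.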
Cloud Datastore is ideal for applications that rely on highly available data at scale. Typical uses include real-time inventory, customized experiences based on a user’s past activities, and transactions based on ACID properties.
Transactions
A transaction is a set of Cloud Datastore operations on one or more entities. A transaction is never partially applied: either all of its operations are applied, or none of them are. A transaction has a maximum duration of 60 seconds, with a 10-second idle expiration time after 30 seconds.
An operation may fail when:
- Too many concurrent modifications are attempted on the same entity group.
- The transactions exceed a resource limit.
- Cloud Datastore encounters an internal error.
A single transaction can apply to at most 25 entity groups. This includes querying for entities by ancestor, retrieving entities by key, updating entities and deleting entities.
If two or more transactions simultaneously attempt to modify entities in the same entity group, only the first transaction to commit its changes can succeed, and all the others fail to commit. Because of this design, using entity groups limits the number of concurrent writes on any entity in those groups. When a transaction is committed, Cloud Datastore checks the last update time of the entity groups used in the transaction, and the commit fails if that time has changed since the transaction started.
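The "first commit wins" rule is a form of optimistic concurrency control, which can be sketched in a few lines. In this toy model a version counter stands in for the last update time of an entity group. `EntityGroup`, `Transaction` and `ConflictError` are hypothetical names, not part of any Cloud Datastore API.

```python
class ConflictError(Exception):
    """Raised when an entity group changed after the transaction started."""


class EntityGroup:
    def __init__(self, value):
        self.value = value
        self.version = 0  # stands in for the group's last update time


class Transaction:
    def __init__(self, group):
        self.group = group
        self.start_version = group.version  # recorded when the transaction begins
        self.pending = None

    def read(self):
        return self.group.value

    def write(self, value):
        self.pending = value  # buffered until commit

    def commit(self):
        # Check again at commit time: has anyone else committed since we began?
        if self.group.version != self.start_version:
            raise ConflictError("entity group was modified concurrently")
        self.group.value = self.pending
        self.group.version += 1


counter = EntityGroup(10)
tx_a = Transaction(counter)
tx_b = Transaction(counter)
tx_a.write(tx_a.read() + 1)
tx_b.write(tx_b.read() + 5)

tx_a.commit()  # first to commit: succeeds
try:
    tx_b.commit()  # started from a now stale version: rejected
    b_committed = True
except ConflictError:
    b_committed = False
```

Neither transaction blocks the other while it runs. The conflict is only detected at commit time, which is why a losing transaction has to be run again from the start.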
Isolation and Consistency
Inside a transaction, Cloud Datastore enforces serializable isolation, i.e. another transaction cannot concurrently modify the data that is read or modified by this transaction. Entities and indexes in the transaction’s entity group are fully updated, so queries return the complete set of result entities.
Transactions are mainly used for updating an entity with a new property value relative to its current value. The Cloud Datastore API does not automatically retry a transaction if a failure occurs, so the application has to add its own retry logic. Transactions can also be used to read a consistent snapshot of the datastore, which is useful when multiple reads are needed to render a page or to export data that must be consistent. Read-only single-group transactions never fail due to concurrent modifications, so you don’t need to implement retries for them.
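Since failed transactions are not retried by the API, an application usually wraps them in its own retry loop. The sketch below shows one common approach, exponential backoff with jitter. `TransientError`, `run_with_retries` and the default values are assumptions chosen for the illustration, and the transaction function is assumed to be safe to run more than once.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a commit that failed due to contention."""


def run_with_retries(transaction_fn, max_attempts=5, base_delay=0.01,
                     sleep=time.sleep):
    """Call `transaction_fn` until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction_fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            delay = base_delay * (2 ** (attempt - 1))  # 1x, 2x, 4x, ...
            sleep(delay + random.uniform(0, delay))    # jitter spreads retries


attempts = {"count": 0}


def flaky_transaction():
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("too much contention")
    return "committed"


# The sleep function is replaced so the example finishes instantly.
result = run_with_retries(flaky_transaction, sleep=lambda seconds: None)
```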
Entity groups
An entity group is a set of entities connected through ancestry to a common root element. Entity groups limit which transactions can be performed. The following rules apply:
- A transaction can access at most 25 entity groups.
- To use queries within a transaction, the data must be organized into entity groups so that the queries can specify ancestor filters.
- There is a write limit of about one transaction per second within a single entity group. This is because Cloud Datastore performs masterless, synchronous replication of each entity group over a wide geographical area to provide high reliability and fault tolerance.
Consistency levels
Cloud Datastore queries can deliver their results at either of two consistency levels:
- Strongly consistent: These queries guarantee the most up-to-date results, but may take longer to complete or may not be supported in certain cases.
- Eventually consistent: These queries generally run faster, but may occasionally return stale results.
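The difference between the two levels can be pictured with a toy store in which writes reach a replica only after a delay. This is a deliberately simplified model, not a description of how Cloud Datastore replicates data. `ReplicatedStore` and its methods are hypothetical.

```python
class ReplicatedStore:
    """Toy model: writes land on the primary first and on the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def put(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all outstanding writes to the replica, in order."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def get(self, key, strong=True):
        # A strong read pays for the up-to-date copy; an eventual read
        # takes whatever the replica currently holds.
        source = self.primary if strong else self.replica
        return source.get(key)


store = ReplicatedStore()
store.put("task", "v1")
store.replicate()
store.put("task", "v2")  # not replicated yet

strong_read = store.get("task", strong=True)
stale_read = store.get("task", strong=False)

store.replicate()
caught_up_read = store.get("task", strong=False)
```

The eventual read returns the older value until replication catches up, after which both reads agree.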
Consistency considerations
A transaction may include any number of create, update and delete mutations. To keep the data consistent, the transaction ensures that all of its mutations are applied as a unit or, if any mutation fails, that none of them are applied. All strongly consistent reads performed within the same transaction observe a single, consistent snapshot of the data. Strongly consistent queries must specify an ancestor filter, and queries that participate in a transaction are always strongly consistent. Eventually consistent reads carry no such guarantees, but they allow you to distribute the data among a larger number of entity groups. The results of these reads may not reflect the latest transactions.
To get strong consistency, a better approach is to create entities with ancestor paths. The ancestor path identifies the common root entity under which the created entities are grouped. In a task list application, for example, writing to a single entity group per task list achieves strong consistency, but it also limits changes to each task list to no more than one write per second.
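Ancestor paths can be modelled as a sequence of (kind, identifier) pairs, where the first pair is the root entity that defines the entity group. The following sketch uses plain tuples. `make_key`, `has_ancestor` and `entity_group` are hypothetical helpers, and the kind names are made up for the task list example.

```python
def make_key(*path):
    """Build a key from alternating kind and identifier values."""
    if len(path) % 2 != 0:
        raise ValueError("path must alternate kind and identifier")
    return tuple(zip(path[0::2], path[1::2]))


def has_ancestor(key, ancestor):
    """True when `ancestor` is a prefix of the key's path."""
    return key[:len(ancestor)] == ancestor


def entity_group(key):
    """The root element of the path identifies the entity group."""
    return key[:1]


default_list = make_key("TaskList", "default")
keys = [
    make_key("TaskList", "default", "Task", 1),
    make_key("TaskList", "default", "Task", 2),
    make_key("TaskList", "work", "Task", 1),
]

# An ancestor filter keeps only the entities stored under one root.
in_default_list = [key for key in keys if has_ancestor(key, default_list)]
same_group = entity_group(keys[0]) == entity_group(keys[1])
other_group = entity_group(keys[0]) == entity_group(keys[2])
```

All tasks under the same root share one entity group, which is what makes the ancestor query strongly consistent and also what subjects them to the shared write limit.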
Limitations of Cloud Datastore
- Cloud Datastore is not an effective solution for analytical data.
- It is not a relational database with full SQL support for online transaction processing (OLTP) systems.
- It is not a good fit for highly unstructured data.
- It doesn’t support interactive querying in an online analytical processing (OLAP) system.
- It is not suited to storing immutable blobs, such as large images or videos.
Some of the best practices to be followed while building an application
- Use UTF-8 characters for namespace names, property names and custom key names.
- Use batch operations instead of single operations. Batch operations are more efficient because they perform multiple operations with the same overhead as a single operation.
- If a transaction fails, make sure to roll it back.
- Use asynchronous calls instead of synchronous calls, as they minimize the latency impact.
- Group highly related data in entity groups as it enables ancestor queries.
- Do not include the same entity multiple times in the same commit.
- Avoid writing an entity group more than once per second.
- Do not use a negative number or zero for the numerical ID.
- Exclude a property from indexes if it is never needed in a query.
- Do not use dots in property names, and use UTF-8 characters for string properties.
- Use an ancestor query if strong consistency is needed.
- Avoid high read/write rates to the Cloud Datastore keys that are lexicographically close.
- Avoid deleting a large number of Cloud Datastore entities across a small range of keys.
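The advice on batch operations can be sketched as a small helper that groups entities before writing them. `chunked` and `write_in_batches` are hypothetical, and `put_batch` stands for whatever batch write call the application uses. The default size of 500 is an assumption about the per-commit entity limit and should be checked against the current Cloud Datastore limits.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def write_in_batches(entities, put_batch, batch_size=500):
    """Send entities through `put_batch` in groups and count the calls made."""
    calls = 0
    for batch in chunked(entities, batch_size):
        put_batch(batch)  # one round trip covers the whole batch
        calls += 1
    return calls


# Stand-in for a real batch write: it only records the batch sizes.
recorded_sizes = []
entities = [{"id": number} for number in range(1, 1201)]
calls_made = write_in_batches(entities, lambda batch: recorded_sizes.append(len(batch)))
```

Writing 1,200 entities this way takes three calls instead of 1,200, so the fixed per-call overhead is paid far less often.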