
Data processing on Google Cloud Platform

June 3, 2020

The data processing options available on GCP are part of its broader data analytics offering.
Here are the high-level components of the pipeline:

Figure: Agile Data Options in GCP

The first step is to bring data into the platform from various disparate sources, generally referred to as data ingestion. Once data enters the pipeline, it is processed to make it useful for analytical purposes. The processed data is stored in a modern data warehouse or data lake and then consumed by enterprise users for operational and analytical reporting, machine learning, and advanced analytics or AI use cases.

Data Ingestion

Cloud Pub/Sub:

To improve reliability, the data ingestion and data movement mechanism follows the publish/subscribe (Pub/Sub) pattern. Cloud Pub/Sub can scale up to 100 GB/sec with consistent performance, which is enough to satisfy the scale of almost any enterprise. Unacknowledged messages are retained for 7 days by default, and the retention period is configurable. The service is deeply integrated with other components of the GCP analytics platform.
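The publish/subscribe pattern itself can be illustrated with a minimal in-memory sketch. This is not the Cloud Pub/Sub client API; the `InMemoryBroker` class, the `orders` topic, and the two consumers are hypothetical names chosen for illustration. It only shows the decoupling the pattern provides: the publisher sends to a topic without knowing who consumes the message.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy stand-in for a Pub/Sub service (hypothetical, not the Cloud Pub/Sub API)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        """Register a callback that receives every message published to the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> int:
        """Deliver the message to every subscriber and return the delivery count."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = InMemoryBroker()
warehouse_rows: List[bytes] = []
alerts: List[bytes] = []

# Two independent consumers receive the same event; the publisher knows neither.
broker.subscribe("orders", warehouse_rows.append)
broker.subscribe("orders", alerts.append)
delivered = broker.publish("orders", b'{"order_id": 1}')
```

A managed service adds what this sketch leaves out, such as durable storage, acknowledgements, and message retention.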

Data Processing

Cloud Dataflow:

Cloud Dataflow is used to process streaming data in real time. The traditional approach to building data pipelines was to maintain separate codebases for batch, micro-batch, and stream processing (as in Lambda architecture patterns). Cloud Dataflow provides a unified programming model, so users can process all of these workloads with the same codebase. It also simplifies operations and management.
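The idea of a single codebase for batch and streaming can be sketched in plain Python. This is an illustration of the concept only, not the Dataflow SDK: `parse_and_filter`, `batch_source`, `stream_source`, and the threshold of 50 are all assumptions made for the example. The same transform is applied unchanged to a bounded list and to records that arrive one at a time.

```python
from typing import Dict, Iterable, Iterator, List, Union


def parse_and_filter(lines: Iterable[str]) -> Iterator[Dict[str, Union[str, float]]]:
    """Shared pipeline logic: parse 'sensor,value' lines and keep readings above 50."""
    for line in lines:
        sensor, value = line.split(",")
        reading = float(value)
        if reading > 50.0:
            yield {"sensor": sensor, "value": reading}


def batch_source() -> List[str]:
    # Bounded input, e.g. the contents of a file that is already complete.
    return ["a,10", "b,75", "c,51.5"]


def stream_source() -> Iterator[str]:
    # Stream-style input: records are produced one at a time by a generator.
    for record in ("a,99", "b,5"):
        yield record


# The same transform runs over both kinds of input without modification.
batch_result = list(parse_and_filter(batch_source()))
stream_result = list(parse_and_filter(stream_source()))
```

Because the transform only depends on an iterable of records, the choice between batch and streaming is confined to the source, which is the property a unified programming model aims for.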

Dataproc:

Dataproc is a fully managed Apache Hadoop and Spark service. It lets users work with familiar open-source tools such as Spark, Hadoop, Hive, Tez, Presto, and Jupyter, and integrates tightly with other services in the Google Cloud Platform ecosystem. It gives the flexibility to define clusters rapidly, including the machine types used for master and data nodes.

Two main types of Dataproc clusters can be provisioned in Google Cloud Platform. The first is the ephemeral cluster: the cluster is defined when a job is submitted, scaled up or down as the job requires, and deleted once the job is completed. The second is the long-standing cluster, where the user creates a cluster (comparable to an on-premise cluster) with a defined minimum and maximum number of nodes. Jobs are executed within these constraints, and when they are completed, the cluster scales down to the minimum. This gives the flexibility to choose the type of cluster based on the use case and the processing power needed.
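The scaling behaviour of a long-standing cluster can be sketched as a clamp between its minimum and maximum node counts. This is a simplified model and not Dataproc's actual autoscaling algorithm; `ClusterBounds`, `target_nodes`, and the `tasks_per_node` capacity figure are hypothetical names and values used only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterBounds:
    """Minimum and maximum node counts of a long-standing cluster (illustrative)."""
    min_nodes: int
    max_nodes: int


def target_nodes(bounds: ClusterBounds, pending_tasks: int, tasks_per_node: int) -> int:
    """Return the node count needed for the pending work, clamped to the bounds."""
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(bounds.min_nodes, min(bounds.max_nodes, needed))


bounds = ClusterBounds(min_nodes=2, max_nodes=10)
# 45 tasks at 4 per node would need 12 nodes, so the maximum applies.
busy = target_nodes(bounds, pending_tasks=45, tasks_per_node=4)
# With no pending work the cluster falls back to its minimum size.
idle = target_nodes(bounds, pending_tasks=0, tasks_per_node=4)
```

An ephemeral cluster has no equivalent of the idle state in this model, since it is deleted when its job finishes.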

Dataproc is an enterprise-ready service with high availability and high scalability. It supports both horizontal scaling (thousands of nodes per cluster) and vertical scaling (configurable machine types, GPUs, solid-state drives, and persistent disks).
