Google Cloud Data Loss Prevention (DLP) for Developers

10 June 2020

Every organization has sensitive data it must protect, from addresses and credit card numbers to patient medical records and other highly confidential data. Much of this is referred to as personally identifiable information (PII). For businesses that hold such data, securing the network is only half the job. You also need a way to protect the data itself so that it is unidentifiable to unauthorized eyes.

That’s where DLP comes into the picture. DLP solutions are growing quickly as enterprises look for ways to reduce the risk of sensitive data leaking outside the company. DLP helps you better understand and manage sensitive data.

Let’s look at what Data Loss Prevention (DLP) means and what it does.

What is DLP?

Data loss prevention can be defined as technology that performs content inspection and contextual analysis of data sent by messaging applications such as email, in motion over the network, in use on a managed endpoint device, or at rest in cloud applications and cloud storage.

Google Cloud DLP provides fast, scalable classification and redaction of sensitive data elements such as credit card numbers, selected international identifier numbers, phone numbers, and credentials. It lets you scan, classify, and report on data from virtually anywhere.
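To make the idea of classification concrete, here is a minimal sketch of what an inspection request for a piece of text might look like when sent as JSON over HTTP. The field names follow the DLP v2 REST API as best understood here, so check them against the API reference before relying on them. The project ID and the sample text are made up for illustration, and the code only builds the request; it does not send it.

```python
import json

# Hypothetical project ID, used only for illustration.
PROJECT_ID = "my-project"


def build_inspect_request(text, info_types):
    """Return (url, body) for a content:inspect call on a text string."""
    url = f"https://dlp.googleapis.com/v2/projects/{PROJECT_ID}/content:inspect"
    body = {
        # The content to scan.
        "item": {"value": text},
        "inspectConfig": {
            # Built-in detectors to look for, referenced by identifier.
            "infoTypes": [{"name": name} for name in info_types],
            # Report findings rated POSSIBLE or more likely.
            "minLikelihood": "POSSIBLE",
            # Ask for the matched text to be included in each finding.
            "includeQuote": True,
        },
    }
    return url, body


url, body = build_inspect_request(
    "Call me at 555-0100 or pay with card 4111 1111 1111 1111.",
    ["PHONE_NUMBER", "CREDIT_CARD_NUMBER"],
)
print(url)
print(json.dumps(body, indent=2))
```

The body would be sent with an authenticated POST request. The same configuration can also be passed through one of the client libraries instead of raw HTTP.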

Cloud DLP includes:

  • More than 120 built-in information type (infoType) detectors.
  • De-identification techniques, including redaction, masking, format-preserving encryption, date-shifting, and many more.
  • Detection of sensitive data within streams of data, within files and tables in storage such as Cloud Storage buckets and BigQuery, and within images.
  • The ability to define custom infoType detectors using dictionaries, regular expressions, and contextual elements.
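As a rough illustration of the last point, the sketch below builds two custom infoType detectors, one based on a regular expression and one based on a dictionary of words, in the shape the DLP v2 REST API is understood to expect under `inspectConfig.customInfoTypes`. The detector names (`EMPLOYEE_ID`, `PROJECT_CODENAME`), the pattern, and the word list are all hypothetical. The local `re` check is only a sanity test of the pattern in Python; Cloud DLP evaluates regular expressions with its own engine.

```python
import re

# Hypothetical pattern for an internal employee ID such as "EMP-123456".
EMPLOYEE_ID_PATTERN = r"EMP-\d{6}"


def build_custom_info_types():
    """Return a list of custom detectors for use in an inspectConfig."""
    return [
        {
            # Regular-expression detector.
            "infoType": {"name": "EMPLOYEE_ID"},
            "regex": {"pattern": EMPLOYEE_ID_PATTERN},
        },
        {
            # Dictionary detector: matches any word or phrase in the list.
            "infoType": {"name": "PROJECT_CODENAME"},
            "dictionary": {"wordList": {"words": ["bluebird", "red falcon"]}},
        },
    ]


inspect_config = {
    # Built-in and custom detectors can be combined in one configuration.
    "infoTypes": [{"name": "EMAIL_ADDRESS"}],
    "customInfoTypes": build_custom_info_types(),
}

# Quick local check that the pattern behaves as intended.
print(bool(re.fullmatch(EMPLOYEE_ID_PATTERN, "EMP-123456")))
print(bool(re.fullmatch(EMPLOYEE_ID_PATTERN, "EMP-12")))
```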

Let’s take a closer look.

  • Inspection – Cloud Data Loss Prevention detects and classifies sensitive data within text content or structured content such as CSV or JSON. You can also classify sensitive data within images.
  • InfoType Detectors – Cloud Data Loss Prevention (DLP) supports both built-in and custom infoTypes. You can also use the Cloud DLP API to list all built-in infoTypes programmatically. Each infoType includes an infoType identifier (the internal name of the infoType) and an infoType display name (a human-readable infoType name).
  • Redaction – Cloud Data Loss Prevention (DLP) can redact or obfuscate sensitive data in a string of text. You can pass textual information to the API using JSON over HTTP, or use one of the client libraries available for several popular programming languages. You can redact sensitive data from text as well as from images.
  • De-identification – This is the process of removing identifying information from data. De-identification techniques can include any of the following:
    • Masking sensitive data by partially or fully replacing characters with a symbol, such as an asterisk ‘*’ or hash ‘#’.
    • Replacing each instance of sensitive data with a “token,” or surrogate string.
    • Encrypting and replacing sensitive data using a randomly generated key.
  • Scheduling Inspection Jobs – When DLP performs an inspection scan to identify sensitive data, each scan runs as a job. Cloud DLP creates and runs a job resource whenever you tell it to inspect your Google Cloud storage repositories, including Cloud Storage buckets, BigQuery tables, or Datastore kinds. To run inspection scans on a schedule, you create job triggers.
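Two of the items above, de-identification by masking and scheduled inspection, can be sketched as request bodies. This is a minimal sketch under assumptions, not a definitive implementation: the field names follow the DLP v2 REST API (`content:deidentify` and `jobTriggers`) as best understood here, and the bucket URL, display name, and sample text are hypothetical. The code only builds the JSON bodies; sending them requires an authenticated POST or one of the client libraries.

```python
import json


def build_deidentify_request(text, info_types, masking_character="#"):
    """Body for content:deidentify that masks every finding."""
    selected = [{"name": name} for name in info_types]
    return {
        "item": {"value": text},
        # What to look for.
        "inspectConfig": {"infoTypes": selected},
        # What to do with each finding.
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "infoTypes": selected,
                        "primitiveTransformation": {
                            # Replace matched characters with the mask symbol.
                            "characterMaskConfig": {
                                "maskingCharacter": masking_character
                            }
                        },
                    }
                ]
            }
        },
    }


def build_job_trigger_request(bucket_url, info_types, period_seconds=86400):
    """Body for jobTriggers.create that scans a bucket on a schedule."""
    return {
        "jobTrigger": {
            "displayName": "Scheduled bucket scan",  # hypothetical name
            "status": "HEALTHY",
            # Run the inspection job once per period (one day by default).
            "triggers": [
                {"schedule": {"recurrencePeriodDuration": f"{period_seconds}s"}}
            ],
            "inspectJob": {
                "storageConfig": {
                    "cloudStorageOptions": {"fileSet": {"url": bucket_url}}
                },
                "inspectConfig": {
                    "infoTypes": [{"name": name} for name in info_types]
                },
            },
        }
    }


deid = build_deidentify_request("My phone number is 555-0100.", ["PHONE_NUMBER"])
trigger = build_job_trigger_request("gs://my-bucket/*", ["EMAIL_ADDRESS"])
print(json.dumps(deid, indent=2))
print(json.dumps(trigger, indent=2))
```

Swapping `characterMaskConfig` for another primitive transformation is how the other techniques in the list, such as token replacement or encryption, would be selected.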

For more details about Google Cloud DLP refer
