Serverless Data Analysis allows organizations to carry out no-ops data warehousing using BigQuery, and pipeline processing using Cloud Dataflow.
Google BigQuery is a petabyte-scale data warehouse on Google Cloud that you interact with primarily through SQL, and Cloud Dataflow is a data processing pipeline system that you can program against in either Python or Java.
Serverless Data Analysis is meant for people who build data pipelines and data analytics. Anyone working with these tools needs a solid understanding of SQL to interact with BigQuery, and either Python or Java to work with Dataflow.
BigQuery is Google’s no-ops solution for data warehousing and analytics; no-ops in this context means that there is no infrastructure for you to manage, and therefore no operations. It lets you store, analyze, and export data from a centralized location.
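As a concrete illustration, interacting with BigQuery is largely a matter of composing a Standard SQL query and submitting it to the service. A minimal sketch in Python follows; the project, dataset, table, and column names are hypothetical, and the actual submission (which requires the `google-cloud-bigquery` client library and credentials) is shown only in comments:

```python
# Sketch of a BigQuery interaction from Python.
# The project/dataset/table names below are hypothetical examples.

def build_daily_count_query(table: str, date_column: str) -> str:
    """Compose a Standard SQL query that counts rows per day."""
    return (
        f"SELECT DATE({date_column}) AS day, COUNT(*) AS n "
        f"FROM `{table}` "
        f"GROUP BY day "
        f"ORDER BY day"
    )

query = build_daily_count_query("my-project.my_dataset.events", "created_at")

# With the google-cloud-bigquery client library installed and
# credentials configured, the query would be submitted like this:
#
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   for row in client.query(query):
#       print(row.day, row.n)

print(query)
```

Because BigQuery is no-ops, nothing beyond the query itself is needed: there are no clusters to size or servers to provision before running it.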
Dataflow is a way to execute Apache Beam data processing pipelines on the cloud. A pipeline runs as a series of steps, called transforms, and the key feature of Dataflow is that these transforms can be elastically scaled. The pipeline code is written against an open source API called Apache Beam, and Dataflow is not the only place you can execute Beam pipelines: you can also run them on Flink, Spark, and other back-ends. Cloud Dataflow, however, is the usual execution service when you want to run a data pipeline on Google Cloud.
Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
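The pipeline model described above can be sketched in plain Python. To be clear, this is not the real Beam SDK (that would be the `apache_beam` package); it is a conceptual stand-in showing the core idea: a pipeline is a chain of transforms over a collection, defined independently of whichever back-end (Dataflow, Flink, Spark) ultimately executes it:

```python
# Conceptual sketch of the Beam pipeline model in plain Python.
# The real SDK is the apache_beam package; this stand-in only
# illustrates chaining transforms over a collection.

class Pipeline:
    def __init__(self, data):
        self.data = list(data)

    def map(self, fn):
        """Element-wise transform (analogous to Beam's Map/ParDo)."""
        return Pipeline(fn(x) for x in self.data)

    def filter(self, pred):
        """Keep elements matching a predicate (analogous to Beam's Filter)."""
        return Pipeline(x for x in self.data if pred(x))

    def run(self):
        """Hand the pipeline to a runner; here we simply return results."""
        return self.data

# Each chained step corresponds to one transform that a real runner
# such as Cloud Dataflow could scale elastically across workers.
result = (
    Pipeline(["alpha", "beta", "gamma", "delta"])
    .filter(lambda w: w.startswith(("a", "d")))
    .map(str.upper)
    .run()
)
print(result)  # ['ALPHA', 'DELTA']
```

Because the pipeline definition is separate from its execution, the same chain of transforms runs unchanged whether the runner is a local process (as here), Flink, Spark, or Cloud Dataflow.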