Category Archives: Big Data

Data Architecture: Components, Tools, and Processes

What is data architecture? Data is everywhere in an organization, from large systems to departmental databases and spreadsheets. Because it spans so many sources, no single team can control all of it. If the data is not clean, current, and consistent, an organization can run into trouble. This is what makes data architecture important. But what really […]

Configure Azure Key Vault & Storage Account URLs in ADF

Did you know that it is possible to pass an environment-specific Azure Key Vault URL and Storage Account base URL dynamically using a single global parameter in Azure Data Factory? Allow me to explain. The problem was that in ADF pipelines, a static Key Vault URL and a Storage Account URL were not a […]
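A minimal sketch of the idea, assuming a global parameter named `EnvironmentName` (the parameter name and URL pattern here are illustrative, not necessarily the article's actual values). A factory-level global parameter holds the environment identifier:

```json
{
  "globalParameters": {
    "EnvironmentName": { "type": "String", "value": "dev" }
  }
}
```

A pipeline expression can then assemble the environment-specific URL at runtime, for example `@concat('https://kv-', pipeline().globalParameters.EnvironmentName, '.vault.azure.net/')`, so the same pipeline works across dev, test, and prod without hard-coded URLs.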

Diving into The Ocean of Big Data & Analytics

Today, the universe of data is continuously expanding at a speed far beyond expectations. At the start of 2020, the amount of data generated was estimated at a little over 44 zettabytes! I’m sure it has well surpassed that figure today. This data is immensely complex and exists in the form […]

5 Top-Tier Big Data Engineering Services

Store, organize, and process your data efficiently. In today’s age, the numerous companies opting for digital transformation are producing unimaginable volumes of new types of data. To pursue their journey toward digitalization, they deployed costly enterprise data warehouses along with data marts to store, process, and analyse it. This certainly brought them some success, but, […]

Your All-In-One Guide to Ensuring MariaDB High Availability & Failover

When an application connected to a primary server grows over time, scaling becomes a necessity because a single primary node is no longer viable. Also, if the node has issues such as a hardware malfunction, restoring data from the backup becomes a hassle. Ensuring high availability and automated failover is the best way to overcome […]
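A minimal sketch of the replica side of a MariaDB failover setup, assuming GTID-based replication is enabled on the primary (`server_id`, `log_bin`); the host, user, and password below are placeholders:

```sql
-- On the replica: point it at the primary using GTID-based replication.
CHANGE MASTER TO
  MASTER_HOST = 'primary.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_USE_GTID = slave_pos;
START SLAVE;

-- Check replication health:
SHOW SLAVE STATUS\G
```

Using `MASTER_USE_GTID = slave_pos` lets the replica resume from its recorded GTID position, which is what makes promoting a replica during failover practical.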

Setting up Dev Endpoint using Apache Zeppelin with AWS Glue

AWS Glue is a powerful managed service that relieves you of the hassle of maintaining infrastructure. It is hosted by AWS and offers serverless ETL: it generates code in Python/Scala and executes it in a Spark environment. AWS Glue provisions all the required resources (a Spark cluster) at runtime to execute […]


In this blog, we will focus on understanding the process of using AWS Redshift PartiQL and how it can be used to analyze data in its native format. But before we move on to that, let us first define the problem statement. Data is typically spread across a combination of relational databases, non-relational data stores, […]
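To make the idea concrete, here is a hedged sketch of a PartiQL query over nested data; the table `customer_orders` and its `orders` array column are hypothetical, assumed to hold JSON ingested in its native format:

```sql
-- Unnest an array of order objects with PartiQL navigation:
-- each element of c.orders becomes a row aliased as o.
SELECT c.name,
       o.order_id,
       o.total
FROM   customer_orders AS c,
       c.orders AS o
WHERE  o.total > 100;
```

The `FROM c, c.orders AS o` clause is PartiQL's way of joining a row to the elements of its own nested array, so nested JSON can be queried without flattening it first.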

Data Processing with Apache Spark

Spark has emerged as a favorite for analytics, especially for workloads involving massive volumes of data, where it delivers high performance compared to conventional database engines. Spark SQL allows users to express their complex business requirements to Spark using the familiar language of SQL. So, in this blog, we will […]

Real Time Analytics Using Spark With Cosmos DB Connector

How can you integrate Spark and Cosmos DB? This blog helps you understand how the two can be integrated, allowing Spark to take full advantage of Cosmos DB to run real-time analytics directly on petabytes of operational data! High-Level Architecture With the Spark Connector for Azure Cosmos DB, data is processed in parallel […]

MongoDB to Redshift: Data Migration

We will cover various approaches used to perform data migration from MongoDB to Redshift in this article. A Brief Overview of MongoDB and Redshift MongoDB is an open-source NoSQL database that stores data in JSON format using a document-oriented data model. Data fields can vary by document. MongoDB isn’t associated with any specific data […]
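Since MongoDB documents can each carry different fields, one step any migration approach needs is flattening variable-schema documents into fixed columns that Redshift can load. A minimal sketch of that step, using only the standard library (the document fields are examples, not the article's):

```python
import csv
import io

def flatten(doc, parent="", sep="_"):
    """Recursively flatten nested dicts into underscore-joined columns."""
    row = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            row.update(flatten(value, name, sep))
        else:
            row[name] = value
    return row

docs = [
    {"_id": "1", "name": "Ada", "address": {"city": "London"}},
    {"_id": "2", "name": "Linus", "email": "l@example.com"},  # different fields
]

rows = [flatten(d) for d in docs]
columns = sorted({k for r in rows for k in r})  # union of all fields seen

# Write a CSV with the full column set; missing fields become empty strings,
# which suits a subsequent Redshift COPY into a fixed-schema table.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

Taking the union of fields across all documents is the key move: it turns MongoDB's per-document schema flexibility into one stable relational schema up front.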