Unlock Advanced Data Visualization: The Complete Guide to Installing and Using Apache Superset on Linux

Data Visualization - Apache Superset Guide. Image Source: Unsplash


Note: This article provides a comprehensive guide on deploying and using Apache Superset on a Linux server. It covers the installation and configuration process, as well as the benefits and features of Superset. While the primary focus is on Superset, we will also explore the broader concepts of business intelligence, data analytics, and visualization.

Introduction to Business Intelligence

In today's data-driven world, businesses are constantly seeking ways to gain insights and make informed decisions. This is where Business Intelligence (BI) comes into play. BI refers to the process of collecting, analyzing, and presenting data to help organizations understand their performance, identify trends, and make data-driven decisions. By leveraging BI tools and techniques, businesses can unlock the power of data analytics and visualization to drive growth and gain a competitive edge.

The Role of Data Analytics in Business Intelligence

Data analytics is at the core of business intelligence. It involves the exploration, interpretation, and analysis of data to uncover meaningful patterns and insights. With the help of advanced algorithms and statistical models, businesses can extract valuable information from large datasets and gain a deeper understanding of their operations, customers, and market trends.

Data analytics can be classified into four main types:

Descriptive Analytics: Describes what has happened in the past by summarizing historical data. It provides a snapshot of the current state and helps businesses understand trends and patterns.

Diagnostic Analytics: Focuses on identifying the causes of past events and understanding why they occurred. It helps businesses uncover the root causes behind certain outcomes or trends.

Predictive Analytics: Uses historical data and statistical models to predict future outcomes and trends. It enables businesses to make proactive decisions and anticipate potential challenges or opportunities.

Prescriptive Analytics: Goes beyond predictions and provides recommendations on the best course of action. It uses optimization techniques and simulation models to guide decision-making.


By leveraging data analytics, businesses can gain valuable insights into their operations, customer behaviour, market trends, and more. This information can then be used to drive strategic decision-making and optimize business processes.

The Power of Data Visualization in Business Intelligence

While data analytics provides the foundation for business intelligence, data visualization is the key to unlocking its true potential. Data visualization refers to the graphical representation of data, allowing users to understand complex information at a glance. By presenting data in a visual format, businesses can communicate insights effectively, spot trends, and identify patterns that might have otherwise gone unnoticed.

Data visualization offers several benefits:

Improved Data Understanding: Visualizing data makes it easier for users to grasp complex information quickly. By presenting data in a visual format, businesses can enhance data comprehension and promote better decision-making.

Enhanced Insights: Visualization helps users identify trends, patterns, and outliers in data. By visually exploring data, businesses can uncover hidden insights and make data-driven decisions.

Increased Engagement: Visual representations of data are more engaging and memorable compared to raw numbers or text. By using interactive charts, graphs, and dashboards, businesses can captivate their audience and ensure the message is conveyed effectively.

Efficient Communication: Data visualization simplifies the communication of complex ideas and findings. By presenting data visually, businesses can convey information to stakeholders in a clear and concise manner, fostering collaboration and alignment.

Apache Superset: An Introduction

Apache Superset is a modern data exploration and visualization platform that lets you build interactive charts and dashboards through a code-free interface, with SQL Lab available when you want to write queries yourself. It connects to a wide range of data sources through SQLAlchemy, including relational databases, big data query engines, and cloud data warehouses.

Installing Apache Superset on Linux:

Before diving into the installation process, it's essential to ensure that the required dependencies are in place. The installation process can vary depending on the operating system. Let's explore the installation steps for different environments.

Debian and Ubuntu

For Debian and Ubuntu Linux distributions, the following command installs the necessary dependencies:

sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev

Fedora and RHEL-derivative Linux distributions

For Fedora and RHEL-derivative Linux distributions, use the yum package manager to install the required packages:

sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel

In more recent versions of CentOS and Fedora, you may need to use dnf instead of yum:

sudo dnf install gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel

On CentOS, you may also need to upgrade pip:

pip3 install --upgrade pip

Creating a Virtual Environment

To ensure a clean and isolated installation, it's highly recommended to install Superset inside a virtual environment. Python 3 ships the venv module out of the box; alternatively, you can install the virtualenv package as shown below, or use pyenv and pyenv-virtualenv. Here's how to create and activate a virtual environment:

Using virtualenv

pip install virtualenv
virtualenv superset
source superset/bin/activate

Once the virtual environment is activated, all Python packages installed or uninstalled will be confined to this environment. You can exit the environment by running deactivate on the command line.

Installing Apache Superset

To install Apache Superset, use the following command: 

pip install apache-superset

After installation, the next step is to initialize the metadata database. Before running the database upgrade, however, it is important to set your own value for SECRET_KEY.

Setting up Apache Superset Configuration file

Create a directory for the configuration file, called superset or any name you prefer. The PYTHONPATH example below assumes it is /visualization/superset:

mkdir superset

Inside it, create a Python file called superset_config.py, then add the directory that contains it to the PYTHONPATH environment variable so that Superset can import it:

export PYTHONPATH="${PYTHONPATH}:/visualization/superset"

To make this permanent, append the line above to your shell configuration file (~/.bashrc) and reload it with source ~/.bashrc for the change to take effect.

After adding this line, you can verify that the PYTHONPATH environment variable has been set correctly by running:

echo $PYTHONPATH
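Superset picks up superset_config.py by importing a module named superset_config from the Python path, so another way to confirm the PYTHONPATH change is to ask Python where it would find that module. A minimal sketch; find_superset_config is a hypothetical helper name, and it should be run with the same interpreter and environment (activated virtual environment, exported PYTHONPATH) that Superset uses:

```python
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def find_superset_config() -> Optional[Path]:
    """Return the file Python would import as `superset_config`, or None.

    The file must sit in a directory on sys.path; PYTHONPATH entries are
    added to sys.path when the interpreter starts.
    """
    spec = importlib.util.find_spec("superset_config")  # locates, does not import
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


if __name__ == "__main__":
    location = find_superset_config()
    if location is None:
        print("superset_config not found on sys.path:", sys.path)
    else:
        print("Superset will load its configuration from", location)
```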


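Add the following line to superset_config.py to set SECRET_KEY, replacing the placeholder with a long random string (for example, the output of openssl rand -base64 42):

SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'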
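SECRET_KEY should be a long random value that you keep private. The sketch below produces one with Python's standard secrets module: 42 random bytes, base64-encoded, which matches the output format of `openssl rand -base64 42`. The helper name generate_secret_key is hypothetical:

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return `num_bytes` cryptographically random bytes as a base64 string."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed line into superset_config.py and keep it out of version control.
    print(f"SECRET_KEY = {generate_secret_key()!r}")
```

Generate the key once and keep it stable: Superset uses it to sign session cookies and to encrypt stored secrets such as database connection passwords, so changing it later logs users out and leaves previously encrypted values unreadable.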


With SECRET_KEY configured, the next step is to initialize the database:

superset db upgrade

Once the database is initialized and the configuration is in place, you can create an admin user, optionally load example data, and create the default roles and permissions. These steps set up the components Superset needs to function properly.

Create an Admin User

To access the Superset web interface, you need to create an admin user account. You can do this by running:

superset fab create-admin

This prompts you for a username, first name, last name, email address, and password. To authenticate users against an external backend instead (for example LDAP or OAuth), set the AUTH_TYPE variable in superset_config.py.

Load Some Sample Data

Apache Superset comes with some sample data sets that you can use to explore its features and functionalities. You can load these data sets by running:

superset load_examples

This will load some data sources and dashboards into your Superset instance. You can also add your own data sources by using the SQL Lab or the Data menu in the web interface.

Next, create the default roles and permissions:

superset init

Finally, to start the Superset development web server, run the following command:

superset run -p 8088 --with-threads --reload --debugger

If everything is set up correctly, open http://hostname:8088 in your browser (for example, http://localhost:8088) and log in with the admin username and password you created.
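A minimal sketch for checking from a script (for example, as part of a deployment check) that the server is answering. It assumes the default port 8088 used above and Superset's /health endpoint, which replies with the plain text OK when the app is up; superset_is_up is a hypothetical helper:

```python
import urllib.request


def superset_is_up(base_url: str = "http://localhost:8088", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers HTTP 200 with the body "OK"."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return response.status == 200 and body == "OK"
    except OSError:
        # URLError/HTTPError (both OSError subclasses), refused connections, timeouts
        return False


if __name__ == "__main__":
    print("Superset reachable:", superset_is_up())
```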

Congratulations! You have successfully installed and started Apache Superset on Linux. You can now start creating visualisations and dashboards with your data.

Configuring Apache Superset

Branding Apache Superset

To change the Superset application name, logo, favicon, logo target path, logo tooltip, and the text shown to the right of the logo, set the following variables in the configuration file superset_config.py:

from typing import Callable

# Your app name
APP_NAME = "Superset"

# Specify the App icon
APP_ICON = "/static/assets/images/superset-logo-horiz.png"
# replace the image at the above path, or add your own image there and update the file name
# file path: superset/lib/python3.11/site-packages/superset/static/assets/images/
# replace the leading "superset" with your virtual environment name, e.g.
# file path: venv/lib/python3.11/site-packages/superset/static/assets/images/

# Specify where clicking the logo would take the user
# e.g. setting it to '/' would take the user to '/superset/welcome/'
LOGO_TARGET_PATH = None

# Specify tooltip that should appear when hovering over the App Icon/Logo
LOGO_TOOLTIP = ""

# Specify any text that should appear to the right of the logo
# (the "|" annotation syntax requires Python 3.10+)
LOGO_RIGHT_TEXT: Callable[[], str] | str = ""

# Multiple favicons can be specified here. The "href" property
# is mandatory, but "sizes", "type", and "rel" are optional.
# For example:
# {
#     "href": "path/to/image.png",
#     "sizes": "16x16",
#     "type": "image/png",
#     "rel": "icon"
# },
FAVICONS = [{"href": "/static/assets/images/favicon.png"}]
# replace the image at the above path, or add your own image there and update the file name
# (same images directory as for APP_ICON above)

Setting up MySQL or PostgreSQL as a Production Metastore

One of the requirements for installing and running Apache Superset is to have a metadata database that stores information such as user credentials, dashboard configurations, query history, and more. 

By default, Superset uses SQLite as the metadata database, which is a simple and lightweight file-based database. However, SQLite has some limitations, such as limited support for concurrent writes, scalability issues, and security risks. Therefore, it is recommended to use a more robust and reliable database system for production environments, such as MySQL or PostgreSQL.

To set up the production database:

Install the database driver for MySQL or PostgreSQL. You can use pip to install the driver package. 

For MySQL, you need to install mysqlclient:

pip install mysqlclient

For PostgreSQL, you need to install psycopg2:

pip install psycopg2

Create a database and a user for Superset on your database server. You can use any tool or command line interface to do this. 

For example, for MySQL, you can use the following commands (choose a stronger password than 'superset' outside of testing):

mysql -u root -p

CREATE DATABASE superset;

CREATE USER 'superset'@'localhost' IDENTIFIED BY 'superset';

GRANT ALL PRIVILEGES ON superset.* TO 'superset'@'localhost';

For PostgreSQL, you can use the following commands:

psql -U postgres

CREATE USER superset WITH PASSWORD 'superset';

CREATE DATABASE superset OWNER superset;

Edit the Superset configuration file superset_config.py to specify the database URI for the metadata database.

Add a line that sets SQLALCHEMY_DATABASE_URI (or update it, if it is already there) so that it points to your MySQL or PostgreSQL database.

The URI follows the SQLAlchemy format:

dialect+driver://username:password@host:port/database

For example, for MySQL, you can use:

SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://superset:superset@localhost:3306/superset'

For PostgreSQL, you can use:

SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://superset:superset@localhost:5432/superset'
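If the metastore password contains characters such as @, : or /, it has to be percent-encoded; otherwise SQLAlchemy misreads where the password ends and the host begins. The SQLAlchemy documentation recommends urllib.parse.quote_plus for this. A minimal sketch with a hypothetical helper, metastore_uri, that could live in superset_config.py instead of a hard-coded string:

```python
from urllib.parse import quote_plus


def metastore_uri(dialect_driver: str, user: str, password: str,
                  host: str, port: int, database: str) -> str:
    """Build dialect+driver://user:password@host:port/database.

    The user name and password are percent-encoded so that reserved
    characters in them cannot break the URI.
    """
    return (
        f"{dialect_driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Equivalent to the PostgreSQL example above
SQLALCHEMY_DATABASE_URI = metastore_uri(
    "postgresql+psycopg2", "superset", "superset", "localhost", 5432, "superset"
)
```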

Initialize the metadata database by running the following command:

superset db upgrade

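This creates the necessary tables and indexes for Superset in your database. Because the new metastore starts out empty, run superset fab create-admin and superset init again so that you have an admin account and the default roles.

Restart Superset by running the following command: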

superset run -p 8088 --with-threads --reload --debugger

Email Integration

Superset can be configured to send email alerts when a SQL condition is met, and to schedule reports that email screenshots of dashboards and charts.

To enable the Alerts & Reports feature, turn on its feature flag in the superset_config.py file:

FEATURE_FLAGS = {"ALERT_REPORTS": True}

Superset uses Celery Beat as the scheduler and Celery workers to send the alerts and reports.


Celery is an open-source distributed task queue framework. It lets you run tasks asynchronously (for example, sending an email) and distribute them across multiple workers, which makes it well suited to background processing and task scheduling.

Celery Beat:

Celery Beat is the scheduling component of Celery, responsible for managing periodic or scheduled tasks: at each scheduled time it places the corresponding task on the queue. By default it keeps its schedule state in a local file, but other backends, such as a database, can be used instead.

Celery Worker:

A Celery worker is responsible for executing the tasks that the Celery application enqueues. When the application enqueues a task, it is added to a message queue (such as RabbitMQ or Redis), and Celery workers pull tasks from the queue and execute them. Workers are usually distributed across machines or processes, which lets tasks run in parallel and gives better performance and scalability.
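To make the producer/worker flow concrete, here is a toy model in plain Python. It is not Celery: a queue.Queue stands in for the Redis or RabbitMQ broker, threads stand in for worker processes, and run_workers is a hypothetical helper. It only shows the shape of the interaction: the producer enqueues tasks and moves on, and whichever worker is free picks up the next one.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=3):
    """Toy stand-in for the Celery producer/worker flow.

    The producer puts tasks on a queue (Celery would use a broker such as
    Redis), and each worker thread (Celery would use worker processes,
    possibly on other machines) pulls tasks until it sees the stop sentinel.
    """
    broker = queue.Queue()
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel: no more work for this worker

    def worker():
        while True:
            task = broker.get()
            if task is stop:
                return
            outcome = handler(task)
            with results_lock:
                results.append(outcome)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for task in tasks:  # producer: enqueue without waiting for completion
        broker.put(task)
    for _ in workers:  # one sentinel per worker
        broker.put(stop)
    for w in workers:
        w.join()
    return results  # completion order depends on scheduling


if __name__ == "__main__":
    sent = run_workers(["alice@example.com", "bob@example.com"],
                       lambda addr: f"report sent to {addr}")
    print(sorted(sent))
```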

Celery Configuration:

from celery.schedules import crontab

# Celery configuration
REDIS_HOST = "localhost"
REDIS_PORT = "6379"


class CeleryConfig:
    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks",
        "superset.tasks.thumbnails",
    )
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": 600,
            "soft_time_limit": 600,
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
        "email_reports.schedule_hourly": {
            "task": "email_reports.schedule_hourly",
            "schedule": crontab(minute=1, hour="*"),
        },
    }


CELERY_CONFIG = CeleryConfig
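The crontab entries in beat_schedule determine how often Celery Beat enqueues each task: minute="*", hour="*" means every minute, minute=0, hour=0 means once a day at midnight, and minute=1, hour="*" means one minute past every hour (all in Celery's configured timezone). The sketch below checks that reading by enumerating one day of firing times. It implements only the "*" and single-number forms used here, not Celery's full crontab syntax, and its helper names are hypothetical:

```python
def field_matches(spec, value):
    """Simplified crontab field: "*" matches every value, a number only itself.

    Celery's real crontab also accepts ranges, lists and steps such as "*/15";
    this sketch handles only the two forms used in beat_schedule.
    """
    return spec == "*" or int(spec) == value


def firing_times(minute, hour):
    """List the (hour, minute) pairs in one day when crontab(minute, hour) fires."""
    return [
        (h, m)
        for h in range(24)
        for m in range(60)
        if field_matches(hour, h) and field_matches(minute, m)
    ]


# The three beat_schedule entries from CeleryConfig, as (minute, hour) pairs
SCHEDULES = {
    "reports.scheduler": ("*", "*"),
    "reports.prune_log": (0, 0),
    "email_reports.schedule_hourly": (1, "*"),
}

if __name__ == "__main__":
    for name, (minute, hour) in SCHEDULES.items():
        times = firing_times(minute, hour)
        first = ", ".join(f"{h:02d}:{m:02d}" for h, m in times[:3])
        print(f"{name}: {len(times)} runs per day, starting {first}")
```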

SMTP Configuration:

SMTP_HOST = "smtp.mydomain.com"  # change to your host
SMTP_PORT = 25  # your port, e.g. 587
SMTP_SSL_SERVER_AUTH = False  # set to True to verify the server's SSL certificate (it must be valid)
SMTP_SSL = False
SMTP_USER = "your_smtp_user"  # use the empty string "" if using an unauthenticated SMTP server
SMTP_PASSWORD = "your_smtp_password"  # use the empty string "" if using an unauthenticated SMTP server

EMAIL_REPORTS_SUBJECT_PREFIX = "Insights - "  # optional - overrides the default "[Report] " set in config.py
# The text for the call-to-action link in Alerts & Reports emails
EMAIL_REPORTS_CTA = "Explore in BI Portal"
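Before waiting for a scheduled report, it can help to confirm that the SMTP values work on their own. A minimal sketch using Python's standard smtplib with the same setting names as above, passed in as a plain dict; send_test_email is a hypothetical helper, not part of Superset. The optional SMTP_STARTTLS key mirrors the Superset setting of the same name, and mail_from is whatever sender address your server accepts:

```python
import smtplib
from email.message import EmailMessage


def send_test_email(settings, mail_from, to_addr, smtp_cls=None):
    """Send one test message using SMTP_* values from superset_config.py.

    `settings` is a plain dict with the same keys as the configuration above;
    `smtp_cls` can be a stand-in class to try the function without a server.
    """
    if smtp_cls is None:
        # SMTP_SSL = True means an implicit-TLS connection (often port 465)
        smtp_cls = smtplib.SMTP_SSL if settings["SMTP_SSL"] else smtplib.SMTP

    msg = EmailMessage()
    msg["Subject"] = settings.get("EMAIL_REPORTS_SUBJECT_PREFIX", "") + "SMTP test"
    msg["From"] = mail_from
    msg["To"] = to_addr
    msg.set_content("If you can read this, the SMTP settings for Superset work.")

    with smtp_cls(settings["SMTP_HOST"], settings["SMTP_PORT"], timeout=10) as server:
        if settings.get("SMTP_STARTTLS", False) and not settings["SMTP_SSL"]:
            server.starttls()  # upgrade a plain connection, typical on port 587
        if settings["SMTP_USER"] and settings["SMTP_PASSWORD"]:  # "" = no auth
            server.login(settings["SMTP_USER"], settings["SMTP_PASSWORD"])
        server.send_message(msg)
    return msg
```

For a real check, call it with the values from your superset_config.py and your own address, for example send_test_email(settings, "superset@mydomain.com", "you@mydomain.com"). The smtp_cls parameter only exists so the function can be exercised without a live server.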

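Screenshot Configuration:

# WebDriver configuration
WEBDRIVER_TYPE = "firefox"  # Superset's default; use "chrome" if you installed Chrome and chromedriver instead

# This is for internal use, you can keep http
WEBDRIVER_BASEURL = "http://localhost:8088/"
# This is the link sent to the recipient. Change to your domain, e.g. https://superset.mydomain.com
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://superset.mydomain.com"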

Generic configuration:

# Execute Alerts & Reports as admin User

Embedding Apache Superset Dashboards

One of Superset's powerful features is the ability to embed dashboards within external applications, providing seamless integration and enhanced data analysis capabilities.

To begin embedding Superset dashboards, follow these steps:

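Enable Embedded Superset Feature:
  • Edit the feature flag configuration in superset_config.py by adding the EMBEDDED_SUPERSET flag. If you already defined FEATURE_FLAGS (for example, for Alerts & Reports), add the key to that existing dictionary:

FEATURE_FLAGS = {"EMBEDDED_SUPERSET": True}

  • Restart the Superset instance to activate the Embedded Superset feature:
superset run -p 8088 --with-threads --reload --debugger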

Dashboard Configuration
  • Access the dashboard sub-menu and click on "Embed Dashboard."
  • Enable embedding and make a note of the generated embedding ID.
Apache Superset Embedded Settings

Creating a dedicated user for Embedded Dashboard access
  • Go to Settings and select "List Users."
  • Add a new user record dedicated to embedded dashboard access.
  • Give the user restricted access, limited to what the embedded dashboard needs.
Apache Superset adding a user

Generating Guest Token

Apache Superset uses a special access token, called a guest token, to let users of your host application view embedded dashboards without logging in to Superset. You can generate a guest token with the Superset API endpoint /api/v1/security/guest_token/.

First, generate a login access token with the /api/v1/security/login endpoint, passing the credentials of the user created above:

Apache Superset API - Generating login token

Then pass the login access token, along with the guest token parameters, to the /api/v1/security/guest_token/ endpoint to generate the guest token.

Apache Superset API - Generating guest token

You can test this flow with the Swagger interface that Apache Superset provides at the /swagger/v1 path (for example, https://superset.mydomain.com/swagger/v1).

Apache Superset API - Swagger

Python code example:

Generating login token

import requests


def get_login_token():
    """Log in as the dedicated embedding user and return the JSON response."""
    url = "https://superset.mydomain.com/api/v1/security/login"
    headers = {
        "accept": "application/json",
        "Content-Type": "application/json",
    }
    data = {
        "password": "guestpwd",
        "provider": "db",
        "refresh": True,
        "username": "guest",
    }
    response = requests.post(url, headers=headers, json=data, timeout=10)
    response.raise_for_status()
    return response.json()  # contains "access_token" (and "refresh_token")

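Generating guest token

def get_guest_token(access_token):
    """Exchange a login access token for a guest token for one dashboard."""
    url = "https://superset.mydomain.com/api/v1/security/guest_token/"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    data = {
        # Dashboard to grant access to; depending on your Superset version this
        # may need to be the embedding ID (UUID) from the "Embed Dashboard" dialog
        "resources": [{"id": "11", "type": "dashboard"}],
        "rls": [],  # optional row-level security clauses
        "user": {"first_name": "guest", "last_name": "user", "username": "guest"},
    }
    response = requests.post(url, headers=headers, json=data, timeout=10)
    response.raise_for_status()
    return response.json()  # {"token": "<guest token>"}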

Note: The guest token is short-lived (5 minutes by default, controlled by GUEST_TOKEN_JWT_EXP_SECONDS), so make sure your hosting application's backend can issue fresh tokens when needed.
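One way to handle the refresh is to have the host application's backend cache the guest token and fetch a new one shortly before it expires. A minimal sketch; GuestTokenCache is a hypothetical class, `fetch` is any zero-argument function returning a fresh token string, and `lifetime` should match GUEST_TOKEN_JWT_EXP_SECONDS:

```python
import time
from typing import Callable, Optional


class GuestTokenCache:
    """Reuse a guest token until it is close to expiry, then fetch a new one."""

    def __init__(self, fetch: Callable[[], str], lifetime: float = 300,
                 margin: float = 30, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # returns a fresh guest token string
        self._lifetime = lifetime    # seconds a token stays valid
        self._margin = margin        # refresh this many seconds early
        self._clock = clock          # injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime
        return self._token
```

With the functions above, fetch could be `lambda: get_guest_token(get_login_token()["access_token"])["token"]`, assuming the login response carries access_token and the guest token response carries token. Your backend endpoint (the hypothetical https://mybackend.com/fetchGuestToken in the JavaScript that follows) would then return cache.get().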

Embedding Dashboard using Superset Embedded SDK

  • Include the Superset Embedded SDK script from a CDN in the <head> section of your HTML page.
  • Call the SDK's embedDashboard() function, which renders the dashboard in an <iframe> inside a container element you provide.


async function fetchGuestTokenFromBackend() {
    // In production, request a fresh token from your own backend, e.g.:
    // let response = await fetch('https://mybackend.com/fetchGuestToken', { method: 'POST'});
    let data = 'guest_token_value_generated_using_above_python_code'
    return data
}

supersetEmbeddedSdk.embedDashboard({
    id: '164104a5-69a8-484b-bb51-2a5cf7cb4a29', // given by the Superset embedding UI
    supersetDomain: 'https://superset.mydomain.com',
    mountPoint: document.getElementById('dashboard-container'), // any html element that can contain an iframe
    fetchGuestToken: () => fetchGuestTokenFromBackend(),
    dashboardUiConfig: {
        // dashboard UI config: hideTitle, hideTab, hideChartControls (optional)
    },
});


The complete HTML page, combining the SDK script, basic styling, the dashboard container, and the embedding code, is shown below:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
    <title>Superset Embedded Example</title>
    <script src="https://unpkg.com/@superset-ui/embedded-sdk"></script>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Noto+Sans:wght@400;700&display=swap" rel="stylesheet">
    <style>
        iframe {
            width: 100%;
            height: 100vh; /* a percentage height collapses without a fixed-height parent */
            border: none;
            margin-top: 3%;
        }

        .pretext {
            margin-right: 10%;
            margin-left: 10%;
            font-family: 'Noto Sans', sans-serif;
        }
    </style>
</head>
<body>
    <div class="pretext">
        <div style=" display: flex; justify-content: center;">
            <h2 style="position:absolute; font-family: 'Noto Sans', sans-serif;"> [24]7 Synergen Embedded Dashboard </h2>
        </div>
        <p id="dashboard-container"></p>
    </div>

    <script>
        async function fetchGuestTokenFromBackend() {
            // let response = await fetch('https://mybackend.com/fetchGuestToken', { method: 'POST'});
            let data = 'guest_token_value_generated_using_above_python_code'
            return data
        }

        supersetEmbeddedSdk.embedDashboard({
            id: '164104a5-69a8-484b-bb51-2a5cf7cb4a29', // given by the Superset embedding UI
            supersetDomain: 'https://superset.mydomain.com',
            mountPoint: document.getElementById('dashboard-container'), // any html element that can contain an iframe
            fetchGuestToken: () => fetchGuestTokenFromBackend(),
            dashboardUiConfig: {
                // dashboard UI config: hideTitle, hideTab, hideChartControls (optional)
            },
        });
    </script>
</body>
</html>


Refer to the Superset Embedded SDK documentation (the @superset-ui/embedded-sdk package) for other configuration options.


If you run into problems during the embedding process, check the following superset_config.py settings related to guest tokens and the embedding feature:

FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True,
    "DRILL_TO_DETAIL": True,
}
SESSION_COOKIE_HTTPONLY = True  # Prevent cookie from being read by frontend JS?
SESSION_COOKIE_SECURE = True  # Prevent cookie from being transmitted over non-tls?
SESSION_COOKIE_SAMESITE = "None"  # One of [None, 'None', 'Lax', 'Strict']

# Cross-Origin (requires the flask-cors package)
ENABLE_CORS = True
CORS_OPTIONS = {
    'supports_credentials': True,
    'allow_headers': ['*'],
    'origins': ['*', ''],  # list the origin(s) of your host application
}

# Dashboard embedding
GUEST_ROLE_NAME = "Gamma"
GUEST_TOKEN_JWT_SECRET = "your_secret_key"
GUEST_TOKEN_JWT_EXP_SECONDS = 3600  # 60 minutes
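Because superset_config.py is ordinary Python, a second `FEATURE_FLAGS = {...}` assignment replaces the first one entirely, which is an easy way to lose ALERT_REPORTS when adding the embedding flags. The sketch below builds a single dictionary from both sections and adds a small sanity check for the embedding settings above. Both helper functions are hypothetical, and the check encodes one browser rule worth knowing: cookies marked SameSite=None are rejected unless they are also Secure.

```python
def merged_feature_flags(*flag_sets):
    """Combine several FEATURE_FLAGS dicts into one; later dicts win on conflicts."""
    merged = {}
    for flags in flag_sets:
        merged.update(flags)
    return merged


def embedding_config_problems(cfg):
    """Return warnings for a dict holding superset_config values."""
    problems = []
    if not cfg.get("FEATURE_FLAGS", {}).get("EMBEDDED_SUPERSET"):
        problems.append("FEATURE_FLAGS is missing EMBEDDED_SUPERSET: True")
    if cfg.get("SESSION_COOKIE_SAMESITE") == "None" and not cfg.get("SESSION_COOKIE_SECURE"):
        problems.append('SESSION_COOKIE_SAMESITE = "None" needs SESSION_COOKIE_SECURE = True')
    if cfg.get("GUEST_TOKEN_JWT_SECRET") in (None, "", "your_secret_key"):
        problems.append("GUEST_TOKEN_JWT_SECRET is unset or still a placeholder")
    return problems


# One FEATURE_FLAGS dict covering both the Alerts & Reports and embedding sections
FEATURE_FLAGS = merged_feature_flags(
    {"ALERT_REPORTS": True},
    {"EMBEDDED_SUPERSET": True, "DRILL_TO_DETAIL": True},
)
```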

By following these steps and using the Superset Embedded SDK, you can integrate Superset dashboards into your external applications. Users get Superset's interactive visualizations and analysis without leaving your application, which makes the embedding feature a valuable tool for data-driven applications.

Apache Superset Authentication: Implementing OAuth2 with Azure Identity Platform

Flask AppBuilder, the web framework Superset is built on, provides several authentication methods:

  • Database
  • OpenID
  • LDAP
  • Remote User
  • OAuth

Apache Superset lets you choose among these authentication types to suit your use case and requirements. For details on each type and its implementation, consult the Flask AppBuilder security documentation.

This section walks through implementing OAuth2 authentication for Apache Superset using the Azure Identity Platform.

Azure AD OAuth2 Authentication Implementation

Register your application in Azure AD and make a note of the Tenant ID, Client ID, and Client Secret. If you have not done this yet, refer to the Microsoft identity platform documentation on registering a client application. Then follow these steps:

Implement OAuth2 authentication for your Apache Superset instance:

Edit your Apache Superset configuration file (superset_config.py) and add the following configuration, replacing tenant_id in the URLs with your Tenant ID:

from flask_appbuilder.security.manager import AUTH_OAUTH

# Set the authentication type to OAuth
AUTH_TYPE = AUTH_OAUTH

# Self registration & default role
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Public"  # example value: role given to users on first login

OAUTH_PROVIDERS = [
    {
        "name": "azure",
        "icon": "fa-windows",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "your client id",
            "client_secret": "your client secret",
            "api_base_url": "https://login.microsoftonline.com/tenant_id/oauth2",
            "client_kwargs": {
                "scope": "User.read name preferred_username email profile upn groups",
                "resource": "your client id",
            },
            "request_token_url": None,
            "access_token_url": "https://login.microsoftonline.com/tenant_id/oauth2/token",
            "authorize_url": "https://login.microsoftonline.com/tenant_id/oauth2/authorize",
        },
    }
]

Create a custom Security Manager class that extends the Superset Security Manager class with the following settings:

import logging

from superset.security import SupersetSecurityManager


class CustomSsoSecurityManager(SupersetSecurityManager):
    def _get_oauth_user_info(self, provider, resp=None):
        # logging.debug("OAuth2 provider: %s", provider)

        if provider == "azure":
            # logging.debug("Azure response received: %s", resp)
            id_token = resp["id_token"]
            me = self._azure_jwt_token_parse(id_token)
            # logging.debug("Parsed JWT token: %s", me)
            return {
                "name": me.get("name", ""),
                "email": me["upn"],
                "first_name": me.get("given_name", ""),
                "last_name": me.get("family_name", ""),
                "id": me["oid"],
                "username": me["oid"],
                "role_keys": me.get("roles", []),
            }
        # Fall back to the default handling for any other configured provider
        return super().oauth_user_info(provider, resp)

    oauth_user_info = _get_oauth_user_info
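When the login fails or users end up with unexpected names, it helps to look at the claims Azure actually puts in the id_token. The sketch below decodes a JWT payload without verifying its signature (for debugging only, never for authentication) and applies the same field mapping as _get_oauth_user_info above as a standalone function; both helper names are hypothetical. Claims such as upn may be missing for some account types, in which case me["upn"] raises KeyError.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode a JWT's payload WITHOUT verifying its signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def azure_claims_to_user(me):
    """Same field mapping as _get_oauth_user_info, as a pure function."""
    return {
        "name": me.get("name", ""),
        "email": me["upn"],
        "first_name": me.get("given_name", ""),
        "last_name": me.get("family_name", ""),
        "id": me["oid"],
        "username": me["oid"],
        "role_keys": me.get("roles", []),
    }
```

The role_keys values are what Flask AppBuilder matches against AUTH_ROLES_MAPPING if you use role mapping.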

Save this file as custom_sso_security_manager.py in the same directory as superset_config.py (so that it is on PYTHONPATH), then import the custom security manager in your configuration file:

from custom_sso_security_manager import CustomSsoSecurityManager
CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager

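In your Azure app registration, set the client application's redirect URL to Superset's OAuth callback for the azure provider:

https://superset.mydomain.com/oauth-authorized/azure

After restarting Superset, you will see the updated login page reflecting your changes.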

Azure AD:

Apache Superset OAuth2 - Azure AD Authentication

If you have configured multiple identity providers, the sign-in page lists each of them as an option:

Apache Superset OAuth2 Authentication

By following these steps, you can implement OAuth2 authentication with the Azure Identity Platform for your Apache Superset instance. This integration provides a secure authentication process while giving users seamless access to the platform.

Benefits and Features of Apache Superset

Apache Superset offers a wide range of benefits and features that make it a powerful tool for business intelligence and data visualisation:

User-Friendly Interface: Superset provides an intuitive interface that makes it easy for users to explore and analyse data. With drag-and-drop functionality and interactive visualisations, users can create insightful dashboards without writing complex queries or code.

Wide Range of Data Sources: Superset supports a variety of data sources, including popular databases like MySQL, PostgreSQL, and SQLite, as well as big data engines like Apache Hive, Apache Spark SQL, and Presto. Data kept in cloud object storage such as Amazon S3 or Google Cloud Storage can be queried through SQL engines such as Amazon Athena or Google BigQuery.

Interactive Dashboards: Superset allows users to create interactive dashboards with a wide range of visualization options, including charts, graphs, maps, and tables. Users can customize the appearance and layout of dashboards and easily share them with others.

Ad-Hoc Analysis: With Superset, users can perform ad-hoc analysis by exploring and filtering data in real time. The SQL Lab feature allows users to write and execute SQL queries directly in the browser, providing instant results and insights.

Collaboration and Sharing: Superset enables collaboration by allowing users to share dashboards, charts, and SQL queries with others. Users can set permissions and access controls to ensure data security and privacy.

Extensibility and Customization: Superset is highly extensible and customizable, allowing users to add custom visualizations, plugins, and integrations. The Superset community actively contributes new features and enhancements.

Scalability and Performance: Superset's web tier can be scaled out in distributed environments, and it can push heavy queries down to fast analytical engines such as Apache Druid, with caching and Apache Arrow-based serialization to keep large results efficient.

Active Community Support: Apache Superset has a vibrant and active community of users and contributors. The community provides support, documentation, and regular updates, ensuring that Superset remains a robust and reliable tool for business intelligence.

In this article, we learned how to install Apache Superset on a Linux machine using pip, initialize the Superset database, create an admin user, load example data sets, start the Superset server, and access the web interface. We also configured branding, a production metastore, email alerts and reports, dashboard embedding, and Azure AD OAuth2 authentication, with screenshots of each step.

Apache Superset is a powerful and easy-to-use data visualization and exploration platform that can help you gain insights from your data and communicate them effectively. You can learn more about Superset by visiting its official website, documentation, and GitHub repository.

I hope you enjoyed this article and found it useful. Happy data visualization!
