[Image: Data Visualization - Apache Superset Guide. Image source: Unsplash]
Introduction to Business Intelligence
The Role of Data Analytics in Business Intelligence
Data analytics can be classified into four main types:
Descriptive Analytics: Describes what has happened in the past by summarizing historical data. It provides a snapshot of the current state and helps businesses understand trends and patterns.
Diagnostic Analytics: Focuses on identifying the causes of past events and understanding why they occurred. It helps businesses uncover the root causes behind certain outcomes or trends.
Predictive Analytics: Uses historical data and statistical models to predict future outcomes and trends. It enables businesses to make proactive decisions and anticipate potential challenges or opportunities.
Prescriptive Analytics: Goes beyond predictions and provides recommendations on the best course of action. It uses optimization techniques and simulation models to guide decision-making.
The Power of Data Visualization in Business Intelligence
Data visualization offers several benefits:
- Improved Data Understanding: Visualizing data makes it easier for users to grasp complex information quickly. By presenting data in a visual format, businesses can enhance data comprehension and promote better decision-making.
- Enhanced Insights: Visualization helps users identify trends, patterns, and outliers in data. By visually exploring data, businesses can uncover hidden insights and make data-driven decisions.
- Increased Engagement: Visual representations of data are more engaging and memorable compared to raw numbers or text. By using interactive charts, graphs, and dashboards, businesses can captivate their audience and ensure the message is conveyed effectively.
- Efficient Communication: Data visualization simplifies the communication of complex ideas and findings. By presenting data visually, businesses can convey information to stakeholders in a clear and concise manner, fostering collaboration and alignment.
Apache Superset: An Introduction
Installing Apache Superset on Linux:
Before diving into the installation process, it's essential to ensure that the required dependencies are in place. The installation process can vary depending on the operating system. Let's explore the installation steps for different environments.
Debian and Ubuntu
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev
Fedora and RHEL-derivative Linux distributions
sudo yum install gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
sudo dnf install gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel
pip3 install --upgrade pip
Creating a Virtual Environment
pip install virtualenv
virtualenv superset
source superset/bin/activate
Installing Apache Superset
pip install apache-superset
Setting up Apache Superset Configuration file
Create a directory called superset or any name you prefer:
mkdir superset
export PYTHONPATH="${PYTHONPATH}:/visualization/superset"
After adding this line, you can verify that the PYTHONPATH environment variable has been set correctly by running the command
echo $PYTHONPATH
SECRET_KEY = 'MY_SECRET_KEY'
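The placeholder value above should be replaced with a long random string before going anywhere near production. One possible way to generate such a value, sketched with Python's standard `secrets` module (the helper name `generate_secret_key` and the 42-byte default are assumptions for illustration):

```python
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into superset_config.py as SECRET_KEY.
    print(generate_secret_key())
```

Generate the key once and keep it stable, since changing it later affects anything that was signed or encrypted with the old value.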
superset db upgrade
Create an Admin User
superset fab create-admin
Load Some Sample Data
superset load_examples
Finally, to start the Superset web server, run the following command:
superset run -p 8088 --with-threads --reload --debugger
Configuring Apache Superset
Branding Apache Superset
from typing import Callable

# Set your app name
APP_NAME = "Superset"

# Specify the App icon
APP_ICON = "/static/assets/images/superset-logo-horiz.png"
# replace the image specified in the above path or update your image name
# file path: superset/lib/python3.11/site-packages/superset/static/assets/images/
# replace superset with the virtual environment name
# file path: venv/lib/python3.11/site-packages/superset/static/assets/images/

# Specify where clicking the logo would take the user
# e.g. setting it to '/' would take the user to '/superset/welcome/'
LOGO_TARGET_PATH = None

# Specify tooltip that should appear when hovering over the App Icon/Logo
LOGO_TOOLTIP = ""

# Specify any text that should appear to the right of the logo
LOGO_RIGHT_TEXT: Callable[[], str] | str = ""

# Multiple favicons can be specified here. The "href" property
# is mandatory, but "sizes", "type", and "rel" are optional.
# For example:
# {
#     "href": "path/to/image.png",
#     "sizes": "16x16",
#     "type": "image/png",
#     "rel": "icon"
# },
FAVICONS = [{"href": "/static/assets/images/favicon.png"}]
# replace the image specified in the above path or update your image name
# file path: superset/lib/python3.11/site-packages/superset/static/assets/images/
# replace superset with the virtual environment name
# file path: venv/lib/python3.11/site-packages/superset/static/assets/images/
Setting up MySQL or PostgreSQL as a Production Metastore
One of the requirements for installing and running Apache Superset is to have a metadata database that stores information such as user credentials, dashboard configurations, query history, and more.
By default, Superset uses SQLite as the metadata database, which is a simple and lightweight file-based database. However, SQLite has some limitations, such as a lack of concurrency support, scalability issues, and security risks. Therefore, it is recommended to use a more robust and reliable database system for production environments, such as MySQL or PostgreSQL.
To set up the production database:
pip install mysqlclient
For PostgreSQL, you need to install psycopg2:
pip install psycopg2
For example, for MySQL, you can use the following commands:
mysql -u root -p
CREATE DATABASE superset;
CREATE USER 'superset'@'localhost' IDENTIFIED BY 'superset';
GRANT ALL PRIVILEGES ON superset.* TO 'superset'@'localhost';
For PostgreSQL, the equivalent commands are:
psql -U postgres
CREATE DATABASE superset;
CREATE USER superset WITH PASSWORD 'superset';
GRANT ALL PRIVILEGES ON DATABASE superset TO superset;
dialect+driver://username:password@host:port/database
SQLALCHEMY_DATABASE_URI = 'mysql+mysqldb://superset:superset@localhost:3306/superset'
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://superset:superset@localhost:5432/superset'
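If the database password contains characters such as `@` or `/`, they need to be percent-encoded before they go into the URI. A minimal sketch of assembling the URI in `superset_config.py` follows; the helper name `build_metadata_uri` and the sample password are hypothetical.

```python
from urllib.parse import quote_plus


def build_metadata_uri(dialect_driver, username, password, host, port, database):
    """Assemble a SQLAlchemy URI, percent-encoding the credentials."""
    return (
        f"{dialect_driver}://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password chosen to show the encoding of '@' and '/'.
SQLALCHEMY_DATABASE_URI = build_metadata_uri(
    "postgresql+psycopg2", "superset", "p@ss/word", "localhost", 5432, "superset"
)
```

With a plain password such as `superset`, the helper produces the same strings as the hand-written URIs above.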
superset db upgrade
Restart Superset by running the following command
superset run -p 8088 --with-threads --reload --debugger
Email Integration
To enable the alerts & reporting feature, update the superset_config.py file as follows:
FEATURE_FLAGS = {
    "ALERT_REPORTS": True,
    "ALERTS_ATTACH_REPORTS": True,
}
Celery
Celery is an open-source distributed task queue framework. It lets you run tasks asynchronously (for example, sending an email) and distribute them across multiple workers, which makes it well suited to background processing and task scheduling.
Celery Beat
Celery beat is the scheduling component of Celery, responsible for managing periodic or scheduled tasks. The schedule information can be stored in a different backend such as a database or an in-memory store.
Celery Worker
The Celery worker executes the tasks that the Celery application enqueues. When you define and enqueue a task in the application, it is added to a message queue (such as RabbitMQ or Redis), and Celery workers pull tasks from the queue and execute them. Workers are usually distributed across machines or processes, which lets you parallelize task execution and achieve better performance and scalability.
Celery Configuration:
from celery.schedules import crontab

# Celery configuration
REDIS_HOST = "localhost"
REDIS_PORT = "6379"


class CeleryConfig:
    broker_url = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
    imports = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
    result_backend = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 600,
            'soft_time_limit': 600,
            'ignore_result': True,
        },
    }
    beat_schedule = {
        'reports.scheduler': {
            'task': 'reports.scheduler',
            'schedule': crontab(minute='*', hour='*'),
        },
        'reports.prune_log': {
            'task': 'reports.prune_log',
            'schedule': crontab(minute=0, hour=0),
        },
        'email_reports.schedule_hourly': {
            'task': 'email_reports.schedule_hourly',
            'schedule': crontab(minute=1, hour='*'),
        },
    }


CELERY_CONFIG = CeleryConfig
SMTP Configuration:
SMTP_HOST = "smtp.mydomain.com"  # change to your host
SMTP_PORT = 25  # your port, e.g. 587
SMTP_STARTTLS = True
SMTP_SSL_SERVER_AUTH = False  # set to True if you're using an SMTP server with a valid certificate
SMTP_SSL = False
SMTP_USER = "your_user"  # use the empty string "" if using an unauthenticated SMTP server
SMTP_PASSWORD = "your_password"  # use the empty string "" if using an unauthenticated SMTP server
SMTP_MAIL_FROM = SMTP_USER
EMAIL_REPORTS_SUBJECT_PREFIX = "Insights - "  # optional - overwrites default value in config.py of "[Report] "
# The text for call-to-action link in Alerts & Reports emails
EMAIL_REPORTS_CTA = "Explore in BI Portal"
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
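Rather than hard-coding SMTP credentials in `superset_config.py`, one option is to read them from environment variables. The sketch below assumes variable names such as `SUPERSET_SMTP_USER`; these names and the `load_smtp_settings` helper are hypothetical and not something Superset defines.

```python
import os


def load_smtp_settings(env=os.environ):
    """Build Superset SMTP settings from environment variables (names are assumptions)."""
    user = env.get("SUPERSET_SMTP_USER", "")
    return {
        "SMTP_HOST": env.get("SUPERSET_SMTP_HOST", "localhost"),
        "SMTP_PORT": int(env.get("SUPERSET_SMTP_PORT", "25")),
        "SMTP_USER": user,  # empty string means an unauthenticated server
        "SMTP_PASSWORD": env.get("SUPERSET_SMTP_PASSWORD", ""),
        "SMTP_MAIL_FROM": user,
    }


# Promote the values to the module-level names Superset reads.
_smtp = load_smtp_settings()
SMTP_HOST = _smtp["SMTP_HOST"]
SMTP_PORT = _smtp["SMTP_PORT"]
SMTP_USER = _smtp["SMTP_USER"]
SMTP_PASSWORD = _smtp["SMTP_PASSWORD"]
SMTP_MAIL_FROM = _smtp["SMTP_MAIL_FROM"]
```

This keeps secrets out of version control and lets the same configuration file serve several environments.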
Screenshot Configuration:
# WebDriver configuration
WEBDRIVER_TYPE = "chrome"
WEBDRIVER_OPTION_ARGS = [
    "--force-device-scale-factor=2.0",
    "--high-dpi-support=2.0",
    "--headless",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-extensions",
]
# This is for internal use, you can keep http
WEBDRIVER_BASEURL = "http://127.0.0.1:8000"
# This is the link sent to the recipient. Change to your domain, e.g. https://superset.mydomain.com
WEBDRIVER_BASEURL_USER_FRIENDLY = "https://superset.mydomain.com"
SCREENSHOT_LOCATE_WAIT = 100
SCREENSHOT_LOAD_WAIT = 600
Generic configuration:
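# ExecutorType must be imported before it is used below
# (import path as given in the Superset docs for releases that define ExecutorType.SELENIUM)
from superset.tasks.types import ExecutorType

# Execute Alerts & Reports as admin User
THUMBNAIL_SELENIUM_USER = 'admin'
ALERT_REPORTS_EXECUTE_AS = [ExecutorType.SELENIUM]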
Embedding Apache Superset Dashboards
To begin embedding Superset dashboards, follow these steps.
Enable Embedded Superset Feature:
Edit the Feature Flag configuration by adding the Embedded Superset flag:
FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True,
    "DASHBOARD_RBAC": True,
}
superset run -p 8088 --with-threads --reload --debugger
Dashboard Configuration
- Access the dashboard sub-menu and click on "Embed Dashboard."
- Enable Embedding and make a note of the generated embedding ID.
[Image: Apache Superset Embedded Settings]
Generating Guest Token
[Image: Apache Superset API - Generating login token]
[Image: Apache Superset API - Generating guest token]
[Image: Apache Superset API - Swagger]
Python code example
Generating login token:
import json

import requests


def get_login_token():
    # Log in with a Superset user to obtain an access token
    url = 'https://superset.mydomain.com/api/v1/security/login'
    headers = {
        'accept': 'application/json',
        'Content-Type': 'application/json'
    }
    data = {
        "password": "guestpwd",
        "provider": "db",
        "refresh": True,
        "username": "guest"
    }
    session = requests.Session()
    response = session.post(url, headers=headers, data=json.dumps(data))
    return response.json()
Generating guest token:
import json

import requests


def get_guest_token(access_token):
    # Guest tokens are issued by the guest_token endpoint, not the login endpoint
    url = 'https://superset.mydomain.com/api/v1/security/guest_token/'
    headers = {
        'accept': 'application/json',
        'Authorization': f'Bearer {access_token}',
        'Content-Type': 'application/json',
    }
    data = {
        "resources": [{"id": "11", "type": "dashboard"}],
        "rls": [],
        "user": {"first_name": "guest", "last_name": "user", "username": "guest"}
    }
    session = requests.Session()
    response = session.post(url, headers=headers, data=json.dumps(data))
    return response.json()
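The request body for the guest token is where the dashboard and any row-level security (RLS) filters are chosen. The sketch below builds that body and pulls the tokens out of the two JSON responses. The helper names are hypothetical, and it assumes the login response carries an `access_token` field and the guest token response a `token` field; depending on the Superset version, the resource `id` may need to be the embedding ID noted earlier rather than a numeric dashboard ID.

```python
def build_guest_token_payload(dashboard_id, username, first_name="", last_name="",
                              rls_clauses=()):
    """Build the JSON body for the guest token request.

    rls_clauses is a sequence of SQL filter strings, e.g. "country = 'US'".
    """
    return {
        "resources": [{"type": "dashboard", "id": str(dashboard_id)}],
        "rls": [{"clause": clause} for clause in rls_clauses],
        "user": {
            "username": username,
            "first_name": first_name,
            "last_name": last_name,
        },
    }


def extract_access_token(login_response):
    """Pull the bearer token out of the login response JSON."""
    return login_response["access_token"]


def extract_guest_token(guest_response):
    """Pull the guest token out of the guest token response JSON."""
    return guest_response["token"]


# Typical flow, reusing the two functions defined above (network calls omitted here):
#   access_token = extract_access_token(get_login_token())
#   guest_token = extract_guest_token(get_guest_token(access_token))
```

An empty `rls_clauses` reproduces the `"rls": []` body used above, which means the guest sees every row the guest role can access.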
Embedding Dashboard using Superset Embedded SDK
- Include the Superset Embedded SDK script using a CDN in the <head> section of your HTML page.
- Use the Superset Embedded SDK to embed the dashboard within an <iframe> element.
<script>
  async function fetchGuestTokenFromBackend() {
    // let response = await fetch('https://mybackend.com/fetchGuestToken', { method: 'POST'});
    let data = 'guest_token_value_generated_using_above_python_code'
    return data
  }

  supersetEmbeddedSdk.embedDashboard({
    id: '164104a5-69a8-484b-bb51-2a5cf7cb4a29', // given by the Superset embedding UI
    supersetDomain: 'https://superset.mydomain.com',
    mountPoint: document.getElementById('dashboard-container'), // any html element that can contain an iframe
    fetchGuestToken: () => fetchGuestTokenFromBackend(),
    dashboardUiConfig: { hideTitle: true, hideTab: true, hideChartControls: true } // dashboard UI config: hideTitle, hideTab, hideChartControls (optional)
  })
</script>
<html>
<head>
  <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
  <title>Superset Embedded Example</title>
  <script src="https://unpkg.com/@superset-ui/embedded-sdk"></script>
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Noto+Sans:wght@400;700&display=swap" rel="stylesheet">
  <style>
    iframe {
      width: 100%;
      height: 100%;
      border: none;
      margin-top: 3%;
    }
    .pretext {
      margin-right: 10%;
      margin-left: 10%;
      font-family: 'Noto Sans', sans-serif;
    }
  </style>
</head>
<body>
  <div class="pretext">
    <div style=" display: flex; justify-content: center;">
      <h2 style="position:absolute; font-family: 'Noto Sans', sans-serif;"> [24]7 Synergen Embedded Dashboard </h2>
    </div>
    <div>
      <p id="dashboard-container"></p>
    </div>
    <script>
      async function fetchGuestTokenFromBackend() {
        // let response = await fetch('https://mybackend.com/fetchGuestToken', { method: 'POST'});
        let data = 'guest_token_value_generated_using_above_python_code'
        return data
      }

      supersetEmbeddedSdk.embedDashboard({
        id: '164104a5-69a8-484b-bb51-2a5cf7cb4a29', // given by the Superset embedding UI
        supersetDomain: 'https://superset.mydomain.com',
        mountPoint: document.getElementById('dashboard-container'), // any html element that can contain an iframe
        fetchGuestToken: () => fetchGuestTokenFromBackend(),
        dashboardUiConfig: { hideTitle: true, hideTab: true, hideChartControls: true } // dashboard UI config: hideTitle, hideTab, hideChartControls (optional)
      })
    </script>
  </div>
</body>
</html>
Troubleshooting
SESSION_COOKIE_HTTPONLY = True  # Prevent cookie from being read by frontend JS?

FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True,
    "DASHBOARD_RBAC": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "MENU_HIDE_USER_INFO": False,
    "DRILL_TO_DETAIL": True,
    "DASHBOARD_CROSS_FILTERS": True
}

SESSION_COOKIE_SECURE = True  # Prevent cookie from being transmitted over non-tls?
SESSION_COOKIE_SAMESITE = "None"  # One of [None, 'None', 'Lax', 'Strict']
SESSION_COOKIE_DOMAIN = False

# Cross-Origin
ENABLE_CORS = True
CORS_OPTIONS = {
    'supports_credentials': True,
    'allow_headers': ['*'],
    'resources': ['*'],
    'origins': ['*', 'http://127.0.0.1:5500'],
}

# Dashboard embedding
GUEST_ROLE_NAME = "Gamma"
GUEST_TOKEN_JWT_SECRET = "your_secret_key"
GUEST_TOKEN_JWT_ALGO = "HS256"
GUEST_TOKEN_HEADER_NAME = "X-GuestToken"
GUEST_TOKEN_JWT_EXP_SECONDS = 3600  # 60 minutes
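When an embedded dashboard fails to load, it can help to look inside the guest token itself, for example to see whether it has already expired relative to `GUEST_TOKEN_JWT_EXP_SECONDS`. The sketch below decodes the claims of a JWT with the standard library only. It does not verify the signature, so it is suitable for debugging and nothing else; the helper names are hypothetical, and it assumes the token carries a standard `exp` claim.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature (debugging only)."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding, so restore it before decoding.
    padding = "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def seconds_until_expiry(token: str, now=None) -> float:
    """Seconds left before the token's 'exp' claim; negative if already expired."""
    now = time.time() if now is None else now
    return decode_jwt_payload(token)["exp"] - now
```

A negative result from `seconds_until_expiry` suggests the backend is handing out a stale token rather than requesting a fresh one.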
Apache Superset Authentication
Implementing OAuth2 with Azure Identity Platform
Superset, through Flask-AppBuilder, supports the following authentication methods:
- Database
- OpenID
- LDAP
- Remote User
- OAuth
We will explore the process of implementing OAuth2 authentication using the Azure Identity Platform for Apache Superset.
Azure AD OAuth2 Authentication Implementation
Implement OAuth2 authentication for your Apache Superset instance:
from flask_appbuilder.security.manager import AUTH_OAUTH

# Set the authentication type to OAuth
AUTH_TYPE = AUTH_OAUTH

# Self registration & default role
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Admin"

OAUTH_PROVIDERS = [
    {
        "name": "azure",
        "icon": "fa-windows",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "your client id",
            "client_secret": "your client secret",
            "api_base_url": "https://login.microsoftonline.com/tenant_id/oauth2",
            "client_kwargs": {
                "scope": "User.read name preferred_username email profile upn groups",
                "resource": "your client id",
            },
            "request_token_url": None,
            "access_token_url": "https://login.microsoftonline.com/tenant_id/oauth2/token",
            "authorize_url": "https://login.microsoftonline.com/tenant_id/oauth2/authorize",
        },
    },
]
import logging

from superset.security import SupersetSecurityManager


class CustomSsoSecurityManager(SupersetSecurityManager):

    def _get_oauth_user_info(self, provider, resp=None):
        # logging.debug("Oauth2 provider: {0}.".format(provider))
        if provider == "azure":
            # logging.debug("Azure response received : {0}".format(resp))
            id_token = resp["id_token"]
            # logging.debug(str(id_token))
            me = self._azure_jwt_token_parse(id_token)
            # logging.debug("Parse JWT token : {0}".format(me))
            return {
                "name": me.get("name", ""),
                "email": me["upn"],
                "first_name": me.get("given_name", ""),
                "last_name": me.get("family_name", ""),
                "id": me["oid"],
                "username": me["oid"],
                "role_keys": me.get("roles", []),
            }

    oauth_user_info = _get_oauth_user_info
from custom_sso_security_manager import CustomSsoSecurityManager

CUSTOM_SECURITY_MANAGER = CustomSsoSecurityManager
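The `role_keys` value returned by the custom security manager can drive role assignment, which avoids giving every self-registered user the same default role. A minimal configuration sketch using Flask-AppBuilder's `AUTH_ROLES_MAPPING` and `AUTH_ROLES_SYNC_AT_LOGIN` settings is shown below. The keys `superset_admins` and `superset_viewers` are hypothetical app role values that would have to be defined in your own Azure app registration.

```python
# Map role keys from the identity provider (hypothetical names) to Superset roles.
AUTH_ROLES_MAPPING = {
    "superset_admins": ["Admin"],
    "superset_viewers": ["Gamma"],
}

# Re-evaluate the mapping on every login so role changes in Azure take effect.
AUTH_ROLES_SYNC_AT_LOGIN = True

# Fallback role for users who arrive without any mapped role key.
AUTH_USER_REGISTRATION_ROLE = "Gamma"
```

Defaulting new users to a restricted role and promoting them through the mapping is a more conservative design than registering everyone as an administrator.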
Register the following redirect URI in your Azure app registration:
https://superset.mydomain.com/oauth-authorized/azure
After restarting, you will notice the updated login page reflecting your changes.
Azure AD:
[Image: Apache Superset OAuth2 - Azure AD Authentication]
[Image: Apache Superset OAuth2 Authentication]
Benefits and Features of Apache Superset
- User-Friendly Interface: Superset provides an intuitive and user-friendly interface that makes it easy for users to explore and analyse data. With drag-and-drop functionality and interactive visualisations, users can create insightful dashboards without writing complex queries or code.
- Wide Range of Data Sources: Superset supports a variety of data sources, including popular databases like MySQL, PostgreSQL, and SQLite, as well as big data platforms like Apache Hive, Apache Spark, and Presto. It also integrates with cloud-based storage solutions like Amazon S3 and Google Cloud Storage.
- Interactive Dashboards: Superset allows users to create interactive dashboards with a wide range of visualization options, including charts, graphs, maps, and tables. Users can customize the appearance and layout of dashboards and easily share them with others.
- Ad-Hoc Analysis: With Superset, users can perform ad-hoc analysis by exploring and filtering data in real-time. The SQL Lab feature allows users to write and execute SQL queries directly in the browser, providing instant results and insights.
- Collaboration and Sharing: Superset enables collaboration by allowing users to share dashboards, charts, and SQL queries with others. Users can set permissions and access controls to ensure data security and privacy.
- Extensibility and Customization: Superset is highly extensible and customizable, allowing users to add custom visualizations, plugins, and integrations. The Superset community actively contributes to the development of new features and enhancements.
- Scalability and Performance: Superset is designed to scale out in distributed environments and can handle large datasets and high user concurrency. It leverages technologies like Apache Druid and Apache Arrow to provide fast and efficient data processing.
- Active Community Support: Apache Superset has a vibrant and active community of users and contributors. The community provides support, documentation, and regular updates, ensuring that Superset remains a robust and reliable tool for business intelligence.
(Disclaimer: The views and opinions expressed in this article are my own and do not necessarily reflect the official policy or position of any
