In today’s competitive retail and e-commerce world, the ability to turn raw data into insights can make or break your business. If you’re new to business intelligence (BI) and thinking of building your own on-premise data platform, this guide is for you.
Whether you’re moving beyond spreadsheets or avoiding high recurring cloud costs, on-prem setups can offer control, flexibility, and long-term cost-efficiency — if done right.
Let’s walk through the key components and considerations.
🔌 Section 1: Connecting to Your Data Sources (From OLTP Systems & APIs)
Your first step is to bring data into the system from your current operational systems (also known as OLTP — Online Transaction Processing systems). These can include relational databases and external APIs like marketing platforms.
🗃️ From RDBMS (e.g., MySQL, PostgreSQL, SQL Server)
You’ll need:
Host: IP address or server name.
Port: Common ports — 3306 (MySQL), 5432 (PostgreSQL), 1433 (SQL Server).
Username & Password: Ensure the user has read access.
Database/Schema: Target schema for extraction.
Driver: Use JDBC/ODBC or native database drivers.
📣 From Marketing APIs (e.g., Facebook Ads)
You’ll need:
Access Token: For API authentication.
App ID / Secret: To identify your application.
Business Manager ID: Tied to your Facebook business account.
Ad Account ID: From which to pull data.
🏢 Section 2: Choosing Your Data Warehouse & Cost Optimization
Recommendation: ClickHouse
Fast columnar storage, excellent for aggregation-heavy queries, and open-source.
ClickHouse has very detail of recommendations about sizing and hardware system: https://clickhouse.com/docs/guides/sizing-and-hardware-recommendations
🛠️ Environments Setup
Split your environment:
Development: For data modeling, testing new pipelines.
Production: Clean, validated data for dashboards.
💡 Cost Tip: Optimize by running dev environments only during work hours or using a small instance.
🔁 Replica Setup
Why: Backups, disaster recovery, scaling read loads.
Types: You can use asynchronous replicas or real-time syncs.
How many: Depends on your SLA and load
If your dashboards are queried by 10+ users every hour with filters by product, region, and channel — queries can get heavy. In that case, consider:
1 replica for BI/reporting only
1 replica for backup or failover
📦 Handling Semi-Structured or NoSQL Data
Recommendations:
Compress data before storing
Use MongoDB for JSON-style data.
Use HDFS for larger unstructured datasets.
If volume is low, consider cloud storage for easier management.
🧰 Section 3: Data Modeling with dbt
dbt (Data Build Tool) is a powerful open-source tool for transforming raw warehouse data into clean models.
SQL-based: Easy for analysts.
Version control: Git-native workflows.
CI/CD-friendly: Automate your pipeline with tests and staging.
Documentation: Auto-generates lineage and model descriptions.
🔄 Section 4: Orchestration with Apache Airflow
Once you have ingestion and modeling in place, you need to automate it.
⚙️ Why Airflow?
Widely adopted: Huge community and plugin support.
Flexible: Supports Python, SQL, Bash, and more.
On-prem and cloud compatible
Custom operators: Build anything you need.
Example: You can schedule ETL jobs to pull sales data every night at 2AM, transform it with dbt at 3AM, and have dashboards refreshed by 6AM.
📊 Section 5: Data Visualization Tools
🆓 Open-Source Options: Metabase, Apache Superset
Pros: Free to use, lightweight
Cons: Requires technical setup and some maintenance
💼 Enterprise / Subscription Tools: Power BI, Tableau, Looker
Pros: Rich UI, better UX, advanced analytics, support
Cons: Licensing cost (starts around $10/user/month for Power BI)
💬 Final Thoughts
Setting up an on-premise data platform isn’t just about tools — it’s about choosing the right foundation for your business size, data volume, and team skills.
For retail and e-commerce businesses, this means balancing performance, security, and cost — while ensuring your team can actually use the data to make decisions.
If you’re just getting started, don’t overbuild. Start small:
One reliable data source
One clean warehouse
One useful dashboard
If you’re building your first data platform and want some help evaluating tools, cost, or architecture — feel free to reach out. We’re happy to share what we’ve learned from real-world experience.
Website: www.diligence-intelligence.com
Email: info@diligence-intelligence.com