After synthesizing all this feedback and running some POCs, here’s my consolidated perspective on the semantic vs custom feature engineering decision:
SAP Semantic Data Products and Governance:
The semantic layer provides enormous value for foundational business concepts. SAP BDC semantic models come with pre-validated business rules, data quality checks, and audit trails that are critical for regulated industries. For customer hierarchies, product classifications, financial metrics, and supply chain KPIs, leveraging these semantics saves months of validation work and ensures consistency across analytics and ML use cases. The built-in governance (lineage, access controls, change management) is production-grade and compliance-ready.
Snowpark Python UDF Development:
Snowpark excels at ML-specific transformations that semantic models don’t cover. Complex window functions, statistical aggregations, time-series feature engineering, and custom business logic all benefit from Snowpark’s flexibility. The Python UDF framework lets you implement any transformation pattern, call external libraries, and even integrate pre-trained models for feature generation. The key is treating Snowpark as the “enrichment layer” rather than replacing semantic foundations entirely.
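To make the "enrichment layer" idea concrete, here is a minimal sketch of a custom feature transformation written as a plain Python handler that could then be registered as a Snowpark UDF. The function name, the half-life constant, and the registration snippet are all illustrative assumptions, not part of any existing pipeline:

```python
# Hypothetical sketch: a plain-Python handler for a custom ML feature that a
# Snowpark UDF could wrap; the name and half-life default are illustrative.
def recency_decay(days_since_last: int, half_life: float = 30.0) -> float:
    """Exponential-decay recency feature: halves every `half_life` days."""
    return 0.5 ** (days_since_last / half_life)

# Registration would happen inside an active Snowpark session, e.g.:
# from snowflake.snowpark.functions import udf
# from snowflake.snowpark.types import FloatType, IntegerType
# recency_udf = udf(recency_decay, return_type=FloatType(),
#                   input_types=[IntegerType()], name="recency_decay")
```

Keeping the handler as ordinary Python (with the Snowpark registration as a thin wrapper) makes the transformation unit-testable outside Snowflake.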
Feature Engineering Best Practices:
Implement a layered architecture:
- L1 (Raw): SAP BDC shared data, unchanged
- L2 (Semantic): SAP semantic products with business logic
- L3 (Base Features): Direct mappings from semantics to ML features
- L4 (Derived Features): Snowpark transformations creating ML-specific features
This separation makes testing, debugging, and governance much more manageable. Each layer has clear ownership and validation criteria.
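The layer separation above can be sketched as a simple naming convention plus view scaffolding; every schema, view, and table name below is a hypothetical placeholder, not a real SAP BDC or Snowflake object:

```python
# Hypothetical four-layer naming convention; all object names are placeholders.
LAYERS = {
    "L1_raw": "sap_share.transactions",               # shared data, unchanged
    "L2_semantic": "sap_share.customer_semantic_v2",  # SAP business logic
    "L3_base": "ml.base_customer_features",           # 1:1 semantic -> feature
    "L4_derived": "ml.derived_customer_features",     # Snowpark transformations
}

def layer_view_ddl(layer_view: str, source: str) -> str:
    """Generate a CREATE VIEW statement pinning a layer to its upstream source."""
    return f"CREATE OR REPLACE VIEW {layer_view} AS SELECT * FROM {source}"

# Each layer would be materialized in a Snowpark session, e.g.:
# session.sql(layer_view_ddl(LAYERS["L3_base"], LAYERS["L2_semantic"])).collect()
```

Because each layer only references the one directly beneath it, ownership and validation criteria stay scoped to a single boundary.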
Hybrid Semantic + Custom Approach:
The hybrid pattern is the practical solution. Use SAP semantics for all business-validated attributes (customer demographics, product attributes, transaction amounts, organizational hierarchies). Build Snowpark transformations for ML-specific features (RFM scores, propensity calculations, embeddings, statistical aggregations, time-series lags).
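As an example of an ML-specific feature that belongs in the Snowpark layer rather than the semantic layer, here is a toy RFM score. The bucketing thresholds are arbitrary illustrations; a real implementation would typically derive cutoffs from quantiles of the customer base:

```python
# Hypothetical RFM scoring sketch; thresholds are illustrative, not validated.
from datetime import date

def rfm_score(last_purchase: date, n_orders: int, total_spend: float,
              as_of: date) -> int:
    """Toy 3-digit RFM score: each dimension bucketed 1-5 (5 = best)."""
    recency_days = (as_of - last_purchase).days
    if recency_days <= 30:
        r = 5
    elif recency_days <= 90:
        r = 4
    elif recency_days <= 180:
        r = 3
    elif recency_days <= 365:
        r = 2
    else:
        r = 1
    f = min(5, max(1, n_orders))                      # crude: cap count at 5
    m = min(5, max(1, int(total_spend // 100) + 1))   # $100 per bucket
    return r * 100 + f * 10 + m
```

The semantic layer supplies the validated inputs (purchase dates, order counts, spend); only the scoring logic is custom.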
Implement version pinning for semantic dependencies:
```python
# Snowpark feature pipeline pinned to a semantic version
# (assumes an active Snowpark session at registration time)
from snowflake.snowpark import Session, Window
from snowflake.snowpark.functions import avg, col, datediff, lag, sproc

@sproc(name="generate_customer_features_v3", replace=True,
       packages=["snowflake-snowpark-python"])
def build_features(session: Session, semantic_version: str) -> str:
    # Pin to a specific semantic version of the shared SAP view
    semantic_view = f"sap_share.customer_semantic_{semantic_version}"
    base = session.table(semantic_view)

    per_customer = Window.partition_by("customer_id").order_by("transaction_date")

    # Layer custom ML features on the semantic foundation
    features = base.with_columns(
        ["avg_purchase_6m", "days_since_previous"],
        [
            # Trailing average over the six preceding rows per customer
            # (a row-count approximation of a six-month window)
            avg(col("purchase_amount")).over(per_customer.rows_between(-6, -1)),
            # Days elapsed since the customer's previous transaction
            datediff(
                "day",
                lag(col("transaction_date"), 1).over(per_customer),
                col("transaction_date"),
            ),
        ],
    )
    features.write.save_as_table("customer_features_v3", mode="overwrite")
    return f"features built from {semantic_view}"
```
Model Governance and Compliance:
Implement comprehensive metadata management:
- Feature Registry: Track every feature with source semantic version, transformation logic, business owner, and approval status
- Lineage Tracking: Use Snowflake tags to link features back to SAP semantic sources and Snowpark transformation code
- Access Controls: Inherit RBAC from SAP semantics for base attributes, apply additional column-level security for derived features
- Validation Framework: Automated tests comparing semantic outputs to expected values, plus data quality checks on Snowpark transformations
- Change Management: When SAP updates semantics, create new versioned views, run regression tests on all dependent features, migrate models incrementally
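A feature-registry record covering the metadata above could be as simple as the following sketch; the field names and the example values are illustrative assumptions, not an existing schema:

```python
# Hypothetical feature-registry record; field names and values are illustrative.
from dataclasses import dataclass, asdict

@dataclass
class FeatureRecord:
    name: str
    semantic_version: str     # pinned SAP semantic source version
    transformation: str       # pointer to Snowpark code (proc name or repo path)
    business_owner: str
    approval_status: str = "pending"

record = FeatureRecord(
    name="avg_purchase_6m",
    semantic_version="v2",
    transformation="generate_customer_features_v3",
    business_owner="analytics@example.com",
)
# asdict(record) could be persisted to a registry table or attached to the
# feature column via Snowflake object tags for lineage queries.
```

Even this minimal shape answers the audit questions that matter: which semantic version a feature depends on, who owns it, and whether it is approved.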
The hybrid approach gives you SAP’s governance benefits (audit trails, validated business logic, regulatory compliance) while maintaining ML flexibility through Snowpark. Document everything, version aggressively, and treat semantic products as immutable inputs to your feature engineering pipeline.
Recommendation: Start with SAP semantics for all available business concepts. Only build custom Snowpark features when semantic products don't cover your needs. As your ML platform matures, you'll develop patterns for common transformations that can be templatized and governed as rigorously as the semantic layer itself. The goal is controlled flexibility: innovation where needed, standardization where possible.