Semantic modeling from SAP BDC vs building custom ML feature stores in Snowpark

Our team is at a crossroads deciding between two approaches for our ML feature engineering pipeline on Snowflake 7.5. We’re ingesting SAP master and transactional data through BDC, and need to build features for multiple predictive models (demand forecasting, customer churn, pricing optimization).

Option A: Leverage SAP’s semantic data products and governance layer directly. SAP BDC provides pre-built semantic models with business logic already encoded. We’d consume these through Snowflake’s data sharing and build features on top.

Option B: Build custom feature engineering pipelines using Snowpark Python UDFs, giving us full control over transformations but requiring us to re-implement business logic and maintain governance ourselves.

I’m leaning toward a hybrid approach but curious what others have experienced. The semantic models are appealing for governance and business alignment, but I’m concerned about flexibility for complex feature engineering. Has anyone successfully combined SAP semantic products with custom Snowpark transformations? What’s worked well in terms of model governance and compliance when you need both pre-built semantics and custom features?

Option B gives you way more flexibility. Snowpark Python UDFs let you implement any feature engineering pattern - complex window functions, statistical transformations, even calls out to external ML libraries. We built our entire feature store this way and haven’t looked back.

That said, don’t underestimate the governance overhead. You’ll need to build your own metadata management, lineage tracking, and access controls. We use a combination of Snowflake tags, custom Python decorators for UDF documentation, and a feature registry in a separate schema to track definitions and ownership.
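To make the registry idea concrete, here’s a stripped-down sketch of the decorator pattern. All names here (`FEATURE_REGISTRY`, `register_feature`, the toy `rfm_score`) are illustrative, and in production the registry is a table in a governance schema rather than an in-memory dict:

```python
# Illustrative sketch of a feature-registry decorator; not a library API.
import hashlib

FEATURE_REGISTRY = {}  # in production: a table in a separate governance schema

def register_feature(name, owner, semantic_source):
    """Record a feature's owner, source semantic product, and a logic hash."""
    def wrapper(fn):
        # hash the compiled bytecode so changed logic changes the hash
        logic_hash = hashlib.sha256(fn.__code__.co_code).hexdigest()[:12]
        FEATURE_REGISTRY[name] = {
            "owner": owner,
            "semantic_source": semantic_source,
            "logic_hash": logic_hash,
        }
        return fn
    return wrapper

@register_feature("rfm_score", owner="ml-platform",
                  semantic_source="customer_semantic_v1")
def rfm_score(recency, frequency, monetary):
    # toy scoring logic, purely for illustration
    return 0.5 * recency + 0.3 * frequency + 0.2 * monetary
```

The hash gives you a cheap way to detect when a feature’s transformation logic has drifted from what was approved.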

Version pinning is critical. We create versioned views of SAP semantic products in our Snowflake environment:

CREATE VIEW feature_store.customer_semantic_v1 AS
SELECT * FROM sap_bdc_share.customer_semantic;

This isolates our feature pipelines from upstream changes. When SAP updates their semantics, we create a new versioned view (v2), test it thoroughly with our Snowpark transformations, and then migrate features incrementally. We also maintain a feature compatibility matrix tracking which ML models use which semantic versions.
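Before cutting over to a new versioned view, we diff its columns against the old one and flag anything our features depend on. A minimal sketch of that pre-migration check, assuming you can list columns for both versions (the helper name is ours, not an API):

```python
# Illustrative pre-migration check: diff column sets between two semantic
# versions and flag required columns that would break downstream features.
def semantic_version_diff(old_columns, new_columns, required):
    """Return (added, removed, breaking) for a semantic version upgrade.

    `breaking` is any column our feature pipelines require that the new
    version no longer provides.
    """
    old, new = set(old_columns), set(new_columns)
    added = sorted(new - old)
    removed = sorted(old - new)
    breaking = sorted(set(required) & (old - new))
    return added, removed, breaking
```

If `breaking` is non-empty, the v2 view doesn’t get promoted and the affected entries in the compatibility matrix stay pinned to v1.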

For model governance, we tag all features with metadata including source semantic version, transformation logic hash, and approval status. This gives us full lineage from SAP source through transformations to deployed models.
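The tagging itself can be scripted. As a sketch, the helper below renders that metadata as Snowflake object-tagging statements; the `governance.*` tag names are our own convention, not Snowflake built-ins, and the tags must already exist:

```python
# Illustrative helper: render feature lineage metadata as Snowflake
# ALTER TABLE ... SET TAG statements. Tag names are a local convention.
def lineage_tag_sql(table, semantic_version, logic_hash, approval_status):
    """Build one SET TAG statement per lineage attribute for a feature table."""
    tags = {
        "governance.semantic_version": semantic_version,
        "governance.logic_hash": logic_hash,
        "governance.approval_status": approval_status,
    }
    return [
        f"ALTER TABLE {table} SET TAG {tag} = '{value}'"
        for tag, value in tags.items()
    ]
```

We run these from the same deployment job that registers the feature, so the tags can never drift from the registry.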

From a compliance perspective, the hybrid approach gives you the best audit story. SAP semantic products come with built-in data quality rules and business validation. When regulators ask how you calculated a feature, you can point to SAP’s certified business logic for base attributes, then document your ML transformations separately.

Just make sure you implement proper access controls at both layers. SAP semantics inherit their own RBAC, but your Snowpark feature transformations need separate governance. We use Snowflake’s column-level security and dynamic data masking to ensure sensitive features (PII, financial data) are properly protected even after transformation.

The hybrid approach is definitely the way to go. SAP semantic data products give you pre-validated business logic - things like customer hierarchies, product classifications, and financial calculations that have been audited and approved. Don’t reinvent that wheel.

Use SAP semantics for your base features (customer attributes, product categories, transaction amounts), then layer Snowpark transformations for derived features (RFM scores, propensity calculations, embeddings). This way your models inherit SAP’s governance for core business concepts while you maintain flexibility for ML-specific engineering.

# Example pattern we use (sketch; calculate_rfm is our own helper)
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sproc

@sproc(name="enrich_customer_features")
def transform(session: Session, semantic_view: str) -> str:
    base = session.table(semantic_view)  # SAP semantic product
    # Custom ML features using Snowpark
    enriched = base.with_column("rfm_score",
        calculate_rfm(col("recency"), col("frequency")))
    enriched.write.save_as_table("customer_features_enriched")
    return "done"

We went through this exact decision last year. Started with Option A (pure semantic models) because governance was our top priority. SAP’s semantic layer handles data lineage, business definitions, and access controls beautifully - critical for regulatory compliance in our industry.

The limitation hit us when we needed time-series features with custom lag calculations and rolling aggregations. SAP semantics are great for standard business metrics but don’t cover advanced ML feature patterns. We ended up with a hybrid: consume SAP semantics as base tables, then apply Snowpark transformations for ML-specific features. Best of both worlds.

After synthesizing all this feedback and running some POCs, here’s my consolidated perspective on the semantic vs custom feature engineering decision:

SAP Semantic Data Products and Governance: The semantic layer provides enormous value for foundational business concepts. SAP BDC semantic models come with pre-validated business rules, data quality checks, and audit trails that are critical for regulated industries. For customer hierarchies, product classifications, financial metrics, and supply chain KPIs, leveraging these semantics saves months of validation work and ensures consistency across analytics and ML use cases. The built-in governance (lineage, access controls, change management) is production-grade and compliance-ready.

Snowpark Python UDF Development: Snowpark excels at ML-specific transformations that semantic models don’t cover. Complex window functions, statistical aggregations, time-series feature engineering, and custom business logic all benefit from Snowpark’s flexibility. The Python UDF framework lets you implement any transformation pattern, call external libraries, and even integrate pre-trained models for feature generation. The key is treating Snowpark as the “enrichment layer” rather than replacing semantic foundations entirely.

Feature Engineering Best Practices: Implement a layered architecture:

  • L1 (Raw): SAP BDC shared data, unchanged
  • L2 (Semantic): SAP semantic products with business logic
  • L3 (Base Features): Direct mappings from semantics to ML features
  • L4 (Derived Features): Snowpark transformations creating ML-specific features

This separation makes testing, debugging, and governance much more manageable. Each layer has clear ownership and validation criteria.
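One way to keep the layering honest is a dependency rule you can assert in CI: a pipeline step may read from its own layer or any lower one, never a higher one. A toy sketch of that rule (the layer names follow the list above; the function is illustrative, not a framework):

```python
# Illustrative layered-architecture dependency rule for CI checks.
LAYER_ORDER = ["L1", "L2", "L3", "L4"]  # raw -> semantic -> base -> derived

def valid_dependency(consumer_layer, source_layer):
    """A layer may depend on itself or any lower layer, never a higher one."""
    return LAYER_ORDER.index(source_layer) <= LAYER_ORDER.index(consumer_layer)
```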

Hybrid Semantic + Custom Approach: The hybrid pattern is the practical solution. Use SAP semantics for all business-validated attributes (customer demographics, product attributes, transaction amounts, organizational hierarchies). Build Snowpark transformations for ML-specific features (RFM scores, propensity calculations, embeddings, statistical aggregations, time-series lags).

Implement version pinning for semantic dependencies:

# Snowpark feature pipeline with semantic versioning
from snowflake.snowpark import Session, Window
from snowflake.snowpark.functions import avg, col, datediff, lag, sproc

@sproc(name="generate_customer_features_v3")
def build_features(session: Session, semantic_version: str = "v2") -> str:
    # Pin to a specific semantic version
    semantic_view = f"sap_share.customer_semantic_{semantic_version}"
    base = session.table(semantic_view)

    window = Window.partition_by("customer_id").order_by("transaction_date")

    # Layer custom ML features on the semantic foundation
    features = base.with_columns(
        ["avg_purchase_prev6", "days_since_previous"],
        [
            # average of the previous six transactions (rows, not months)
            avg(col("purchase_amount")).over(window.rows_between(-6, -1)),
            # days elapsed since the prior transaction
            datediff("day",
                     lag(col("transaction_date"), 1).over(window),
                     col("transaction_date")),
        ],
    )

    features.write.save_as_table(f"customer_features_{semantic_version}")
    return semantic_view

Model Governance and Compliance: Implement comprehensive metadata management:

  1. Feature Registry: Track every feature with source semantic version, transformation logic, business owner, and approval status
  2. Lineage Tracking: Use Snowflake tags to link features back to SAP semantic sources and Snowpark transformation code
  3. Access Controls: Inherit RBAC from SAP semantics for base attributes, apply additional column-level security for derived features
  4. Validation Framework: Automated tests comparing semantic outputs to expected values, plus data quality checks on Snowpark transformations
  5. Change Management: When SAP updates semantics, create new versioned views, run regression tests on all dependent features, migrate models incrementally
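Step 5’s regression test can be as simple as recomputing each feature against the new semantic version and comparing to a pinned baseline within a tolerance. A minimal sketch, assuming you can materialize both feature sets as name-to-value maps (the helper is illustrative):

```python
# Illustrative regression check for a semantic version migration: compare
# recomputed feature values against a pinned baseline within a tolerance.
def feature_regression(baseline, candidate, tolerance=1e-6):
    """Return names of features that are missing or drifted beyond tolerance."""
    drifted = []
    for name, expected in baseline.items():
        actual = candidate.get(name)
        if actual is None or abs(actual - expected) > tolerance:
            drifted.append(name)
    return sorted(drifted)
```

Any non-empty result blocks the migration until the owning team approves the change or re-baselines the feature.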

The hybrid approach gives you SAP’s governance benefits (audit trails, validated business logic, regulatory compliance) while maintaining ML flexibility through Snowpark. Document everything, version aggressively, and treat semantic products as immutable inputs to your feature engineering pipeline.

Recommendation: Start with SAP semantics for all available business concepts. Only build custom Snowpark features when semantic products don’t cover your needs. As your ML platform matures, you’ll develop patterns for common transformations that can be templatized and governed as rigorously as the semantic layer itself. The goal is controlled flexibility - innovation where needed, standardization where possible.

Really appreciate these perspectives. The governance argument for SAP semantics is compelling, especially for our finance and supply chain models where audit trails matter. The hybrid pattern makes sense - use semantic products as the “source of truth” layer, then build ML feature transformations on top.

How do you handle versioning in a hybrid setup? When SAP updates their semantic models (new business rules, schema changes), how do you ensure your downstream Snowpark features don’t break? Do you pin to specific versions of the semantic views or build defensive transformation logic?