Athena query fails on ERP logs due to missing partition metadata after Glue ETL job

jeffreyexpert · January 27, 2025, 11:37am

We’re running Athena queries against our ERP application logs stored in S3, partitioned by date. After our nightly Glue ETL job completes, Athena consistently returns ‘HIVE_PARTITION_SCHEMA_MISMATCH’ errors when querying recent data.

Our Glue crawler runs daily at 2 AM to update the Data Catalog, but queries against yesterday’s partition fail until we manually run MSCK REPAIR TABLE. The ETL workflow writes new partitions in the format year=2025/month=03/day=14/, but the catalog doesn’t reflect them until then.

Sample failing query:

SELECT user_id, action, timestamp
FROM erp_logs
WHERE year=2025 AND month=3 AND day=13;

Error: Partition not found in Data Catalog. This blocks our morning analytics reports. How can we ensure the Glue Data Catalog updates automatically after ETL jobs complete, and integrate partition repair into our workflow without manual intervention?

ryanadmin · January 31, 2025, 9:16pm

The HIVE_PARTITION_SCHEMA_MISMATCH error suggests your partition columns might have type inconsistencies. When Glue ETL writes new partitions, do the data types match the existing catalog schema? I’d recommend checking whether your month and day columns are being written as integers or as strings. Run SHOW PARTITIONS erp_logs to compare what’s actually registered with what exists in S3.
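
To make that comparison concrete, here is a minimal sketch in plain Python. It assumes you have already collected the SHOW PARTITIONS output and the partition prefixes found in S3 as lists of strings; the function names are hypothetical, and it also assumes every partition value is numeric, as in the year/month/day layout from the question. It reports partitions that exist in S3 but are not registered, and partitions that differ only in zero-padding (month=3 versus month=03), which can be a symptom of the integer-versus-string issue.

```python
def normalize(spec):
    """Turn 'year=2025/month=03/day=14' into a tuple of (name, int) pairs."""
    pairs = []
    for piece in spec.strip("/").split("/"):
        name, _, value = piece.partition("=")
        pairs.append((name, int(value)))  # assumes numeric partition values
    return tuple(pairs)


def compare_partitions(registered, in_s3):
    """Compare catalog partitions with S3 prefixes.

    Returns (missing, padding_mismatch):
      missing          - S3 partitions with no catalog entry
      padding_mismatch - (catalog, s3) pairs that match numerically
                         but are spelled differently
    """
    reg = {normalize(p): p.strip("/") for p in registered}
    s3 = {normalize(p): p.strip("/") for p in in_s3}
    missing = sorted(s3[k] for k in s3.keys() - reg.keys())
    padding_mismatch = sorted(
        (reg[k], s3[k]) for k in reg.keys() & s3.keys() if reg[k] != s3[k]
    )
    return missing, padding_mismatch


# Hypothetical inputs: one registered partition, two prefixes in S3.
missing, mismatched = compare_partitions(
    ["year=2025/month=3/day=13"],
    ["year=2025/month=03/day=13/", "year=2025/month=03/day=14/"],
)
```

If the second list is non-empty, the ETL output and the catalog disagree on how partition values are written, which is worth fixing before automating partition registration.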

donna_lead · February 14, 2025, 2:00am

For automatic partition updates, consider using AWS Lambda triggered by S3 PUT events. When your ETL writes new partition data, Lambda can execute ALTER TABLE ADD PARTITION statements directly. This eliminates the crawler delay entirely and gives you real-time partition availability. You’d need to parse the S3 key to extract partition values and construct the DDL, but it’s more reliable than scheduled crawlers for time-sensitive analytics.
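
A rough sketch of what such a Lambda function could look like follows. It is an illustration under assumptions, not a tested deployment: the database name erp_db and the results bucket example-athena-results are placeholders, the partition values are quoted as strings (adjust if your catalog columns are integers), and the key layout is taken from the year=/month=/day= format in the question. The key parsing and DDL construction are kept in a pure function so they can be checked without AWS access.

```python
import re
from urllib.parse import unquote_plus

# Matches the partition layout described in the question.
PARTITION_RE = re.compile(r"year=(\d{4})/month=(\d{1,2})/day=(\d{1,2})/")


def partition_ddl(bucket, key, table="erp_logs"):
    """Build an ALTER TABLE statement for the partition in `key`.

    Returns None when the key does not contain a partition path.
    """
    match = PARTITION_RE.search(key)
    if match is None:
        return None
    year, month, day = match.groups()
    prefix = key[: match.end()]  # everything up to and including day=NN/
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}', day='{day}') "
        f"LOCATION 's3://{bucket}/{prefix}'"
    )


def lambda_handler(event, context):
    """Entry point for an S3 object-created notification."""
    import boto3  # imported here so partition_ddl can be used without boto3

    athena = boto3.client("athena")
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        ddl = partition_ddl(bucket, key)
        if ddl:
            athena.start_query_execution(
                QueryString=ddl,
                QueryExecutionContext={"Database": "erp_db"},  # placeholder
                ResultConfiguration={
                    "OutputLocation": "s3://example-athena-results/"  # placeholder
                },
            )
```

Because an ETL run usually writes many files per partition, the function fires once per object; ADD IF NOT EXISTS keeps the repeated statements harmless, though filtering the trigger or deduplicating per partition would reduce the number of Athena calls.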

brandonsolver · February 16, 2025, 6:38am

Lambda works but adds complexity. A simpler approach: set the enableUpdateCatalog and partitionKeys options in your Glue ETL job. With these set as DynamicFrame write options in your job script, Glue updates the Data Catalog as it writes partitions and handles the catalog updates synchronously. That’s much cleaner than post-processing with crawlers or Lambda functions, and it’s native Glue functionality designed exactly for this scenario.
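
As a hedged illustration of those write options, the sketch below wraps the sink setup in a helper that takes the job's GlueContext and DynamicFrame as arguments. The helper name and its parameters are hypothetical; the getSink, setFormat, setCatalogInfo and writeFrame calls follow the pattern in the AWS Glue documentation for updating the catalog from an ETL job, but check them against the Glue version you run. The output format glueparquet is an assumption, since the thread does not say which format the logs use.

```python
def write_with_catalog_update(glue_context, frame, s3_path, database, table):
    """Write `frame` to S3 and register new partitions in the Data Catalog.

    glue_context: the awsglue GlueContext created in the job script
    frame:        the DynamicFrame to write (must contain year/month/day)
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=s3_path,
        enableUpdateCatalog=True,             # update the catalog during the write
        updateBehavior="UPDATE_IN_DATABASE",  # also apply schema changes to the table
        partitionKeys=["year", "month", "day"],
    )
    sink.setFormat("glueparquet")  # assumed output format
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.writeFrame(frame)
    return sink
```

In a job script this would be called once after the final transform, for example write_with_catalog_update(glueContext, transformed, "s3://your-bucket/erp-logs/", "erp_db", "erp_logs"), where the bucket and database names are placeholders.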
