Here’s the complete solution addressing all four focus areas:
1. UTF-8 Character Encoding Configuration:
The root issue is your database collation, but you can fix this without changing it. Configure your API client to explicitly handle UTF-8:
HttpHeaders headers = new HttpHeaders();
headers.setAccept(Arrays.asList(MediaType.APPLICATION_JSON));
headers.setAcceptCharset(Arrays.asList(StandardCharsets.UTF_8));
Also add this to your ETQ system configuration (Admin > System Settings > API Configuration):
api.export.defaultCharset=UTF-8
api.export.forceCharsetConversion=true
2. JSON Export API Headers:
Modify your API calls to include explicit charset handling. When making requests:
GET /api/documents/export
Accept: application/json; charset=utf-8
Accept-Charset: utf-8
And configure your JSON parser to handle UTF-8:
ObjectMapper mapper = new ObjectMapper();
mapper.configure(JsonParser.Feature.ALLOW_UTF8_BOM, true);
3. Character Normalization in Document Metadata:
ETQ 2022 supports Unicode normalization in export transformations. Enable this in Admin > Document Control > Export Settings:
- Set ‘Unicode Normalization Form’ to ‘NFC’ (Canonical Composition)
- Enable ‘Character Entity Encoding’ for special symbols
- Check ‘Validate UTF-8 Sequences’ to detect corruption early
You can also apply normalization rules via the API by adding a transform parameter:
GET /api/documents/export?transform=unicode_nfc&validateEncoding=true
4. Schema Validation for International Content:
Implement validation rules for document metadata to ensure consistent encoding from the start. Create a custom validation rule in ETQ:
- Navigate to Admin > Document Control > Validation Rules
- Add rule: ‘Validate Character Encoding’
- Set validation script to check for valid UTF-8 sequences
- Apply to all document fields containing international characters
For existing documents with corrupted characters, you’ll need to run a cleanup script. ETQ provides a data normalization utility:
-- Backup first!
UPDATE DOCUMENT_METADATA
SET title = CONVERT(NVARCHAR(MAX), title COLLATE Latin1_General_100_CI_AS_SC_UTF8),
description = CONVERT(NVARCHAR(MAX), description COLLATE Latin1_General_100_CI_AS_SC_UTF8)
WHERE title LIKE '%�%' OR description LIKE '%�%'
Long-term Solution:
While the above fixes your immediate export issues, plan a database migration to a UTF-8 compatible collation (Latin1_General_100_CI_AS_SC_UTF8 for SQL Server 2019+). This is the proper fix but requires:
- Full database backup
- Maintenance window (2-4 hours for typical document repositories)
- Testing of all integrations post-migration
- ETQ support involvement to ensure compatibility
For now, the API-level fixes and normalization rules will resolve your export corruption. Test thoroughly with documents containing ™, ®, accented characters, and CJK characters to verify all edge cases are handled correctly.