[SR-9432] Stop using ICU for normalization #51896

milseman · 2018-12-07T19:32:09Z


Previous ID	SR-9432
Radar	rdar://problem/51635207
Original Reporter	@milseman
Type	Sub-task
Status	Resolved
Resolution	Done

Additional Detail from JIRA


Votes	4
Component/s	Standard Library
Labels	Sub-task
Assignee	@Azoy
Priority	Medium

md5: 2e7cb865a995734ac55ca28ea9e21299

Parent-Task:

SR-10535 [stdlib] Drop ICU dependency

Issue Description:

We use ICU heavily for normalization, and doing so efficiently is a source of considerable stdlib complexity (more complexity than implementing the algorithm ourselves would require). If we have efficient access to the data tables, we should just implement this ourselves.

Our fast-paths check NFC_QC=yes and hasCompBoundaryBefore heavily. Bouncing over to ICU for each check incurs a hefty performance cost compared to checking locally. A local trie-like structure keyed by Unicode.Scalar that can answer these queries efficiently would alleviate this.
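As a rough illustration of what such a local lookup could look like, here is a minimal sketch of a two-stage bit table indexed by scalar value. The type name `ScalarBitSet` and its API are hypothetical and not taken from the stdlib, and the ranges in the usage lines are only a small sample of scalars whose NFC_QC value is not "yes", not a complete data table.

```swift
/// Hypothetical two-stage bit table over all Unicode scalar values.
/// Stage 1 maps each 256-scalar block to a slot in `blocks`; slot 0 is a
/// shared all-zero block, so unmarked regions cost no extra storage.
struct ScalarBitSet {
    private var stage1: [UInt16]
    private var blocks: [UInt64]   // 4 words (256 bits) per block

    init(marking ranges: [ClosedRange<UInt32>]) {
        stage1 = Array(repeating: 0, count: 0x110000 >> 8)
        blocks = Array(repeating: 0, count: 4)
        for range in ranges {
            precondition(range.upperBound <= 0x10FFFF, "not a scalar value")
            for value in range {
                let high = Int(value >> 8)
                if stage1[high] == 0 {
                    // First marked scalar in this block: allocate a new block.
                    stage1[high] = UInt16(blocks.count / 4)
                    blocks.append(contentsOf: [0, 0, 0, 0])
                }
                let low = Int(value & 0xFF)
                let word = Int(stage1[high]) * 4 + (low >> 6)
                blocks[word] |= UInt64(1) << UInt64(low & 63)
            }
        }
    }

    /// Two array loads and a bit test; no call out of the module.
    func contains(_ scalar: Unicode.Scalar) -> Bool {
        let value = scalar.value
        let low = Int(value & 0xFF)
        let word = Int(stage1[Int(value >> 8)]) * 4 + (low >> 6)
        return (blocks[word] & (UInt64(1) << UInt64(low & 63))) != 0
    }
}

// Sample data only: a few scalars that are not NFC_QC=yes.
let notNFCQCYes = ScalarBitSet(marking: [0x0300...0x0304, 0x0340...0x0341])
let isFastPathScalar = !notNFCQCYes.contains("a")
```

A real table would be generated offline from the Unicode Character Database; the point of the sketch is only that a query becomes a couple of local array loads.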

Using ICU for normalization involves transcoding UTF-8 to UTF-16 and back. This is costly and another source of complexity. For example, we need many growable buffers of different code unit widths, each with conservative growth and reservation factors.
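To make the round trip concrete, the sketch below shows the shape of the work using only public String APIs. The function `roundTripThroughUTF16` and the `normalize` closure are hypothetical stand-ins (the closure takes the place of a UTF-16 based normalizer); this is not how the stdlib calls ICU internally.

```swift
/// Hypothetical illustration of the UTF-8 -> UTF-16 -> UTF-8 round trip.
/// `normalize` stands in for a normalizer that only accepts UTF-16.
func roundTripThroughUTF16(
    _ text: String,
    normalize: ([UInt16]) -> [UInt16]
) -> String {
    // First extra buffer: transcode the input into 16-bit code units.
    let utf16Input = Array(text.utf16)
    // Second extra buffer: the normalizer's 16-bit output.
    let utf16Output = normalize(utf16Input)
    // Transcode back into the string's native storage.
    return String(decoding: utf16Output, as: UTF16.self)
}

let sample = "e\u{0301} 😀"
let utf8Length = sample.utf8.count     // code units in the 8-bit encoding
let utf16Length = sample.utf16.count   // code units in the 16-bit encoding
let result = roundTripThroughUTF16(sample) { $0 }  // identity "normalizer"
```

The two encodings need different numbers of code units for the same text, which is one reason the buffers cannot simply be sized from the input length.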

We'd like fast-paths for languages with combining characters. Scalar-based queries only fast-path single-scalar segments, and ICU's implementation of the multi-scalar quick check (QC) algorithm operates on UTF-16.
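For reference, a quick check over multi-scalar text can be written directly against a string's scalars, with no UTF-16 step. The sketch below follows the general quick check procedure described in UAX #15; it is not ICU's or the stdlib's implementation. `QuickCheck`, `quickCheckNFC`, and `sampleNFCQC` are hypothetical names, and `sampleNFCQC` covers only a handful of scalars for illustration.

```swift
enum QuickCheck { case yes, no, maybe }

/// Quick check in the style of UAX #15, walking scalars natively.
/// `nfcQC` supplies the per-scalar NFC_QC value (assumed to come from a
/// local data table in a real implementation).
func quickCheckNFC(
    _ text: String,
    nfcQC: (Unicode.Scalar) -> QuickCheck
) -> QuickCheck {
    var lastCCC: UInt8 = 0
    var result = QuickCheck.yes
    for scalar in text.unicodeScalars {
        let ccc = scalar.properties.canonicalCombiningClass.rawValue
        // Combining marks out of canonical order: definitely not normalized.
        if ccc != 0 && lastCCC > ccc { return .no }
        switch nfcQC(scalar) {
        case .no: return .no
        case .maybe: result = .maybe
        case .yes: break
        }
        lastCCC = ccc
    }
    return result
}

// Sample data only, not a complete NFC_QC table.
func sampleNFCQC(_ scalar: Unicode.Scalar) -> QuickCheck {
    switch scalar.value {
    case 0x0300...0x0304: return .maybe
    case 0x0340...0x0341: return .no
    default: return .yes
    }
}

let asciiCheck = quickCheckNFC("cafe", nfcQC: sampleNFCQC)
let combiningCheck = quickCheckNFC("e\u{0301}", nfcQC: sampleNFCQC)
```

A result of `.maybe` means the segment still needs the full normalization algorithm; only `.yes` allows skipping it.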

Azoy · 2021-11-18T20:53:17Z

This has been resolved here: #38922

swift-ci transferred this issue from apple/swift-issues Apr 25, 2022

This issue was closed.
