header_utils
Functions and types that operate on Unicode codepoints and strings.
Classes | |
struct | ghassanpl::string_ops::text_encoding |
Type that represents a specific text encoding - a combination of ghassanpl::string_ops::text_encoding_type and endianness. | |
struct | ghassanpl::string_ops::utf8_view< R > |
A simple view over a UTF-8 string range with codepoint values. | |
Enumerations | |
enum class | ghassanpl::string_ops::text_encoding_type { unknown , utf8 , utf16 , utf32 , utf7 , utf1 , utf_ebcdic , scsu , bocu1 , gb18030 } |
Specifies a base text-encoding, ignoring endianness for multi-byte encodings. | |
enum class | ghassanpl::string_ops::unicode_plane { invalid , basic_multilingual_plane , supplementary_multilingual_plane , supplementary_ideographic_plane , tertiary_ideographic_plane , supplementary_special_purpose_plane , private_use_plane_a , private_use_plane_b , bmp , smp , sip , tip , ssp , spua_a , pup_a , spua_b , pup_b } |
Represents the Unicode plane. | |
Functions | |
constexpr bool | ghassanpl::string_ops::is_high_surrogate (char32_t cp) noexcept |
Returns whether cp is a codepoint that encodes the high part of a codepoint with a more-than-16-bit value. | |
constexpr bool | ghassanpl::string_ops::is_low_surrogate (char32_t cp) noexcept |
Returns whether cp is a codepoint that encodes the low part of a codepoint with a more-than-16-bit value. | |
constexpr bool | ghassanpl::string_ops::is_surrogate (char32_t cp) noexcept |
Returns whether cp is a codepoint that encodes any part of a codepoint with a more-than-16-bit value. | |
constexpr bool | ghassanpl::string_ops::is_unicode (char32_t cp) noexcept |
Returns whether cp has a value that is a valid Unicode codepoint (i.e. between 0 and 0x10FFFF). | |
constexpr bool | ghassanpl::string_ops::is_unicode_character (char32_t cp) noexcept |
Returns whether cp has a value that is a valid Unicode character (ie. | |
constexpr char32_t | ghassanpl::string_ops::surrogate_pair_to_codepoint (char32_t high, char32_t low) noexcept |
Returns the codepoint encoded by two surrogates. | |
constexpr auto | ghassanpl::string_ops::get_unicode_plane (char32_t cp) noexcept -> unicode_plane |
text_decode_result | ghassanpl::string_ops::decode_codepoint (bytelike_range auto range, text_encoding encoding) |
Attempts to decode the first codepoint in the bytelike range range, assuming it is encoded in encoding. | |
template<bytelike BYTE_TYPE, size_t N> | |
text_encoding | ghassanpl::string_ops::consume_bom (std::span< BYTE_TYPE, N > &spn) |
Consumes (see consume()) a byte order mark from the beginning of spn (a span of bytelike), and returns the encoding that the BOM represents (or unknown_text_encoding if no BOM). | |
text_encoding | ghassanpl::string_ops::consume_bom (string_view8 auto &sv) |
Consumes (see consume()) a byte order mark from the beginning of sv, and returns the encoding that the BOM represents (or unknown_text_encoding if no BOM). | |
text_encoding | ghassanpl::string_ops::consume_bom (string_view16 auto &sv) |
Consumes (see consume()) a byte order mark from the beginning of sv, and returns the UTF-16 encoding that the BOM represents (or unknown_text_encoding if no BOM). | |
text_encoding | ghassanpl::string_ops::consume_bom (string_view32 auto &sv) |
Consumes (see consume()) a byte order mark from the beginning of sv, and returns the UTF-32 encoding that the BOM represents (or unknown_text_encoding if no BOM). | |
template<bytelike_range T> | |
text_encoding | ghassanpl::string_ops::detect_encoding (T const &range) |
Attempts to detect the encoding of a given bytelike range. | |
template<typename T > | |
constexpr char32_t | ghassanpl::string_ops::consume_codepoint (T &str) |
Consumes a codepoint from a UTF-encoded string and returns it. | |
template<typename T > | |
constexpr void | ghassanpl::string_ops::append_codepoint (T &str, char32_t cp) |
Appends a codepoint to a UTF-encoded string. Supports UTF-8, UTF-16 and UTF-32; the encoding is chosen based on the char type of str. | |
template<typename TO , typename FROM > | |
constexpr void | ghassanpl::string_ops::transcode_unicode (FROM const &from, TO &out) |
Converts a UTF-encoded string from into a different UTF encoding, writing the result to out. The source and destination encodings are chosen based on the char types of FROM and TO. | |
template<typename TO , typename FROM > | |
constexpr TO | ghassanpl::string_ops::transcode_unicode (FROM const &from) |
Converts a UTF-encoded string from into a different UTF encoding and returns the result. The source and destination encodings are chosen based on the char types of FROM and TO. | |
template<typename T > | |
constexpr void | ghassanpl::string_ops::transcode_codepage_to_unicode (T &dest, stringable8 auto source, std::span< char32_t const, 128 > codepage_map) |
Transcodes an Extended ASCII string source into unicode-encoded dest, according to codepage_map. | |
template<typename RESULT = std::string> | |
constexpr auto | ghassanpl::string_ops::transcode_codepage_to_unicode (stringable8 auto source, std::span< char32_t const, 128 > codepage_map) -> RESULT |
Transcodes an Extended ASCII string source into a unicode encoding, according to codepage_map. | |
Encodings | |
Values representing UTF encodings | |
constexpr text_encoding | ghassanpl::string_ops::utf8_encoding |
constexpr text_encoding | ghassanpl::string_ops::utf16_le_encoding |
constexpr text_encoding | ghassanpl::string_ops::utf16_be_encoding |
constexpr text_encoding | ghassanpl::string_ops::utf32_le_encoding |
constexpr text_encoding | ghassanpl::string_ops::utf32_be_encoding |
constexpr text_encoding | ghassanpl::string_ops::unknown_text_encoding |
Represents an unknown text encoding (e.g. when an encoding could not be determined) | |
UTF-8 functions | |
constexpr size_t | ghassanpl::string_ops::codepoint_utf8_count (char32_t cp) noexcept |
Returns the number of UTF-8 octets necessary to encode the given codepoint. | |
constexpr char32_t | ghassanpl::string_ops::consume_utf8 (string_view8 auto &str) |
Consumes (see consume()) a UTF-8 codepoint from str. | |
constexpr size_t | ghassanpl::string_ops::count_utf8_codepoints (stringable8 auto str) |
Returns the number of codepoints in the given UTF-8 string str. | |
constexpr size_t | ghassanpl::string_ops::append_utf8 (string8 auto &buffer, char32_t cp) |
Appends octets to buffer by encoding cp into UTF-8. | |
template<string8 RESULT = std::string> | |
constexpr RESULT | ghassanpl::string_ops::to_utf8 (char32_t cp) |
Returns cp encoded as a UTF-8 string. | |
template<string8 RESULT = std::string, stringable16 STR> | |
constexpr RESULT | ghassanpl::string_ops::to_utf8 (STR &&str) |
Returns str (a UTF-16-encoded string) encoded as a UTF-8 string. | |
std::string | ghassanpl::string_ops::to_string (std::wstring_view str) |
Returns str (a UTF-16-encoded string) encoded as a UTF-8 string. | |
constexpr void | ghassanpl::string_ops::transcode_codepage_to_utf8 (string8 auto &dest, stringable8 auto source, std::span< char32_t const, 128 > codepage_map) |
Transcodes an Extended ASCII string source into UTF-8 dest, according to codepage_map. | |
template<string8 RESULT = std::string> | |
constexpr auto | ghassanpl::string_ops::transcode_codepage_to_utf8 (stringable8 auto source, std::span< char32_t const, 128 > codepage_map) -> RESULT |
Transcodes an Extended ASCII string source into UTF-8, according to codepage_map. | |
UTF-16 functions | |
constexpr char32_t | ghassanpl::string_ops::consume_utf16 (string_view16 auto &str) |
Consumes (see consume()) a UTF-16 codepoint from str. | |
constexpr size_t | ghassanpl::string_ops::append_utf16 (string16 auto &buffer, char32_t cp) |
Appends 16-bit values to buffer by encoding cp into UTF-16. | |
template<string16 RESULT = std::wstring> | |
constexpr RESULT | ghassanpl::string_ops::to_utf16 (char32_t cp) |
Returns cp encoded as a UTF-16 string. | |
template<string16 RESULT = std::wstring, stringable8 STR> | |
constexpr RESULT | ghassanpl::string_ops::to_utf16 (STR str) |
Returns str (a UTF-8-encoded string) encoded as a UTF-16 string. | |
std::wstring | ghassanpl::string_ops::to_wstring (std::string_view str) |
Returns str (a UTF-8-encoded string) encoded as a UTF-16/32 string in a std::wstring (depending on the size of wchar_t) | |
UTF-32 functions | |
constexpr char32_t | ghassanpl::string_ops::consume_utf32 (string_view32 auto &str) |
Consumes (see consume()) a UTF-32 codepoint from str. | |
constexpr size_t | ghassanpl::string_ops::append_utf32 (string32 auto &buffer, char32_t cp) |
Appends 32-bit values to buffer by encoding cp into UTF-32. | |
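The UTF-8 size and counting helpers listed above can be illustrated with a short standalone sketch. It does not use the library: the `sketch` namespace and the function bodies are assumptions written for illustration, the names only mirror the documented ones, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins, NOT the library's implementation.
namespace sketch {

// Number of UTF-8 octets needed to encode `cp` (assumes cp <= 0x10FFFF).
constexpr std::size_t codepoint_utf8_count(char32_t cp) noexcept {
    if (cp < 0x80) return 1;      // 0xxxxxxx
    if (cp < 0x800) return 2;     // 110xxxxx 10xxxxxx
    if (cp < 0x10000) return 3;   // 1110xxxx 10xxxxxx 10xxxxxx
    return 4;                     // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
}

// Counts codepoints by counting every octet that is not a
// continuation octet (10xxxxxx). Assumes `str` is valid UTF-8.
constexpr std::size_t count_utf8_codepoints(std::string_view str) noexcept {
    std::size_t count = 0;
    for (char ch : str) {
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

} // namespace sketch
```

Counting lead octets rather than decoding each sequence keeps the loop simple, but it only gives a meaningful answer for well-formed input.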
Functions and types that operate on Unicode codepoints and strings.
This code uses char32_t to represent single Unicode codepoints (as UTF-32 code units).
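As an illustration of working with char32_t codepoints, the following sketch shows how surrogate classification and surrogate-pair decoding are commonly written. The ranges come from the UTF-16 encoding form; the `sketch` namespace and the function bodies are assumptions made for this example and may differ from the library's own code.

```cpp
// Hypothetical stand-ins that mirror the documented names, NOT the library's code.
namespace sketch {

// High (leading) surrogates occupy U+D800..U+DBFF.
constexpr bool is_high_surrogate(char32_t cp) noexcept { return cp >= 0xD800 && cp <= 0xDBFF; }

// Low (trailing) surrogates occupy U+DC00..U+DFFF.
constexpr bool is_low_surrogate(char32_t cp) noexcept { return cp >= 0xDC00 && cp <= 0xDFFF; }

constexpr bool is_surrogate(char32_t cp) noexcept { return cp >= 0xD800 && cp <= 0xDFFF; }

// Valid codepoints lie between 0 and 0x10FFFF.
constexpr bool is_unicode(char32_t cp) noexcept { return cp <= 0x10FFFF; }

// Combines a high/low surrogate pair into the codepoint it encodes.
// Assumes `high` and `low` really are a high and a low surrogate.
constexpr char32_t surrogate_pair_to_codepoint(char32_t high, char32_t low) noexcept {
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

} // namespace sketch
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600 under this formula.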
struct ghassanpl::string_ops::text_encoding |
Type that represents a specific text encoding - a combination of ghassanpl::string_ops::text_encoding_type and endianness.
Class Members | |
---|---|
endian | endianness |
text_encoding_type | type |
enum class ghassanpl::string_ops::unicode_plane |
Represents the Unicode plane.
The value of each enumerator equals the actual number of the Unicode plane.
Enumerator | |
---|---|
invalid | Represents an invalid plane number. |
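Because each enumerator's value is the plane number, mapping a codepoint to its plane reduces to a 16-bit shift. The sketch below is an assumption-laden illustration: the `plane` enum is a hypothetical, shortened stand-in for unicode_plane, and the value chosen for `invalid` (-1) is a guess, since the documentation does not state it.

```cpp
// Hypothetical, shortened stand-in for unicode_plane; NOT the library's definition.
namespace sketch {

enum class plane : int {
    invalid = -1,  // assumed value; the documentation does not state it
    bmp = 0,       // Basic Multilingual Plane
    smp = 1,       // Supplementary Multilingual Plane
    sip = 2,       // Supplementary Ideographic Plane
    tip = 3,       // Tertiary Ideographic Plane
    ssp = 14,      // Supplementary Special-purpose Plane
    spua_a = 15,   // Supplementary Private Use Area-A
    spua_b = 16    // Supplementary Private Use Area-B
};

// Each plane spans 0x10000 codepoints, so the plane number is cp >> 16.
// Planes 4..13 are currently unassigned and have no named enumerator here.
constexpr plane plane_of(char32_t cp) noexcept {
    return cp <= 0x10FFFF ? static_cast<plane>(cp >> 16) : plane::invalid;
}

} // namespace sketch
```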
template<bytelike BYTE_TYPE, size_t N> text_encoding ghassanpl::string_ops::consume_bom (std::span< BYTE_TYPE, N > &spn) | inline |
Consumes (see consume()) a byte order mark from the beginning of spn (a span of bytelike), and returns the encoding that the BOM represents (or unknown_text_encoding if no BOM).
text_encoding ghassanpl::string_ops::consume_bom (string_view16 auto &sv) | inline |
Consumes (see consume()) a byte order mark from the beginning of sv, and returns the UTF-16 encoding that the BOM represents (or unknown_text_encoding if no BOM).
text_encoding ghassanpl::string_ops::consume_bom (string_view32 auto &sv) | inline |
Consumes (see consume()) a byte order mark from the beginning of sv, and returns the UTF-32 encoding that the BOM represents (or unknown_text_encoding if no BOM).
text_encoding ghassanpl::string_ops::consume_bom (string_view8 auto &sv) | inline |
Consumes (see consume()) a byte order mark from the beginning of sv, and returns the encoding that the BOM represents (or unknown_text_encoding if no BOM).
template<string8 RESULT = std::string, stringable16 STR> constexpr RESULT ghassanpl::string_ops::to_utf8 (STR &&str) | constexpr |
Returns str (a UTF-16-encoded string) encoded as a UTF-8 string.
RESULT | the type of string to return (std::string by default) |
str must be valid UTF-16.
Returns str (a UTF-32-encoded string) encoded as a UTF-8 string.
RESULT | the type of string to return (std::string by default) |
str must be valid UTF-32.
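The UTF-16 to UTF-8 conversion described above can be sketched without the library as follows. `sketch::append_utf8` and `sketch::utf16_to_utf8` are hypothetical helpers written for this example; like the documented function, the sketch relies on the precondition that the input is valid UTF-16 and does not detect unpaired surrogates.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-ins, NOT the library's implementation.
namespace sketch {

// Appends `cp` to `out` as UTF-8 and returns the number of octets written.
inline std::size_t append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
        return 1;
    }
    if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return 3;
    }
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return 4;
}

// Precondition: `str` is valid UTF-16 (every high surrogate is followed by a low one).
inline std::string utf16_to_utf8(std::u16string_view str) {
    std::string out;
    for (std::size_t i = 0; i < str.size(); ++i) {
        char32_t cp = str[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < str.size()) {
            // Merge the surrogate pair into one supplementary-plane codepoint.
            char32_t const low = str[++i];
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        }
        append_utf8(out, cp);
    }
    return out;
}

} // namespace sketch
```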
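template<typename RESULT = std::string> constexpr auto ghassanpl::string_ops::transcode_codepage_to_unicode (stringable8 auto source, std::span< char32_t const, 128 > codepage_map) -> RESULT | constexpr |
Transcodes an Extended ASCII string source into a unicode encoding, according to codepage_map.
RESULT | the type of string to return (std::string by default) |
codepage_map | A span of 128 Unicode codepoints that will be substituted for EASCII values 128-255 |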
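template<typename T> constexpr void ghassanpl::string_ops::transcode_codepage_to_unicode (T &dest, stringable8 auto source, std::span< char32_t const, 128 > codepage_map) | constexpr |
Transcodes an Extended ASCII string source into unicode-encoded dest, according to codepage_map.
Destination encoding will be decided based on the char type of dest.
codepage_map | A span of 128 Unicode codepoints that will be substituted for EASCII values 128-255 |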
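constexpr void ghassanpl::string_ops::transcode_codepage_to_utf8 (string8 auto &dest, stringable8 auto source, std::span< char32_t const, 128 > codepage_map) | constexpr |
Transcodes an Extended ASCII string source into UTF-8 dest, according to codepage_map.
codepage_map | A span of 128 Unicode codepoints that will be substituted for EASCII values 128-255 |
TODO: Is this needed since we have transcode_codepage_to_unicode?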
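template<string8 RESULT = std::string> constexpr auto ghassanpl::string_ops::transcode_codepage_to_utf8 (stringable8 auto source, std::span< char32_t const, 128 > codepage_map) -> RESULT | constexpr |
Transcodes an Extended ASCII string source into UTF-8, according to codepage_map.
RESULT | the type of string to return (std::string by default) |
codepage_map | A span of 128 Unicode codepoints that will be substituted for EASCII values 128-255 |
TODO: Is this needed since we have transcode_codepage_to_unicode?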