UTF8 string class
What is TbUtf8?
The TbUtf8 library makes it easy to modify UTF8 strings.
Problem
In Lazarus (Free Pascal), strings are UTF8 encoded. However, the "String" type is nothing more than a dynamic byte array: Length returns the number of bytes in the array, not the number of characters. In UTF8, a single code point can occupy up to 4 bytes, and a combined character (a base character followed by combining marks) can occupy even more, for example 7 bytes. An example illustrates this: 'Thomas' has 6 characters and is 6 bytes in size; 'Thömäs' also has 6 characters but is 8 bytes in size.
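The difference between byte count and character count can be checked with plain Free Pascal, without TbUtf8. The following is a minimal sketch: CountCodePoints and Sample are hypothetical names introduced here, and the string is spelled with explicit UTF8 bytes so that the result does not depend on the encoding of the source file. Note that it counts code points, not combined characters, so it is simpler than what the problem description calls a character.

```pascal
program ByteVsCharCount;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of TbUtf8): counts UTF8 code points by
// counting every byte that is not a continuation byte (bit pattern 10xxxxxx).
function CountCodePoints(const S: String): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

const
  // 'Thömäs' written with explicit UTF8 bytes: ö = C3 B6, ä = C3 A4
  Sample = 'Th'#$C3#$B6'm'#$C3#$A4's';

begin
  WriteLn('Bytes:       ', Length(Sample));          // prints 8
  WriteLn('Code points: ', CountCodePoints(Sample)); // prints 6
end.
```

Length reports 8 because it counts bytes, while the helper reports 6 because the two continuation bytes of "ö" and "ä" are skipped.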
Solution
With TbUtf8 you can now easily change and search UTF8 strings with special and combined characters, such as "üäößẶặǺǻǼǽǞǟǍǎḂḃÞþÇçĆćĊċ...". Essentially, the library consists of a UTF8 string class (TIbUtf8).
Benefits
- TIbUtf8 is derived from TInterfacedObject and does not need to be released with Free.
- All indexes are character based.
- All returned characters are of type String.
- Returns the number of characters in the string.
- Returns the number of bytes in the string.
- Delete characters or character groups.
- Insertion of characters and character groups.
- Appending characters and character groups.
- Reading / writing of characters and character groups.
- Read from file / write to a file.
- Read from stream / write to a stream.
Disadvantages
- Because UTF8 characters vary in byte length, there is no constant offset from one character to the next, so locating a character is much more complex. Iterating over the characters is about 20 times slower than with a plain String. (Comfort has its price.)
- Slightly more memory is required.
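The cost of character-based indexing can be illustrated with a small sketch in plain Free Pascal. It is not TbUtf8's implementation; CodePointOffset and Sample are hypothetical names, and the sketch only handles code points, not combined characters. It shows that finding the n-th character requires scanning the bytes from the start, whereas a byte index is a direct lookup.

```pascal
program NthCodePoint;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of TbUtf8): returns the 1-based byte offset
// at which the code point with the given 1-based index starts, or 0 if the
// string contains fewer code points than Index.
function CodePointOffset(const S: String; Index: Integer): Integer;
var
  i, Seen: Integer;
begin
  Result := 0;
  Seen := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then // ASCII or lead byte starts a code point
    begin
      Inc(Seen);
      if Seen = Index then
        Exit(i);
    end;
end;

const
  // 'Thömäs' written with explicit UTF8 bytes: ö = C3 B6, ä = C3 A4
  Sample = 'Th'#$C3#$B6'm'#$C3#$A4's';

begin
  WriteLn(CodePointOffset(Sample, 4)); // 'm' starts at byte 5
  WriteLn(CodePointOffset(Sample, 6)); // 's' starts at byte 8
  WriteLn(CodePointOffset(Sample, 7)); // no 7th code point: 0
end.
```

Every lookup walks the string from the first byte, which is why naive character-by-character iteration over UTF8 data is noticeably slower than indexing bytes.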
Example
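function Demo01: Boolean;
var
  u: IbUtf8;
  i: Integer;
begin
  u := TIbUtf8.Create('Thömäß');
  // Indexes are character based, so i addresses whole characters, not bytes.
  for i := 1 to u.NumberOfChars do begin
    case u.Chars[i] of
      'ö': u.Chars[i] := 'o';
      'ä': u.Chars[i] := 'a';
      'ß': u.Chars[i] := 's';
    end;
  end;
  // Report whether all special characters were replaced.
  Result := u.Text = 'Thomas';
  if Result then begin
    WriteLn('That''s right!');
  end;
end;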
Download
- GitLab FpTuxe/TbUtf8 repository
FpTuxe/TbUtf8
- Git clone
git clone https://gitlab.com/FpTuxe/tbutf8.git
Installation
- Variant 1
  - Start Lazarus and open your project.
  - Lazarus->File->Open: your workspace/tbutf8/src/tb_utf8.pas
  - Lazarus->Project->Add Editor File to Project
- Variant 2
  - Start Lazarus and open your project.
  - Lazarus->Package->Open Package File (.lpk): your workspace/tbutf8/src/tbutf8.lpk
  - Now, click Use->Add to Project
  - Close the Package window.
Functional Description
The functional description can be found in the project folder "doc/".