UTF8 string class

From Lazarus wiki

What is TbUtf8?

TbUtf8 is a library that makes it easy to modify UTF-8 strings.

Problem

In Lazarus (Free Pascal), strings are UTF-8 encoded. However, the "String" type is nothing more than a dynamic byte array: Length returns the number of bytes in the array, not the number of characters. In UTF-8, a character can be up to 4 bytes long, and a combined character (a base character plus combining marks) can even take 7 bytes. An example illustrates this: 'Thomas' has 6 characters and is 6 bytes in size, whereas 'Thömäs' also has 6 characters but is 8 bytes in size.
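The difference between bytes and characters can be reproduced with plain Free Pascal, without TbUtf8. The following is a minimal sketch: CountCodePoints is a hypothetical helper written for this illustration, not part of the library. It counts code points only, so a combined character would still be counted as several units.

```pascal
program Utf8LengthDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of TbUtf8): counts UTF-8 code points by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
function CountCodePoints(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  s: string;
begin
  // 'Thömäs' written as explicit UTF-8 bytes so the result does not depend
  // on the encoding of the source file: ö = C3 B6, ä = C3 A4.
  s := 'Th'#$C3#$B6'm'#$C3#$A4's';
  WriteLn('Bytes:       ', Length(s));          // 8
  WriteLn('Code points: ', CountCodePoints(s)); // 6
end.
```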

Solution

With TbUtf8 you can easily modify and search UTF-8 strings that contain special and combined characters, such as "üäößẶặǺǻǼǽǞǟǍǎḂḃÞþÇçĆćĊċ...". Essentially, the library consists of a UTF-8 string class (TIbUtf8).

Benefits

  • TIbUtf8 is derived from TInterfacedObject and therefore does not need to be released with Free.
  • All indexes are character based.
  • All returned characters are of type String.
  • Returns the number of characters in the string.
  • Returns the number of bytes in the string.
  • Deleting characters or character groups.
  • Inserting characters and character groups.
  • Appending characters and character groups.
  • Reading and writing characters and character groups.
  • Reading from and writing to a file.
  • Reading from and writing to a stream.

Disadvantages

  • Because UTF-8 characters vary in byte length, there is no constant offset from one character to the next, so locating a character is considerably more complex. Iterating over the characters is about 20 times slower than iterating over a plain String. (Convenience has its price.)
  • Slightly more memory is required.
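To see why character-based indexing costs more than byte indexing, consider what it takes to find the n-th character in a raw UTF-8 string: the bytes have to be scanned from the start, because the position cannot be computed directly. The sketch below illustrates that idea only. ByteIndexOfCodePoint is a hypothetical helper, and nothing here is meant to describe how TbUtf8 is implemented internally.

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of TbUtf8): returns the 1-based byte index
// at which the N-th code point starts, or 0 if S has fewer than N code points.
// The loop is linear in the length of S; there is no way to jump ahead.
function ByteIndexOfCodePoint(const S: string; N: Integer): Integer;
var
  i, seen: Integer;
begin
  Result := 0;
  seen := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then  // ASCII or lead byte starts a code point
    begin
      Inc(seen);
      if seen = N then
        Exit(i);
    end;
end;

var
  s: string;
begin
  // 'Thömäs' as explicit UTF-8 bytes: ö = C3 B6, ä = C3 A4.
  s := 'Th'#$C3#$B6'm'#$C3#$A4's';
  WriteLn(ByteIndexOfCodePoint(s, 4)); // 'm' starts at byte 5
  WriteLn(ByteIndexOfCodePoint(s, 6)); // 's' starts at byte 8
  WriteLn(ByteIndexOfCodePoint(s, 7)); // out of range: 0
end.
```

With a plain String, s[4] is a single memory access; here the same lookup has to walk over every preceding byte, which is the kind of overhead the first point above refers to.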


Example

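procedure Demo01;
var
  u: IbUtf8;   // interface reference, so no call to Free is needed
  i: Integer;
begin
  u := TIbUtf8.Create('Thömäß');
  // Indexes are character based: each index addresses one whole character,
  // regardless of how many bytes it occupies.
  for i := 1 to u.NumberOfChars do begin
    case u.Chars[i] of
      'ö': u.Chars[i] := 'o';
      'ä': u.Chars[i] := 'a';
      'ß': u.Chars[i] := 's';
    end;
  end;
  if u.Text = 'Thomas' then begin
    WriteLn('That''s right!');   // a quote inside a Pascal string is doubled
  end;
end;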

Download

  • GitLab repository: FpTuxe/TbUtf8
  • Git clone:
    git clone https://gitlab.com/FpTuxe/tbutf8.git

Installation

Variant 1

  • Start Lazarus and open your project.
  • Choose File -> Open and select your workspace/tbutf8/src/tb_utf8.pas.
  • Choose Project -> Add Editor File to Project.

Variant 2

  • Start Lazarus and open your project.
  • Choose Package -> Open Package File (.lpk) and select your workspace/tbutf8/src/tbutf8.lpk.
  • In the Package window, click Use -> Add to Project.
  • Close the Package window.

Functional Description

The functional description can be found in the project folder "doc/".