TbUtf8

From Free Pascal wiki

About

TbUtf8 is a library for easily manipulating UTF-8 encoded strings.

Problem

In Lazarus (Free Pascal), strings are UTF-8 encoded. However, the "String" type is nothing more than a dynamic byte array: Length returns the number of bytes in the array, not the number of characters. In UTF-8, a single character can occupy up to 4 bytes, and a combined character (a base character plus combining marks) can take even more, for example 7 bytes. An example illustrates this: 'Thomas' has 6 characters and is 6 bytes in size; 'Thömäs' also has 6 characters but is 8 bytes in size.
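
The difference between byte count and character count can be reproduced without TbUtf8. The following is a minimal sketch in plain Free Pascal; CountCodePoints is a hypothetical helper written for this illustration and is not part of the library. It counts code points only, so a combined character (base character plus combining mark) would still be counted as more than one.

```pascal
program ByteVsCharCount;

{$mode objfpc}{$H+}

// Hypothetical helper, not part of TbUtf8: counts UTF-8 code points by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
function CountCodePoints(const S: String): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

const
  // 'Thömäs' spelled out as UTF-8 bytes, so the result does not depend
  // on the encoding of the source file.
  Sample = 'Th'#$C3#$B6'm'#$C3#$A4's';
begin
  WriteLn(Length(Sample));          // prints 8 (bytes)
  WriteLn(CountCodePoints(Sample)); // prints 6 (code points)
end.
```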

Solution

With TbUtf8 you can easily modify and search UTF-8 strings containing special and combined characters, such as "üäößẶặǺǻǼǽǞǟǍǎḂḃÞþÇçĆćĊċ...". Essentially, the library consists of a UTF-8 string class (TIbUtf8).

Benefits

  • TIbUtf8 descends from TInterfacedObject, so an instance held through the IbUtf8 interface is reference counted and does not need to be released with Free.
  • All indexes are character based.
  • All returned characters are of type String.
  • Returns the number of characters in the string.
  • Returns the number of bytes in the string.
  • Delete characters or character groups.
  • Insertion of characters and character groups.
  • Appending characters and character groups.
  • Reading / writing of characters and character groups.
  • Read from file / write to a file.
  • Read from stream / write to a stream.

Disadvantages

  • Since UTF-8 does not have a constant offset from character to character, searching for characters is much more complex. Iterating over the characters is about 20 times slower than with a plain String. (Comfort has its price.)
  • Slightly more memory is required.
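
To see where the extra cost of character-based indexing comes from, consider locating the n-th character in a raw UTF-8 string. The sketch below is an illustration in plain Free Pascal and makes no claim about how TbUtf8 is implemented internally; ByteIndexOfChar is a hypothetical name. Because characters have variable width, the byte position cannot be computed directly and the string has to be scanned from the start.

```pascal
program NthCharOffset;

{$mode objfpc}{$H+}

// Hypothetical helper, not part of TbUtf8: returns the 1-based byte index
// at which the N-th code point starts, or 0 if S has fewer code points.
function ByteIndexOfChar(const S: String; N: Integer): Integer;
var
  i, Seen: Integer;
begin
  Result := 0;
  Seen := 0;
  for i := 1 to Length(S) do
    // Every byte that is not a continuation byte starts a new code point.
    if (Ord(S[i]) and $C0) <> $80 then
    begin
      Inc(Seen);
      if Seen = N then
        Exit(i);
    end;
end;

const
  // 'Thömäs' spelled out as UTF-8 bytes.
  Sample = 'Th'#$C3#$B6'm'#$C3#$A4's';
begin
  WriteLn(ByteIndexOfChar(Sample, 4)); // prints 5: 'm' starts at byte 5
  WriteLn(ByteIndexOfChar(Sample, 6)); // prints 8: 's' starts at byte 8
end.
```

With a plain String, S[i] is a constant-time byte access; the scan above is linear in the string length, which is the kind of overhead the first point describes.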


Example

uses
  tb_utf8; // unit name inferred from the file src/tb_utf8.pas

var
  u: IbUtf8;
  i: Integer;
begin
  u:= TIbUtf8.Create('Thömäß');
  for i:= 1 to u.NumberOfChars do begin
    case u.Chars[i] of
      'ö': u.Chars[i]:= 'o';
      'ä': u.Chars[i]:= 'a';
      'ß': u.Chars[i]:= 's';
    end;
  end;
  if u.Text = 'Thomas' then begin
    WriteLn('That''s right!');
  end;
end.

Download

git clone https://gitlab.com/FpTuxe/tbutf8.git

Installation

Variant 1
Start Lazarus and open your project.
Lazarus->File->Open your workspace/tbutf8/src/tb_utf8.pas
Lazarus->Project->Add Editor File to Project
Variant 2
Start Lazarus and open your project.
Lazarus->Package->Open Package File (.lpk) your workspace/tbutf8/src/tbutf8.lpk
Now, click Use->Add to Project
Close the Package window.

Functional Description

The functional description can be found in the project folder "doc/".