About

Collection of generic algorithms and data structures.

Requires: FPC 3.2+, Lazarus 1.9+.

Author: A.Koverdyaev (avk)

License: Apache License 2.0

Features

Implemented primitives

stack (unit lgStack)
queue (unit lgQueue)
deque (unit lgDeque)
vector (unit lgVector)
vector of bits (unit lgVector)
priority queue based on binary heap (unit lgPriorityQueue)
priority queue with key update and melding based on pairing heap (unit lgPriorityQueue)
sorted list (unit lgList)
hashed list - array-based list with fast search by key (unit lgList)
hashset (unit lgHashSet)
fine-grained concurrent hashset (unit lgHashSet)
sorted set (unit lgTreeSet)
set of arbitrary size (unit lgUtil, TGSet)
hash multiset (unit lgHashMultiSet)
fine-grained concurrent hashmultiset (unit lgHashMultiSet)
sorted multiset (unit lgTreeMultiSet)
hashmap (unit lgHashMap)
fine-grained concurrent hashmap (unit lgHashMap)
sorted map (unit lgTreeMap)
hash multimap (unit lgMultiMap)
tree multimap (unit lgMultiMap)
list multimap (unit lgMultiMap)
bijective map (unit lgBiMap)
sparse 2D table (unit lgTable2D)
disjoint set (unit lgHashSet)
AVL tree (unit lgAvlTree)
red-black tree (unit lgRbTree)
some treap variants (unit lgTreap)
general rooted tree (unit lgRootTree)
sparse labeled undirected graph (unit lgSimpleGraph)
sparse labeled directed graph (unit lgSimpleDigraph)
lite containers based on advanced records
extended IEnumerable interface - filtering, mapping, etc.
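The disjoint set listed above (unit lgHashSet) is the classic union-find structure. As a rough illustration of how such a structure works, here is a minimal Python sketch that keeps parent links in a dictionary and uses path compression and union by rank. The class and method names are hypothetical and do not mirror the LGenerics API.

```python
class DisjointSet:
    """Union-find over hashable items, with path compression and union by rank."""

    def __init__(self):
        self._parent = {}
        self._rank = {}

    def find(self, x):
        # Unseen items are registered lazily as singleton sets.
        if x not in self._parent:
            self._parent[x] = x
            self._rank[x] = 0
            return x
        # First pass: locate the root of x's tree.
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Second pass: path compression, pointing every node on the path at the root.
        while self._parent[x] != root:
            self._parent[x], x = root, self._parent[x]
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self._rank[ra] < self._rank[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        if self._rank[ra] == self._rank[rb]:
            self._rank[ra] += 1
        return True

    def same(self, a, b):
        """True if a and b currently belong to the same set."""
        return self.find(a) == self.find(b)


ds = DisjointSet()
ds.union(1, 2)
ds.union(3, 4)
ds.union(2, 3)
print(ds.same(1, 4))  # True
print(ds.same(1, 5))  # False
```

Keying the parent table by a hash map lets arbitrary hashable items join sets lazily, without numbering them first.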

Implemented graph features

core functions:
- vertices/edges addition/removal/query/enumeration, edge contraction, degree
- load/save in its own binary format, primitive export to DOT format
connectivity:
- connected/strongly connected components, bipartite detection, degeneracy, k-core
- articulation points, bridges, biconnected components
- edge-connectivity
traversals:
- BFS/DFS traversals with visitors
- cycle/negative cycle detection
- topological sort
operations:
- induced subgraphs, complement, reverse, union, intersection, symmetric difference
chordality testing
planarity testing: FMR Left-Right Planarity algorithm
distance within graph:
- eccentricity, radius, diameter, center, periphery
matching:
- maximum cardinality matching on bipartite/arbitrary graphs
- minimum/maximum weight matching on bipartite graphs
dominators in flowgraphs: simple iterative and Semi-NCA algorithms
some approaches to NP-hard problems:
- maximum independent set, maximal independent sets enumeration
- maximum clique, cliques enumeration
- minimum vertex cover, minimal vertex covers enumeration
- vertex coloring: approximate and exact
- minimum dominating set
- Hamiltonian cycles and paths
- local search TSP approximations, BnB TSP solver
minimum spanning trees: Prim's and Kruskal's algorithms
single source shortest paths:
- Dijkstra with pairing heap, A*, Bellman-Ford-Moore with Tarjan's subtree disassembly (BFMT)
single pair shortest paths:
- Dijkstra with binary heap, bidirectional Dijkstra, A*, NBA*
all pairs shortest paths:
- Floyd–Warshall, Johnson, BFMT
networks:
- maximum flow: push/relabel, capacity scaling Dinitz
- minimum-cost flow: Busacker-Gowen, cost scaling push/relabel algorithm
- global minimum cut: Stoer–Wagner, Nagamochi-Ibaraki
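To show what "Dijkstra with binary heap" in the list above involves, here is a small Python sketch built on the standard heapq module. It is not the lgSimpleGraph API: the graph is assumed to be a plain dict of adjacency lists with non-negative weights, and the function names are hypothetical. Instead of a decrease-key operation, it pushes duplicate heap entries and skips stale ones when they are popped (lazy deletion).

```python
import heapq
from itertools import count
from math import inf


def dijkstra(graph, source):
    """Single-source shortest paths for non-negative edge weights.

    graph: dict mapping each vertex to a list of (neighbor, weight) pairs.
    Returns (dist, parent): shortest distances and a shortest-path tree.
    """
    dist = {source: 0}
    parent = {source: None}
    tie = count()                 # tie-breaker so vertices never need to be comparable
    heap = [(0, next(tie), source)]
    settled = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in settled:
            continue              # stale entry left behind by an earlier relaxation
        settled.add(u)
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, next(tie), v))
    return dist, parent


def path_to(parent, target):
    """Walk the parent links back to the source; None if target is unreachable."""
    if target not in parent:
        return None
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


g = {"a": [("b", 4), ("c", 1)], "c": [("b", 2), ("d", 5)], "b": [("d", 1)]}
dist, parent = dijkstra(g, "a")
print(dist["d"], path_to(parent, "d"))  # 4 ['a', 'c', 'b', 'd']
```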

Algorithms on arrays and vectors

(mostly unit lgArrayHelpers)

reverse, right/left cyclic shifts
permutations
binary search
N-th order statistics
inversion counting
distinct values selection
quicksort
introsort
dual pivot quicksort
mergesort
timsort (unit lgMiscUtils)
counting sort
radix sort
translation of Orson Peters' PDQSort algorithm
static segment tree
longest increasing subsequence
...
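As an example of one of the array algorithms listed above, the longest increasing subsequence can be found in O(n log n) time with binary search over the smallest possible tail of each run length. The Python sketch below illustrates the general technique; it is not the lgArrayHelpers implementation, and the function name is hypothetical. Using bisect_left makes the result strictly increasing.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq in O(n log n)."""
    tails = []              # tails[k]: smallest tail value of an increasing run of length k+1
    tail_idx = []           # tail_idx[k]: index in seq of that tail value
    prev = [-1] * len(seq)  # predecessor links used to rebuild the answer
    for i, x in enumerate(seq):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tail_idx.append(i)
        else:
            tails[k] = x
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    # Follow predecessor links back from the tail of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(seq[i])
        i = prev[i]
    return result[::-1]


print(longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 4, 5, 6]
```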

Algorithms on strings

exact string matching
- Boyer-Moore string matching algorithm (in the Fast Search variant), case-sensitive and case-insensitive (unit lgStrHelpers)
- Boyer-Moore-Horspool-Raita algorithm (unit lgStrHelpers)
longest common subsequence of two sequences:
- reducing the LCS problem to LIS
- Kumar-Rangan algorithm for LCS
- Myers algorithm for LCS
the Levenshtein distance:
- simple DP algorithm
- modified Berghel-Roach algorithm
- Myers bit-vector algorithm with cut-off heuristic
LCS distance:
- Myers algorithm for LCS distance
fuzzy string matching (k differences):
- Ukkonen EDP algorithm
fuzzy string matching with preprocessing (something similar to FuzzyWuzzy)
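The "simple DP algorithm" for the Levenshtein distance mentioned above can be sketched in a few lines. The Python version below keeps only two rows of the dynamic-programming table, so memory is O(min(m, n)); it illustrates the textbook recurrence and is not the LGenerics code, whose faster variants (Berghel-Roach, Myers bit-vector) are considerably more involved. The function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows over the shorter string
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution, or free match
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```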

Other

non-cryptographic hashes (unit lgHash):
- Yann Collet's xxHash32, xxHash64
- Austin Appleby's MurmurHash2, MurmurHash2A, MurmurHash3_x86_32, MurmurHash64A
quick-and-dirty implementation of the futures concept (unit lgAsync)
simple channel implementation (unit lgAsync)
simple thread pool implementation (unit lgAsync)
128-bit integers (unit lgInt128)
JSON validator/parser/generator (unit lgJson)
Eisel-Lemire fast string-to-double conversion algorithm (unit lgJson)
Ryū double-to-string conversion algorithm (unit lgJson)
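To give an idea of what the non-cryptographic hashes listed above compute, here is a plain Python transcription of MurmurHash3_x86_32 following Austin Appleby's public-domain reference code (SMHasher). It is a sketch for understanding the block mixing, tail handling and final avalanche steps; it is not the lgHash interface, and the function name is hypothetical. Input is taken as bytes and the result is an unsigned 32-bit integer.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3_x86_32 of data with the given seed, as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n = len(data)
    nblocks = n // 4

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotl32(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask  # rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: the remaining 1..3 bytes (the C reference uses switch fall-through).
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length, then the fmix32 avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


print(hex(murmur3_x86_32(b"")))  # 0x0
print(murmur3_x86_32(b"foo"))
```

Being pure Python, this is orders of magnitude slower than a native implementation and is meant only to make the algorithm's structure visible.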

Performance

In March 2022, a number of functions were added that implement algorithms on strings and sequences related to metrics and fuzzy matching; the list is annotated in the project's README. The example/fuzz_bench folder contains several benchmarks for these functions.

It was interesting to compare the performance of the SimRatioLevEx() function (which is inspired by FuzzyWuzzy) with some incarnations of FuzzyWuzzy (listed here) on benchmark datasets. Disclaimer: SimRatioLevEx() does not reproduce FuzzyWuzzy, but it does some things in a similar way; in particular, SimRatioLevEx() in smTokenSetEx mode and token_set_ratio() do roughly the same job.
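For readers unfamiliar with token_set_ratio(), its idea can be sketched in Python as follows. This is a simplified approximation under stated assumptions: tokens are whitespace-separated and case-folded (no punctuation stripping), and the pairwise score uses difflib.SequenceMatcher, which pure-Python FuzzyWuzzy falls back to when python-Levenshtein is not installed. It is not how SimRatioLevEx() is implemented, and both function names here are only illustrative.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0..100 scale (SequenceMatcher-based)."""
    if not a and not b:
        return 100
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio(s1: str, s2: str) -> int:
    """Compare the shared tokens against each side's shared-plus-remaining tokens."""
    t1 = set(s1.lower().split())
    t2 = set(s2.lower().split())
    common = " ".join(sorted(t1 & t2))
    rest1 = " ".join(sorted(t1 - t2))
    rest2 = " ".join(sorted(t2 - t1))
    combined1 = (common + " " + rest1).strip()
    combined2 = (common + " " + rest2).strip()
    # Taking the best of the three pairings rewards one side being a token subset of the other.
    return max(ratio(common, combined1),
               ratio(common, combined2),
               ratio(combined1, combined2))


print(token_set_ratio("new york mets", "new york mets vs atlanta braves"))  # 100
```

Because word order and duplicate tokens are discarded before comparison, reordered or repeated words do not lower the score.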

It seems the C++ version supports only single-byte strings, so it was compared only with the single-byte version of SimRatioLevEx():

       Dataset            SimRatioLevEx() token_set_ratio()
     
    Short/big_dist             1154              6440
    Short/small_dist            967              3020
    Medium/big_dist             811              3450
    Medium/small_dist           702              1470
    Long/big_dist              1966             15000
    Long/small_dist            1061              2250

The numbers indicate the run time in milliseconds; the C++ version was compiled with gcc-8.1.0 with options -O3 -m64; the Pascal version was compiled with FPC-3.3.1-9941-g8e6e9bbf33, -O3. The benchmark was run on a Windows x64 machine.

The Go version, in contrast, always works with Unicode strings:

       Dataset          SimRatioLevExUtf8() TokenSetRatio()
     
    Short/big_dist             2143             18705
    Short/small_dist           1593              2224
    Medium/big_dist            1266             15062
    Medium/small_dist           853              1742
    Long/big_dist              3853             79851
    Long/small_dist            1269              3126

Go version: go1.10.4 linux/amd64; FPC-3.3.1-10683-g2a19e152b7 -O3. The benchmark was run on a virtual Linux machine.

Download

GitHub: https://github.com/avk959/LGenerics
