Files
datum/docs/map.md

76 lines
3.1 KiB
Markdown
Raw Permalink Normal View History

2025-11-10 10:49:23 +01:00
# Map Technical Details
In this document you can find a quick overview of the technical
aspects (internal design, memory layout, etc.) of the `Map` data structure.
`Map` is an hash table that uses open addressing with linear probing for collision
resolution and the [FNV-1a algorithm](https://en.wikipedia.org/wiki/FowlerNollVo_hash_function) as its hashing function. Resizing is performed
automatically by doubling the capacity when the load factor exceeds 75%. Internally,
2026-01-12 11:58:32 +01:00
this data structure is represented by the following two layouts:
2025-11-10 10:49:23 +01:00
```c
typedef struct {
char *key;
void *value;
element_state_t state;
} map_element_t;
typedef struct {
map_element_t *elements;
size_t capacity;
size_t size;
size_t tombstone_count;
} map_t;
```
where the `key` variable represent a string used to index the `value`. The `state`, instead, indicates whether the entry is empty, occupied or deleted and is primarily used
by the garbage collector for internal memory management. An array of `map_element_t`,
with the variables indicating the *capacity*, the *current size* and
the *tombstone count* (that is, the number of delete entries), form a `map_t` data type.
The keys are **copied** by the hashmap; this means that it **owns** them and is therefore
responsible for managing their memory. Values, on the other hand,
**are stored as pointers**. This means that the hashmap **does NOT own them** and that
the caller is responsible for managing their memory; this includes: allocate
enough memory for them, ensure that the pointers remain valid for their whole lifecycle
on the map, delete old values when updating a key and, if the values were heap-allocated,
free them before removing the keys or destroying the map.
The `Map` data structure supports the following methods:
- `map_result_t map_new()`: initializes a new map;
- `map_result_t map_add(map, key, value)`: adds a `(key, value)` pair to the map;
- `map_result_t map_get(map, key)`: retrieves a values indexed by `key` if it exists;
- `map_result_t map_remove(map, key)`: removes a key from the map if it exists;
- `map_result_t map_clear(map)`: resets the map state;
- `map_result_t map_destroy(map)`: deletes the map;
2025-11-10 10:49:23 +01:00
- `size_t map_size(map)`: returns map size (i.e., the number of elements);
- `size_t map_capacity(map)`: returns map capacity (i.e., map total size).
2026-01-12 11:58:32 +01:00
As you can see from the previous function signatures, most methods that operate
2025-11-10 10:49:23 +01:00
on the `Map` data type return a custom type called `map_result_t` which is
defined as follows:
```c
typedef enum {
MAP_OK = 0x0,
MAP_ERR_ALLOCATE,
MAP_ERR_OVERFLOW,
2025-11-10 10:49:23 +01:00
MAP_ERR_INVALID,
MAP_ERR_NOT_FOUND
} map_status_t;
typedef struct {
map_status_t status;
uint8_t message[RESULT_MSG_SIZE];
union {
map_t *map;
void *element;
} value;
} map_result_t;
```
Each method that returns such type indicates whether the operation was successful or not by setting
the `status` field and by providing a descriptive message on the `message` field. If the operation was
successful (that is, `status == MAP_OK`), you can either move on with the rest of the program or read
2026-03-16 09:38:38 +01:00
the returned value from the sum data type.