Here, $\Phi^{-1}(3/4) = \Phi^{-1}(0.75) \approx 0.6745$ reflects the fact
that 50% of the values of a normal distribution lie within $\pm 0.6745$ standard deviations of the mean
(equivalently, 75% lie below $\mu + 0.6745\sigma$), which is why the median absolute deviation of normally
distributed data is approximately $0.6745\sigma$. The value 4.5 represents a fixed threshold, and 8 represents
the minimum absolute deviation that a value must have from the median to be considered an outlier.
These constants have been fine-tuned to work well with weather data from
a wide range of climates, ignoring ordinary daily temperature fluctuations while
still detecting significant anomalies.
Daily temperatures collected over a short time window (1-2 months, but no less than a few days)
*should* be roughly normally distributed, and the algorithm only works under this assumption.

> [!IMPORTANT]
> The anomaly detection algorithm assumes that the weather data is (at least roughly)
> normally distributed. This might not be the case for datasets with very few
> samples (e.g., a few days of data) or with a large number of samples
> (e.g., multi-seasonal data).

My statistical benchmarks (QQ plots) show that the algorithm works
well when these conditions are met, and its results on real-world data have also
been satisfactory. However, if it starts producing false positives, you will need
to discard the whole in-memory database and start from scratch. I recommend doing
this at every change of season.
## Embedded Cache System
To minimize the number of requests sent to the OpenWeatherMap API, Zephyr provides a built-in,
in-memory cache that stores fetched weather data. Each time a client requests
weather data for a given location, the service first checks whether it is already available in the cache.
If it is, the cached value is returned; otherwise, a new request is sent to the OpenWeatherMap API,
and the response is returned to the client and stored in the cache for future use. Each cache entry
is valid for a fixed amount of time, which can be configured by setting the `ZEPHYR_CACHE_TTL` environment variable. Once
a cached entry expires, Zephyr retrieves a new value from the OpenWeatherMap API and updates the cache accordingly.
The cache significantly improves the performance of the service by reducing its latency on cache hits. It
also reduces the number of API calls made to the OpenWeatherMap servers, which is important
if you are using their free tier, since it limits how many calls you can make.
## Configuration
Zephyr requires the following environment variables to be set:
This software is released under the GPLv3 license. You can find a copy of the license in this repository or by visiting the [following page](https://choosealicense.com/licenses/gpl-3.0/).