Good Morning Everyone,
While planning work, the CPE team has realized that a number of our initiatives actually start with a research phase to find the most appropriate technical solution. This makes planning difficult: without knowing which technical solution we will adopt, it is hard to estimate the amount of work needed and thus how long it will take.
In order to help with this, we're creating a small sub-team in CPE, called the ARC team, for Advance Reconnaissance Crew*. The goal of this team will be to investigate the possible technical solutions for initiatives and advise the team on which one they believe is the most appropriate. To this end, we will reach out when we start looking for ideas, as you may have ideas that we did not think of.
The first investigation, led by Will Woods, Mark O'Brien and me, will be around datanommer and datagrepper.
datanommer is an application that listens to fedmsg and fills a (postgresql) database with all the messages passing on the bus. datagrepper is a web application that exposes these messages and offers a way to filter or search them. It is available at: https://apps.fedoraproject.org/datagrepper/
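For those who have not used datagrepper before, here is a minimal sketch of what "filter or search" can look like from a client's point of view. It only builds a search URL and performs no request. The base URL is the one given above; the `/raw` path, the parameter names (`topic`, `delta`, `rows_per_page`, `page`) and the example topic are assumptions on our side and should be checked against the datagrepper documentation.

```python
from urllib.parse import urlencode

# Base URL from the message above. The "/raw" path and the parameter
# names used below are assumptions, not confirmed by this email.
DATAGREPPER_URL = "https://apps.fedoraproject.org/datagrepper"


def build_query_url(topic=None, delta=None, rows_per_page=20, page=1):
    """Build a datagrepper search URL without performing any request."""
    params = {"rows_per_page": rows_per_page, "page": page}
    if topic is not None:
        params["topic"] = topic  # restrict results to one message topic
    if delta is not None:
        params["delta"] = delta  # look-back window in seconds (assumed)
    # Sort the parameters so the resulting URL is deterministic.
    return f"{DATAGREPPER_URL}/raw?{urlencode(sorted(params.items()))}"


# Hypothetical topic name, used only for illustration.
url = build_query_url(topic="org.fedoraproject.prod.git.receive", delta=3600)
print(url)
```

Any replacement for datagrepper would need to offer an equivalent of this kind of filtered, paginated query.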
Currently our ideas are:
- for datanommer:
  - port it to fedora-messaging
  - adjust it to whichever solution we choose to replace datagrepper
- for datagrepper:
  - keep it as is
  - replace it with:
    - postgrest https://postgrest.org/
    - prest https://github.com/prest/prest
    - kinto https://docs.kinto-storage.org/en/stable/
    - Swagger/OpenAPI https://swagger.io/
  - add support for GraphQL
- for the postgresql server:
  - split messages per year into different tables
    - unite them using a postgresql view
  - kick out the old messages per year
    - keep the current year + n-1 in the current DB
    - kick the others to another DB?
    - kick the others to a tarball somewhere?
  - output the database daily dump to one file per year
  - TimescaleDB, a postgresql plugin for time-series data
    - https://alibaba-cloud.medium.com/postgresql-time-series-database-plug-in-tim...
    - https://dev.t-matix.com/blog/postgresql-as-a-time-series-database/
    - https://docs.timescale.com/latest/introduction
  - make the msg field in the message table a JSON field
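To make the "one table per year, united by a view" idea more concrete, here is a rough sketch. It uses Python's built-in sqlite3 module as a stand-in for postgresql so that it runs anywhere; the table, column and view names are hypothetical and do not reflect the actual datanommer schema.

```python
import sqlite3

# sqlite3 stands in for postgresql here so the sketch is self-contained.
# Table, column and view names are hypothetical.
YEARS = (2019, 2020, 2021)

conn = sqlite3.connect(":memory:")
for year in YEARS:
    conn.execute(
        f"CREATE TABLE messages_{year} ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, "
        "topic TEXT NOT NULL, msg TEXT NOT NULL)"
    )

# One view unites the per-year tables, so readers keep a single name to query.
union = " UNION ALL ".join(f"SELECT * FROM messages_{y}" for y in YEARS)
conn.execute(f"CREATE VIEW messages AS {union}")


def store(ts, topic, msg):
    """Route a message to the table matching the year of its timestamp."""
    year = int(ts[:4])  # timestamps are ISO 8601 strings in this sketch
    conn.execute(
        f"INSERT INTO messages_{year} (ts, topic, msg) VALUES (?, ?, ?)",
        (ts, topic, msg),
    )


store("2019-05-01T10:00:00", "demo.topic", '{"n": 1}')
store("2021-01-15T08:30:00", "demo.topic", '{"n": 2}')

# The view sees every year; each per-year table only holds its own rows.
total = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
only_2019 = conn.execute("SELECT COUNT(*) FROM messages_2019").fetchone()[0]
print(total, only_2019)
```

With a layout like this, kicking out an old year would roughly amount to dumping or moving one table and rebuilding the view without it, rather than deleting rows from one very large table.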
Do you have any other ideas for things we could look at?
Looking forward to your input,
Thanks,
Pierre, Will and Mark
* Our notes and documentation are hosted at: https://fedora-arc.readthedocs.io/en/latest/index.html