Version Your Database / Future Directions

Hi all,

As I want to release version 1.0.0 of SirixDB[1] soon but sadly lack an open-source community, I'd like to discuss here what you think is most important for its future direction.

In short, SirixDB keeps the history of each resource in a database in a huge index trie that is completely copy-on-write based. This means that unchanged database pages are shared between revisions. SirixDB supports sophisticated time-travel queries and implements diffing algorithms. It natively stores XML and JSON in a binary format, but it could just as well store graphs or other kinds of data.
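
To illustrate how copy-on-write lets revisions share unchanged pages, here is a minimal Python sketch. It is not SirixDB's actual implementation, which is written in Java. The page layout, fan-out, and all names (LeafPage, InnerPage, write, read, commit) are hypothetical. Committing a change copies only the pages on the path from the root to the modified leaf page, and every other page is shared by reference with the previous revision.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical layout: a fixed-height trie of immutable pages.
FANOUT = 4            # children per inner page
HEIGHT = 2            # inner-page levels above the leaf pages
RECORDS_PER_PAGE = 8  # records stored in one leaf page


@dataclass(frozen=True)
class LeafPage:
    records: tuple  # sorted (record_id, value) pairs


@dataclass(frozen=True)
class InnerPage:
    children: tuple  # FANOUT entries: InnerPage, LeafPage or None


Page = Optional[Union[InnerPage, LeafPage]]


def _path(record_id: int) -> list[int]:
    """Child offsets from the root down to the leaf page holding record_id."""
    page_key = record_id // RECORDS_PER_PAGE
    assert page_key < FANOUT ** HEIGHT, "record id out of range for this trie"
    digits = []
    for _ in range(HEIGHT):
        digits.append(page_key % FANOUT)
        page_key //= FANOUT
    return digits[::-1]


def write(root: Page, record_id: int, value: str) -> InnerPage:
    """Return a new root; only the pages on the root-to-leaf path are copied."""
    digits = _path(record_id)

    def copy_path(page: Page, level: int) -> Union[InnerPage, LeafPage]:
        if level == HEIGHT:  # leaf level: copy the page and apply the change
            records = dict(page.records) if page is not None else {}
            records[record_id] = value
            return LeafPage(tuple(sorted(records.items())))
        children = list(page.children) if page is not None else [None] * FANOUT
        i = digits[level]
        children[i] = copy_path(children[i], level + 1)  # siblings stay shared
        return InnerPage(tuple(children))

    return copy_path(root, 0)


def read(root: Page, record_id: int) -> Optional[str]:
    """Look up a record in the revision whose root page is `root`."""
    page = root
    for i in _path(record_id):
        if page is None:
            return None
        page = page.children[i]
    return dict(page.records).get(record_id) if page is not None else None


revisions: list[Page] = [None]  # revision 0 is the empty resource


def commit(changes: dict[int, str]) -> None:
    """Apply changes on top of the latest revision and append a new root.

    A real system would copy each page at most once per commit; here every
    single write copies its path again, which is fine for a sketch.
    """
    root = revisions[-1]
    for record_id, value in changes.items():
        root = write(root, record_id, value)
    revisions.append(root)


commit({0: "a", 100: "b"})  # revision 1
commit({0: "a2"})           # revision 2 changes only record 0
```

After the second commit, revision 2 holds a new copy of the subtree containing record 0. It reuses the very same page objects as revision 1 for the subtree containing record 100, so old revisions stay readable without any extra copying.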

Ideas for the future would be:

  • Horizontal scaling, that is, writing through a single master, providing read-your-writes consistency, and replicating resources on a few cluster nodes… most probably using ZooKeeper and Apache BookKeeper with exactly-once delivery semantics…
  • Interactive visualizations of the differences between revisions of the resources. SirixDB currently stores tree-structured data, both XML and JSON, in a binary format, and the diffing capabilities are already there. There are also some outdated visualizations[2] written in Processing, which I'd love to port to the web using D3. Furthermore, a web interface would be nice.
  • Adding cost-based query optimizer rules and index-rewrite rules to improve query performance considerably
  • Looking into how to efficiently delete old revisions (I still have to look up how ZFS handles deleting snapshots). For now, as a somewhat ugly hack, a background process could copy the most recent revision to a new resource. I guess this gets tricky because unchanged database pages are shared between revisions and record pages are themselves versioned. Depending on the versioning algorithm used, a page therefore has to be reconstructed from page fragments of different revisions.
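
To show why deleting old revisions is tricky when record pages are versioned, here is a minimal Python sketch of one possible scheme: incremental fragments plus a periodic full dump. This is not SirixDB's code, and the fragment format and all names (Fragment, store_fragments, fragments_needed, reconstruct, full_dump_every) are hypothetical. It also assumes that every revision writes one fragment for the page and that records are never removed from the page.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    revision: int
    records: dict   # record_id -> value as written in this revision
    is_full: bool   # True if the fragment is a complete copy of the page


def store_fragments(page_versions: list, full_dump_every: int) -> list:
    """Store each revision of one page as an incremental fragment.

    Only records that changed since the previous revision are written,
    except every `full_dump_every` revisions, where the whole page is dumped.
    """
    fragments, previous = [], {}
    for rev, page in enumerate(page_versions):
        if rev % full_dump_every == 0:
            fragments.append(Fragment(rev, dict(page), True))
        else:
            changed = {k: v for k, v in page.items() if previous.get(k) != v}
            fragments.append(Fragment(rev, changed, False))
        previous = page
    return fragments


def fragments_needed(fragments: list, revision: int) -> list:
    """Revisions whose fragments must be read to rebuild the page at `revision`."""
    needed = []
    for frag in reversed(fragments[: revision + 1]):
        needed.append(frag.revision)
        if frag.is_full:  # a full dump ends the walk back in time
            break
    return needed


def reconstruct(fragments: list, revision: int) -> dict:
    """Rebuild the page as of `revision`, letting newer fragments win."""
    page = {}
    for rev in fragments_needed(fragments, revision):
        for k, v in fragments[rev].records.items():
            page.setdefault(k, v)
    return page


versions = [
    {1: "a", 2: "b"},
    {1: "a", 2: "c"},
    {1: "d", 2: "c"},
    {1: "d", 2: "c", 3: "e"},
]
frags = store_fragments(versions, full_dump_every=4)
print(fragments_needed(frags, 3))  # [3, 2, 1, 0]
```

In this sketch, reading revision 3 touches the fragments of revisions 0 to 3. Simply dropping revision 0 would leave revision 3 unreadable unless its page were first rewritten as a full dump.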

Besides that, I want to finish the work on versioning the whole database, not just the individual resources within it.

Until recently, I planned to look into horizontal scaling. That meant using GraalVM native images to get super-fast startup times in Docker containers, working on writing to and reading from a BookKeeper cluster, and deploying everything to a Kubernetes cluster.

But maybe showcasing what's possible with beautiful interactive visualizations would attract more attention, and it would be great for me to learn some front-end development, too. It might also be more useful given the complete lack of users, because as it stands the project is only really interesting from an engineering perspective 😉

Kind regards and have a great weekend
Johannes

[1] https://sirix.io and https://github.com/sirixdb/sirix
[2] https://m.youtube.com/watch?feature=youtu.be&v=l9CXXBkl5vI
