Thanks for following up and for the pointer to Data Catalog. I will take a look at it.
Since Apache Hive Metastore is already included in Data Catalog, all I need is a PrestoSQL cluster as an alternative to Apache Spark. PrestoSQL is an excellent SQL query engine for:
* interactive queries from workloads like dashboards via Superset
* ad hoc queries from Superset/notebooks
* ETL tasks to move data among different data stores with transaction support.
I’m happy to present it to the community once I have enough experience with Data Catalog.
We do have an ODH component that covers some of Presto's functionality, Data Catalog: https://opendatahub.io/news/2019-12-15/data-catalog-in-odh.html
We are working on migrating Data Catalog to ODH 0.7+. If you are interested in contributing Presto to ODH, you are welcome to present it to the community and tell us how it compares to Data Catalog.