Excellent, thanks! I went ahead and already created a merge request [1] that aggregates Hive Metastore, Spark SQL Thrift Server, and Hue into the ODH operator. If possible, please review it. Meanwhile, I'll make sure the image hosted in the opendatahub org is enough to deploy these components.

On Mon, Aug 19, 2019 at 4:00 PM Sherard Griffin <shgriffi@redhat.com> wrote:

Ok, awesome. You should have the permissions to upload to that repo now.

On Mon, Aug 19, 2019 at 2:57 PM Ricardo Martinelli de Oliveira <rmartine@redhat.com> wrote:

The customization is basically to install a Spark 2.4.3 distribution in the image and add the required jars to the $SPARK_HOME/jars directory to access Ceph/S3 buckets. The fix for the issue you mention has been in the code since Spark 2.3, IIRC. If the current image already has these things done, I can use the image as is.

On Mon, Aug 19, 2019 at 3:31 PM Sherard Griffin <shgriffi@redhat.com> wrote:

Ricardo,

What are the customizations? Is this to fix retrieving the tables and databases from the Hive metastore? I've fixed the permissions on the spark-cluster-image repo, although I'm not a fan of the name of it... it should just be "spark".

Thanks,
Sherard

On Mon, Aug 19, 2019 at 12:02 PM Ricardo Martinelli de Oliveira <rmartine@redhat.com> wrote:

One side note about the Spark image: because Spark SQL Thrift Server has an open issue in Spark 2.2 that breaks the Thrift server, we need to use Spark 2.4. I already have the image built, and I can push it once I have the proper permissions to do so.

As for Landon's suggestion, I think it's a good idea, and I can create a quay.io/opendatahub/spark-cluster-image:2.4 tag in advance if needed, so we can reuse the same name but with a different tag.

On Mon, Aug 19, 2019 at 11:10 AM Landon LaSmith <llasmith@redhat.com> wrote:

Do we want to store all of the ODH Spark images in the same repository? We already have quay.io/opendatahub/spark-cluster-image. Should we deprecate that repo and create a new one to store both?

On Mon, Aug 19, 2019 at 10:06 AM Ricardo Martinelli de Oliveira <rmartine@redhat.com> wrote:

Thanks everyone for the feedback! As the winner is #2 (push the custom image into the quay.io opendatahub organization), who should I ask for permissions to push my image to this org? My quay.io username is the same as my kerberos user.

On Mon, Aug 19, 2019 at 10:57 AM Landon LaSmith <llasmith@redhat.com> wrote:

Agree with #2.

On Mon, Aug 19, 2019 at 9:53 AM Václav Pavlín <vasek@redhat.com> wrote:

I agree with #2 - ODH should work out of the box, so we need to provide the image (which is a no for #1), and #3 sounds like overkill.

Thanks,
V.

On Mon, Aug 19, 2019 at 3:43 PM Alex Corvin <acorvin@redhat.com> wrote:

I think my vote is for #2. Option #1 will continue to be supported for groups that need it, but we can make it easier for people to get up and running by curating an official image.
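For readers following along, here is a rough sketch of the kind of image customization discussed earlier in the thread (install a Spark 2.4.3 distribution and drop the Ceph/S3 jars into $SPARK_HOME/jars), expressed as an OpenShift BuildConfig with an inline Dockerfile that pushes to the quay.io/opendatahub/spark-cluster-image:2.4 tag mentioned above. The base image, the choice of hadoop-aws/aws-java-sdk as the S3 jars, and all versions are illustrative assumptions, not the actual image definition:

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: spark-cluster-image
spec:
  output:
    to:
      kind: DockerImage
      name: quay.io/opendatahub/spark-cluster-image:2.4
  source:
    type: Dockerfile
    dockerfile: |
      # Illustrative only: base image, jar choices, and versions are assumptions.
      FROM centos:7
      ENV SPARK_HOME=/opt/spark
      RUN yum install -y java-1.8.0-openjdk-headless && yum clean all && \
          curl -L https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz \
            | tar xz -C /opt && \
          ln -s /opt/spark-2.4.3-bin-hadoop2.7 ${SPARK_HOME} && \
          curl -L -o ${SPARK_HOME}/jars/hadoop-aws-2.7.3.jar \
            https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar && \
          curl -L -o ${SPARK_HOME}/jars/aws-java-sdk-1.7.4.jar \
            https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
  strategy:
    type: Docker
    dockerStrategy: {}

Assuming a quay.io push secret is configured for the build (spec.output.pushSecret), "oc start-build spark-cluster-image" would build the image and push the tag.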
On August 19, 2019 at 9:33:44 AM, Ricardo Martinelli de Oliveira (rmartine@redhat.com) wrote:
Hi,

I'm integrating the Spark SQL Thrift Server into the ODH operator, and I need to use a custom Spark image (other than the RADanalytics image) with additional jars to access Ceph/S3 buckets. Actually, both the Thrift server and the Spark cluster will need this custom Spark image in order to access the buckets.

With that being said, I'd like to discuss some options to get this done. I am thinking about these options:

1) Let the customer specify the custom image in the yaml file (this is already possible).
2) Create that custom Spark image and publish it on the quay.io opendatahub organization.
3) Add a BuildConfig object and make the operator create the custom build and set the image location in the DeploymentConfig objects.

Although the third option automates everything and delivers the whole set with the custom image, there is the question of supporting custom images within operators: we'd need to add a spark_version variable so the build could download the Spark distribution corresponding to that version along with the related artifacts, and then run the build. In the first option, we simply don't create the build objects and document that, in order to use the Thrift server in the ODH operator, both the Spark cluster and the Thrift server must use a custom Spark image containing the jars needed to access Ceph/S3. Finally, option two is the middle ground between the two, so we don't need to worry about delegating this task to either the user or the operator.

What do you think? What would be the best option for this scenario?
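As a concrete illustration of option 1 (and of where the option 2 image would be consumed), the underlying Spark cluster resource already lets a user point at a custom image. A minimal sketch, assuming the radanalytics spark-operator's SparkCluster resource and its customImage field (field names and the image location are illustrative, not the exact ODH manifest):

apiVersion: radanalytics.io/v1
kind: SparkCluster
metadata:
  name: spark-cluster-ceph
spec:
  # Any image that bundles the Ceph/S3 jars in $SPARK_HOME/jars would work here;
  # with option 2 this would simply default to a curated opendatahub image.
  customImage: quay.io/opendatahub/spark-cluster-image:2.4
  master:
    instances: "1"
  worker:
    instances: "2"

The Thrift server's DeploymentConfig would reference the same image, so the choice between the options mainly changes who builds and publishes that image rather than how it is consumed.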
--
Ricardo Martinelli De Oliveira
Data Engineer, AI CoE
Av. Brigadeiro Faria Lima, 3900
8th floor
Contributors mailing list -- contributors@lists.opendatahub.io
To unsubscribe send an email to contributors-leave@lists.opendatahub.io