The customization is basically installing a Spark 2.4.3 distribution in the image and adding the required jars to the $SPARK_HOME/jars directory to access Ceph/S3 buckets. The fix for the issue you mention has been in the code since Spark 2.3, IIRC.
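For reference, the customization can be sketched roughly like the build below. The base image, download URLs, and jar versions are assumptions on my part; the hadoop-aws/aws-java-sdk versions have to match the Hadoop client the Spark distribution was built against (2.7.x for the prebuilt tarball):

```dockerfile
# Sketch only -- base image and versions are assumptions, not the actual build.
FROM centos:7

ENV SPARK_HOME=/opt/spark

# Install a Spark 2.4.3 distribution (prebuilt for Hadoop 2.7)
RUN curl -fsSL https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz \
      | tar -xz -C /opt \
 && mv /opt/spark-2.4.3-bin-hadoop2.7 $SPARK_HOME

# Add the jars needed for the s3a:// filesystem (Ceph/S3 access).
# Versions must match the bundled Hadoop 2.7.x client libraries.
RUN curl -fsSL -o $SPARK_HOME/jars/hadoop-aws-2.7.3.jar \
      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar \
 && curl -fsSL -o $SPARK_HOME/jars/aws-java-sdk-1.7.4.jar \
      https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
```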

If the current image already includes these changes, I can use it as is.


On Mon, Aug 19, 2019 at 3:31 PM Sherard Griffin <shgriffi@redhat.com> wrote:
Ricardo,

What are the customizations?  Is this to fix retrieving the tables and databases from the Hive metastore?  I've fixed the permissions on the spark-cluster-image repo, although I'm not a fan of its name...  It should just be "spark".

Thanks,
Sherard

On Mon, Aug 19, 2019 at 12:02 PM Ricardo Martinelli de Oliveira <rmartine@redhat.com> wrote:
One side note about the Spark image: Spark 2.2 has an open issue that breaks the Spark SQL Thrift server, so we need to use Spark 2.4. I already have the image built, and I can push it once I have the proper permissions.

As for Landon's suggestion, I think it's a good idea. If needed, I can create a quay.io/opendatahub/spark-cluster-image:2.4 tag in advance, so we can reuse the same name with a different tag.

On Mon, Aug 19, 2019 at 11:10 AM Landon LaSmith <llasmith@redhat.com> wrote:
Do we want to store all of the ODH spark images in the same repository? We already have quay.io/opendatahub/spark-cluster-image. Should we deprecate that repo and create a new one to store both?

On Mon, Aug 19, 2019 at 10:06 AM Ricardo Martinelli de Oliveira <rmartine@redhat.com> wrote:
Thanks everyone for the feedback!

Since the winner is #2 (push the custom image to the quay.io opendatahub organization), whom should I ask for permissions to push my image to this org?

My quay.io username is the same as my kerberos user.

On Mon, Aug 19, 2019 at 10:57 AM Landon LaSmith <llasmith@redhat.com> wrote:
Agree with #2

On Mon, Aug 19, 2019 at 9:53 AM Václav Pavlín <vasek@redhat.com> wrote:
I agree with #2 - ODH should work out of the box, so we need to provide the image (which rules out #1), and #3 sounds like overkill.

Thanks,
V.

On Mon, Aug 19, 2019 at 3:43 PM Alex Corvin <acorvin@redhat.com> wrote:
I think my vote is for #2. Option #1 will continue to be supported for groups that need it, but we can make it easier for people to get up and running by curating an official image.


On August 19, 2019 at 9:33:44 AM, Ricardo Martinelli de Oliveira (rmartine@redhat.com) wrote:

Hi,

I'm integrating the Spark SQL Thrift server into the ODH operator, and I need to use a custom Spark image (other than the RADAnalytics image) with additional jars to access Ceph/S3 buckets. In fact, both the Thrift server and the Spark cluster will need this custom image in order to access the buckets.
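In case it helps to make this concrete, the kind of configuration those jars enable is pointing the s3a filesystem at a Ceph RGW endpoint, roughly like this (the endpoint and credentials below are placeholders, not real values):

```properties
# spark-defaults.conf -- sketch; endpoint and credentials are placeholders
spark.hadoop.fs.s3a.endpoint            http://ceph-rgw.example.com:8080
spark.hadoop.fs.s3a.path.style.access   true
spark.hadoop.fs.s3a.access.key          <ACCESS_KEY>
spark.hadoop.fs.s3a.secret.key          <SECRET_KEY>
spark.hadoop.fs.s3a.impl                org.apache.hadoop.fs.s3a.S3AFileSystem
```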

With that said, I'd like to discuss some options for getting this done. I am thinking about these:

1) Let the customer specify the custom image in the YAML file (this is already possible)
2) Create the custom Spark image and publish it to the quay.io opendatahub organization
3) Add a BuildConfig object and have the operator run the custom build and set the image location in the DeploymentConfig objects
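For option 1, I'm imagining something along these lines in the custom resource. The field names here are purely illustrative, not the actual ODH CRD schema; the real structure is whatever the operator already exposes for image overrides:

```yaml
# Illustrative only -- field names and apiVersion are hypothetical
apiVersion: opendatahub.io/v1alpha1
kind: OpenDataHub
metadata:
  name: example-odh
spec:
  spark-operator:
    master:
      image: quay.io/myorg/custom-spark:2.4.3   # custom image with the Ceph/S3 jars
    worker:
      image: quay.io/myorg/custom-spark:2.4.3
```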

Although the third option automates everything and delivers the whole set with the custom image, there's the question of supporting custom images within operators: we'd need to add a spark_version variable so the build could download the corresponding Spark distribution and its related artifacts, then run the build. With the first option, we simply don't create the build objects and document that, in order to use the Thrift server with the ODH operator, both the Spark cluster and the Thrift server must use a custom Spark image containing the jars needed to access Ceph/S3. Option two is the middle ground between the two, since we wouldn't need to delegate this task to either the user or the operator.

What do you think? Which option would be best for this scenario?

--

Ricardo Martinelli De Oliveira

Data Engineer, AI CoE

Red Hat Brazil

Av. Brigadeiro Faria Lima, 3900

8th floor

rmartine@redhat.com    T: +551135426125    
M: +5511970696531    

_______________________________________________
Contributors mailing list -- contributors@lists.opendatahub.io
To unsubscribe send an email to contributors-leave@lists.opendatahub.io


--
Open Data Hub, AI CoE, Office of CTO, Red Hat
Brno, Czech Republic
Phone: +420 739 666 824



--
Landon LaSmith
Sr.Software Engineer
Red Hat, AI CoE - Data Hub










--
Thanks,
Sherard Griffin

