OK, awesome. You should have permissions to upload to that repo now.
On Mon, Aug 19, 2019 at 2:57 PM Ricardo Martinelli de Oliveira <rmartine(a)redhat.com> wrote:
The customization basically consists of installing a Spark 2.4.3
distribution in the image and adding the required jars to the
$SPARK_HOME/jars directory to access Ceph/S3 buckets. The fix for the
issue you mention has been in the code since Spark 2.3, IIRC.
If the current image already has these things done, I can use the image
as is.
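
For reference, here's a minimal sketch of that customization as a
Dockerfile. The base image, download URLs and jar versions are
illustrative (hadoop-aws 2.7.3 is the usual pairing with aws-java-sdk
1.7.4 for the Hadoop 2.7 build of Spark), not necessarily the exact
build I have:

  # Illustrative base; any JDK 8 image with curl would do
  FROM openjdk:8-jdk

  # Install the Spark 2.4.3 distribution under /opt/spark
  ENV SPARK_HOME=/opt/spark
  RUN curl -L https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz \
        | tar -xz -C /opt \
   && mv /opt/spark-2.4.3-bin-hadoop2.7 /opt/spark

  # Add the jars the s3a:// connector needs to $SPARK_HOME/jars
  RUN curl -L -o /opt/spark/jars/hadoop-aws-2.7.3.jar \
        https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar \
   && curl -L -o /opt/spark/jars/aws-java-sdk-1.7.4.jar \
        https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar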
On Mon, Aug 19, 2019 at 3:31 PM Sherard Griffin <shgriffi(a)redhat.com> wrote:
> Ricardo,
>
> What are the customizations? Is this to fix retrieving the tables and
> databases from the Hive metastore? I've fixed the permissions on the
> spark-cluster-image repo, although I'm not a fan of its name... it
> should just be "spark".
>
> Thanks,
> Sherard
>
> On Mon, Aug 19, 2019 at 12:02 PM Ricardo Martinelli de Oliveira <rmartine(a)redhat.com> wrote:
>
>> One side note about the spark image: because Spark 2.2 has an open
>> issue that breaks the Spark SQL Thrift server, we need to use Spark
>> 2.4. I already have the image built, and I can push it once I have the
>> proper permissions.
>>
>> As for Landon's suggestion, I think it's a good idea, and I can create
>> a quay.io/opendatahub/spark-cluster-image:2.4 tag in advance if
>> needed, so we can reuse the same name with a different tag.
>>
>> On Mon, Aug 19, 2019 at 11:10 AM Landon LaSmith <llasmith(a)redhat.com> wrote:
>>
>>> Do we want to store all of the ODH spark images in the same repository?
>>> We already have quay.io/opendatahub/spark-cluster-image
>>> <https://quay.io/repository/opendatahub/spark-cluster-image?tab=tags>.
>>> Should we deprecate that repo and create a new one to store both?
>>>
>>> On Mon, Aug 19, 2019 at 10:06 AM Ricardo Martinelli de Oliveira <rmartine(a)redhat.com> wrote:
>>>
>>>> Thanks everyone for the feedback!
>>>>
>>>> Since the winner is #2 (push the custom image to the quay.io
>>>> opendatahub organization), who should I ask for permissions to push
>>>> my image to this org?
>>>>
>>>> My quay.io username is the same as my kerberos user.
>>>>
>>>> On Mon, Aug 19, 2019 at 10:57 AM Landon LaSmith <llasmith(a)redhat.com> wrote:
>>>>
>>>>> Agree with #2
>>>>>
>>>>> On Mon, Aug 19, 2019 at 9:53 AM Václav Pavlín <vasek(a)redhat.com> wrote:
>>>>>
>>>>>> I agree with #2 - ODH should work out of the box, so we need to
>>>>>> provide the image (which rules out #1), and #3 sounds like overkill.
>>>>>>
>>>>>> Thanks,
>>>>>> V.
>>>>>>
>>>>>> On Mon, Aug 19, 2019 at 3:43 PM Alex Corvin <acorvin(a)redhat.com> wrote:
>>>>>>
>>>>>>> I think my vote is for #2. Option #1 will continue to be supported
>>>>>>> for groups that need it, but we can make it easier for people to
>>>>>>> get up and running by curating an official image.
>>>>>>>
>>>>>>>
>>>>>>> On August 19, 2019 at 9:33:44 AM, Ricardo Martinelli de Oliveira (rmartine(a)redhat.com) wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm integrating the Spark SQL Thrift server into the ODH operator,
>>>>>>> and I need to use a custom spark image (other than the RADAnalytics
>>>>>>> image) with additional jars to access Ceph/S3 buckets. In fact,
>>>>>>> both the thrift server and the spark cluster will need this custom
>>>>>>> image in order to access the buckets.
>>>>>>>
>>>>>>> With that being said, I'd like to discuss how to get this done. I
>>>>>>> am thinking about these options:
>>>>>>>
>>>>>>> 1) Let the customer specify the custom image in the yaml file
>>>>>>> (this is already possible - see the sketch after this list)
>>>>>>> 2) Create the custom spark image and publish it in the quay.io
>>>>>>> opendatahub organization
>>>>>>> 3) Add a buildconfig object and have the operator create the
>>>>>>> custom build and set the image location in the deploymentconfig
>>>>>>> objects
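>>>>>>>
>>>>>>> For option 1, the user just points the cluster CR at their own
>>>>>>> image, along these lines (a minimal sketch - the kind, apiVersion
>>>>>>> and field names are from memory and only illustrative):
>>>>>>>
>>>>>>>   apiVersion: radanalytics.io/v1
>>>>>>>   kind: SparkCluster
>>>>>>>   metadata:
>>>>>>>     name: my-spark-cluster
>>>>>>>   spec:
>>>>>>>     # custom image with the Ceph/S3 jars baked in
>>>>>>>     customImage: quay.io/opendatahub/spark-cluster-image:2.4
>>>>>>>     worker:
>>>>>>>       instances: "2"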
>>>>>>>
>>>>>>> Although the third option automates everything and delivers the
>>>>>>> whole set with the custom image, supporting custom images from
>>>>>>> within the operator has a cost: we'd need to add a spark_version
>>>>>>> variable so the build could download the spark distribution
>>>>>>> corresponding to that version, fetch the related artifacts, and run
>>>>>>> the build. With the first option, we simply don't create the build
>>>>>>> objects and instead document that, in order to use the Thrift
>>>>>>> server in the ODH operator, both the spark cluster and the thrift
>>>>>>> server must use a custom spark image containing the jars needed to
>>>>>>> access Ceph/S3. Option two is the middle ground between the two, so
>>>>>>> we don't need to delegate this task to either the user or the
>>>>>>> operator.
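>>>>>>>
>>>>>>> To make option 3 concrete, the operator would create something
>>>>>>> along the lines of this buildconfig (a rough sketch - the git repo
>>>>>>> is hypothetical and the spark_version wiring is only illustrative):
>>>>>>>
>>>>>>>   apiVersion: build.openshift.io/v1
>>>>>>>   kind: BuildConfig
>>>>>>>   metadata:
>>>>>>>     name: custom-spark-build
>>>>>>>   spec:
>>>>>>>     source:
>>>>>>>       type: Git
>>>>>>>       git:
>>>>>>>         # hypothetical repo holding the custom image's Dockerfile
>>>>>>>         uri: https://github.com/opendatahub-io/custom-spark-image
>>>>>>>     strategy:
>>>>>>>       type: Docker
>>>>>>>       dockerStrategy:
>>>>>>>         env:
>>>>>>>         # drives which spark distribution the build downloads
>>>>>>>         - name: SPARK_VERSION
>>>>>>>           value: "2.4.3"
>>>>>>>     output:
>>>>>>>       to:
>>>>>>>         kind: ImageStreamTag
>>>>>>>         name: custom-spark:2.4.3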
>>>>>>>
>>>>>>> What do you think? What could be the best option for this scenario?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Open Data Hub, AI CoE, Office of CTO, Red Hat
>>>>>> Brno, Czech Republic
>>>>>> Phone: +420 739 666 824
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Landon LaSmith
>>>>> Sr. Software Engineer
>>>>> Red Hat, AI CoE - Data Hub
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Landon LaSmith
>>> Sr. Software Engineer
>>> Red Hat, AI CoE - Data Hub
>>>
>>
>>
>>
>
>
> --
> Thanks,
> Sherard Griffin
>
--
Ricardo Martinelli De Oliveira
Data Engineer, AI CoE
Red Hat Brazil <https://www.redhat.com/>
Av. Brigadeiro Faria Lima, 3900, 8th floor
rmartine(a)redhat.com T: +551135426125 M: +5511970696531
@redhatjobs <https://twitter.com/redhatjobs> redhatjobs <https://www.facebook.com/redhatjobs> @redhatjobs <https://instagram.com/redhatjobs>