The customization is basically installing a Spark 2.4.3 distribution in the
image and adding the required jars to the $SPARK_HOME/jars directory to
access Ceph/S3 buckets. The fix for the issue you mention has been in the
code since Spark 2.3, IIRC.
If the current image already has these things done, I can use the image as
is.
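For reference, here is a minimal sketch of the kind of S3/Ceph settings such an image could bake in. The endpoint, the default SPARK_HOME path, and the property values are illustrative assumptions, not taken from the actual image:

```shell
# Hypothetical sketch: point Spark at a Ceph/S3 endpoint via the s3a filesystem.
# The endpoint below is a placeholder; in a real image the hadoop-aws and
# aws-java-sdk jars would also need to be present in $SPARK_HOME/jars.
SPARK_HOME="${SPARK_HOME:-/tmp/spark}"   # /opt/spark in a real image
mkdir -p "$SPARK_HOME/conf"
cat > "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.hadoop.fs.s3a.impl               org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint           http://ceph-rgw.example.local:8080
spark.hadoop.fs.s3a.path.style.access  true
EOF
```

Both the Thrift server and the Spark cluster pods would read these defaults, so baking them into one shared image keeps the two consistent.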
On Mon, Aug 19, 2019 at 3:31 PM Sherard Griffin <shgriffi(a)redhat.com> wrote:
Ricardo,
What are the customizations? Is this to fix retrieving the tables and
databases from Hive metastore? I've fixed the permissions on the
spark-cluster-image repo, although I'm not a fan of the name of it...
Should just be "spark".
Thanks,
Sherard
On Mon, Aug 19, 2019 at 12:02 PM Ricardo Martinelli de Oliveira <
rmartine(a)redhat.com> wrote:
> One side note about the Spark image: Spark 2.2 has an open issue that
> breaks the Spark SQL Thrift server, so we need to use Spark 2.4. I already
> have the image built, and I can push it once I have the proper permissions.
>
> As for Landon's suggestion, I think it's a good idea. If needed, I can
> create a quay.io/opendatahub/spark-cluster-image:2.4 tag in advance so we
> can reuse the same name with a different tag.
>
> On Mon, Aug 19, 2019 at 11:10 AM Landon LaSmith <llasmith(a)redhat.com>
> wrote:
>
>> Do we want to store all of the ODH spark images in the same repository?
>> We already have quay.io/opendatahub/spark-cluster-image
>> <https://quay.io/repository/opendatahub/spark-cluster-image?tab=tags>.
>> Should we deprecate that repo and create a new one to store both?
>>
>> On Mon, Aug 19, 2019 at 10:06 AM Ricardo Martinelli de Oliveira <
>> rmartine(a)redhat.com> wrote:
>>
>>> Thanks everyone for the feedback!
>>>
>>> As the winner is #2 (push the custom image to the quay.io opendatahub
>>> organization), who should I ask for permissions to push my image to this
>>> org?
>>>
>>> My quay.io username is the same as my kerberos user.
>>>
>>> On Mon, Aug 19, 2019 at 10:57 AM Landon LaSmith <llasmith(a)redhat.com>
>>> wrote:
>>>
>>>> Agree with #2
>>>>
>>>> On Mon, Aug 19, 2019 at 9:53 AM Václav Pavlín <vasek(a)redhat.com>
>>>> wrote:
>>>>
>>>>> I agree with #2 - ODH should work out of the box, so we need to
>>>>> provide the image (which rules out #1), and #3 sounds like overkill.
>>>>>
>>>>> Thanks,
>>>>> V.
>>>>>
>>>>> On Mon, Aug 19, 2019 at 3:43 PM Alex Corvin <acorvin(a)redhat.com>
>>>>> wrote:
>>>>>
>>>>>> I think my vote is for #2. Option #1 will continue to be supported
>>>>>> for groups that need it, but we can make it easier for people to get
>>>>>> up and running by curating an official image.
>>>>>>
>>>>>>
>>>>>> On August 19, 2019 at 9:33:44 AM, Ricardo Martinelli de Oliveira
>>>>>> (rmartine(a)redhat.com) wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm integrating the Spark SQL Thrift server into the ODH operator,
>>>>>> and I need to use a custom Spark image (other than the radanalytics
>>>>>> image) with additional jars to access Ceph/S3 buckets. Both the
>>>>>> Thrift server and the Spark cluster will need this custom Spark
>>>>>> image in order to access the buckets.
>>>>>>
>>>>>> With that said, I'd like to discuss some options to get this done.
>>>>>> I am thinking about these options:
>>>>>>
>>>>>> 1) Let the customer specify the custom image in the YAML file (this
>>>>>> is already possible)
>>>>>> 2) Create the custom Spark image and publish it in the quay.io
>>>>>> opendatahub organization
>>>>>> 3) Add a BuildConfig object and have the operator create the custom
>>>>>> build and set the image location in the DeploymentConfig objects
>>>>>>
>>>>>> Although the third option automates everything and delivers the
>>>>>> whole set with the custom image, there's a catch about supporting
>>>>>> custom images within operators: we'd need to add a spark_version
>>>>>> variable so the build could download the corresponding Spark
>>>>>> distribution and its related artifacts and then run the build. With
>>>>>> the first option, we simply don't create the build objects and
>>>>>> document that, in order to use the Thrift server in the ODH
>>>>>> operator, both the Spark cluster and the Thrift server must use a
>>>>>> custom Spark image containing the jars needed to access Ceph/S3.
>>>>>> Finally, option two is the middle ground between the two, so we
>>>>>> don't need to delegate this task to either the user or the operator.
>>>>>>
>>>>>> What do you think? What would be the best option for this scenario?
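As a rough illustration of what option 3 might look like, here is a sketch of a BuildConfig with a spark_version build argument. Every name, the base image, and the download URL are hypothetical, not an agreed design:

```yaml
# Hypothetical BuildConfig sketch for option 3; all names and values are
# illustrative. The spark_version variable maps to a Docker build argument.
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: custom-spark
spec:
  source:
    type: Dockerfile
    dockerfile: |
      FROM registry.access.redhat.com/ubi7/ubi
      ARG SPARK_VERSION=2.4.3
      RUN curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz" \
          | tar -xz -C /opt \
          && ln -s "/opt/spark-${SPARK_VERSION}-bin-hadoop2.7" /opt/spark
  strategy:
    type: Docker
    dockerStrategy:
      buildArgs:
        - name: SPARK_VERSION
          value: "2.4.3"
  output:
    to:
      kind: ImageStreamTag
      name: spark:2.4.3
```

The operator would then point the Thrift server and Spark cluster DeploymentConfig objects at the resulting image stream tag.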
>>>>>>
>>>>>> --
>>>>>> Ricardo Martinelli De Oliveira
>>>>>> Data Engineer, AI CoE
>>>>>> Red Hat Brazil <https://www.redhat.com/>
>>>>>> Av. Brigadeiro Faria Lima, 3900, 8th floor
>>>>>> rmartine(a)redhat.com T: +551135426125 M: +5511970696531
>>>>>> @redhatjobs <https://twitter.com/redhatjobs> redhatjobs
>>>>>> <https://www.facebook.com/redhatjobs> @redhatjobs
>>>>>> <https://instagram.com/redhatjobs>
>>>>>> _______________________________________________
>>>>>> Contributors mailing list -- contributors(a)lists.opendatahub.io
>>>>>> To unsubscribe send an email to
>>>>>> contributors-leave(a)lists.opendatahub.io
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Open Data Hub, AI CoE, Office of CTO, Red Hat
>>>>> Brno, Czech Republic
>>>>> Phone: +420 739 666 824
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Landon LaSmith
>>>> Sr.Software Engineer
>>>> Red Hat, AI CoE - Data Hub
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
--
Thanks,
Sherard Griffin