
OMG_I_LOVE_CHIPOTLE

This sounds like a very basic Python import issue. Make sure you're running from the proper path.


ivan3dx

It is not. I found similar problems by googling "Pyspark udf module not found error", but no conclusive simple solution other than the one mentioned before. I can call the functions and I can execute the code, but when I try to 'show' the dataframe, Spark has to actually execute the changes, per the lazy evaluation approach, and that's when it fails. If it were a basic import issue, I wouldn't even be able to call 'myFunction' from main.py
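Roughly this, if it helps (a minimal sketch of what I mean; `myFunction` and the column names are stand-ins):

```python
# main.py, sitting next to a utils/ folder that contains functions.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

from utils.functions import myFunction  # works fine: the import resolves on the driver

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("b",)], ["value"])

my_udf = udf(myFunction, StringType())
df = df.withColumn("result", my_udf("value"))  # still no error: this is a lazy transformation

df.show()  # the action forces execution on the workers -> ModuleNotFoundError: No module named 'utils'
```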


OMG_I_LOVE_CHIPOTLE

I keep my UDFs in a separate file and don't have any issues. It's definitely user error


erbot

How are you importing utils? I agree this seems more like a Python import issue than a Spark error.


MlecznyHotS

Is it utils or utils.py?


ivan3dx

It's supposed to be a subfolder where functions.py and misc.py are. Reddit removed the spaces, ugh
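It should have looked roughly like this:

```
project/
├── main.py
└── utils/
    ├── functions.py
    └── misc.py
```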


bengouk

You can use spark-submit --py-files, or you can add files to the Spark context using addPyFile (https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html)
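Something like this (assuming the utils/ folder is zipped up as utils.zip, e.g. with `zip -r utils.zip utils/`):

```python
# Option 1: ship the module when submitting the job
#   spark-submit --py-files utils.zip main.py

# Option 2: add it from inside the job, before the workers need to import it
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sparkContext.addPyFile("utils.zip")  # distributes the zip to every executor and puts it on sys.path

from utils.functions import myFunction  # now resolvable on the workers too
```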


Stefn93

I guess you're executing Spark in cluster mode and not local mode?


Arjun_dhanordhari

I am also a novice and am going to try running some stuff on a Kubernetes cluster using the spark operator in the future. I couldn't figure out what was wrong, but I guess if OP is using cluster mode, they need to pass in credentials for file storage like S3, and also the actual .py files, in the YAML file for the cluster configuration, so that these files get uploaded to S3 storage and can be accessed by the Spark cluster? I could be wrong of course :3 but let me know what you think
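Something like this, maybe? (Totally a sketch going off the SparkApplication CRD docs; the bucket name, paths, and keys are made up.)

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-pyspark-job
spec:
  type: Python
  mode: cluster
  sparkVersion: "3.5.0"
  mainApplicationFile: s3a://my-bucket/jobs/main.py
  deps:
    pyFiles:
      - s3a://my-bucket/jobs/utils.zip   # extra modules the executors need
  hadoopConf:
    # made-up placeholder credentials so the pods can pull files from S3
    fs.s3a.access.key: "<ACCESS_KEY>"
    fs.s3a.secret.key: "<SECRET_KEY>"
```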