Spark User-Defined Functions
· Write UDFs with a single DataFrame column as input
· Transform data with UDFs using both the DataFrame and SQL APIs
· Analyze the performance trade-offs between built-in functions and UDFs
How to register a UDF?
Python user-defined function (UDF) example:
def squared(s):
    return s * s
# No return type is passed here, so Spark uses its default (string) for the result.
spark.udf.register("squaredWithPython", squared)
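Registering a function with spark.udf.register also makes it callable from SQL. The following is a minimal sketch of that path with an explicit return type. The names squaredAsLong, test_sql, and the helper squared_via_sql are illustrative and not part of the original example, and the None check is an added assumption about how null inputs should be handled. It assumes an existing SparkSession is passed in as spark, as in a notebook.

```python
def squared(s):
    """Square one column value; pass a SQL NULL (None in Python) through unchanged."""
    return None if s is None else s * s


def squared_via_sql(spark):
    """Register `squared` for SQL with an explicit return type and query it.

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported here so the plain function above can be used without Spark installed.
    from pyspark.sql.types import LongType

    # Hypothetical names chosen for this sketch: "squaredAsLong" and "test_sql".
    spark.udf.register("squaredAsLong", squared, LongType())
    spark.range(1, 20).createOrReplaceTempView("test_sql")

    # The registered name is now usable inside any SQL statement.
    return spark.sql("SELECT id, squaredAsLong(id) AS id_squared FROM test_sql")
```

In a notebook where spark already exists, squared_via_sql(spark).show() would print the id and id_squared columns. Because squared is ordinary Python, its logic can also be checked on its own before it is registered.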
How to invoke a UDF?
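from pyspark.sql.functions import udf
from pyspark.sql.types import LongType
# createOrReplaceTempView replaces the deprecated registerTempTable
spark.range(1, 20).createOrReplaceTempView("test")
squared_udf = udf(squared, LongType())
df = spark.table("test")
# display() is provided by Databricks notebooks; elsewhere, call .show() on the DataFrame
display(df.select("id", squared_udf("id").alias("id_squared")))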
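On the performance trade-off between built-in functions and UDFs: a Python UDF is opaque to Spark's optimizer, and each row has to be serialized between the JVM and a Python worker, while a built-in column expression stays inside the engine. The sketch below builds the same id_squared column both ways so the two physical plans can be compared. The helper name compare_udf_with_builtin is hypothetical, and an existing SparkSession is assumed to be passed in as spark.

```python
def squared(s):
    """Plain Python logic wrapped by the UDF below."""
    return s * s


def compare_udf_with_builtin(spark):
    """Build the same column with a Python UDF and with a built-in expression.

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported here so the plain function above can be used without Spark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    df = spark.range(1, 20)
    squared_udf = udf(squared, LongType())

    # UDF version: rows are shipped to a Python worker for evaluation.
    with_udf = df.select("id", squared_udf("id").alias("id_squared"))
    # Built-in version: plain column arithmetic that Spark can optimize.
    with_builtin = df.select("id", (col("id") * col("id")).alias("id_squared"))

    # explain() prints each physical plan; the UDF plan typically contains
    # a Python evaluation step that the built-in plan does not.
    with_udf.explain()
    with_builtin.explain()
    return with_udf, with_builtin
```

Both DataFrames should contain the same values; the difference shows up in the plans and, on larger data, in run time. When an equivalent built-in expression exists, it is usually the better choice.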