Data Engineering

Spark User Defined Functions

· Write UDFs that take a single DataFrame column as input
· Transform data with UDFs using both the DataFrame and SQL APIs
· Analyze the performance trade-offs between built-in functions and UDFs

How to register a UDF?

Python user-defined function (UDF) example:

    from pyspark.sql.types import LongType

    def squared(s):
        return s * s

    # Without an explicit return type, spark.udf.register defaults to StringType
    spark.udf.register("squaredWithPython", squared, LongType())

How to invoke a UDF?

    from pyspark.sql.functions import udf
    from pyspark.sql.types import LongType

    # registerTempTable is deprecated; createOrReplaceTempView replaces it
    spark.range(1, 20).createOrReplaceTempView("test")

    squared_udf = udf(squared, LongType())
    df = spark.table("test")
    # display() is a Databricks notebook helper; outside Databricks use .show()
    display(df.select("id", squared_udf("id").alias("id_squared")))

Job Scheduling in the Cronjob

nnCron supports cron format in both classic and extended modes, so two formats are available:

1) The traditional cron format (inherited from Unix) consists of five fields separated by whitespace:

<Minute> <Hour> <Day_of_the_Month> <Month_of_the_Year> <Day_of_the_Week>

Example: 15 9 31 * * runs at 09:15 on the 31st day of every month that has one.

2) The "enhanced" format adds a sixth field, <Year>:

<Minute> <Hour> <Day_of_the_Month> <Month_of_the_Year> <Day_of_the_Week> <Year>

Users choose the format by selecting or clearing the Year field checkbox on the General tab of the Options dialog (opened by double-clicking the nnCron icon in the system tray). By default, nnCron uses the enhanced format. The following graph sh...
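To make the field order concrete, here is a small pure-Python sketch that checks whether a datetime matches a five-field expression, or a six-field one with the trailing <Year> field. It is not nnCron's implementation. The supported syntax (*, lists, ranges, */step), the Unix day-of-week numbering (0 = Sunday), the classic Unix rule that ORs day-of-month and day-of-week when both are restricted, and the 1970 to 2099 year bounds are all assumptions; parse_field and cron_matches are hypothetical helper names.

```python
from datetime import datetime

# Bounds for minute, hour, day of month, month, day of week.
# Day of week uses the Unix convention 0 = Sunday (an assumption for nnCron).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]
# Hypothetical bounds for the optional sixth <Year> field.
YEAR_BOUNDS = (1970, 2099)


def parse_field(field, lo, hi):
    """Expand a field such as "*", "5", "1-5", "*/15" or "1,15,30" into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if step < 1 or start < lo or end > hi or start > end:
            raise ValueError(f"invalid cron field {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr, when):
    """Return True if datetime `when` matches a 5-field or 6-field (with Year) expression."""
    fields = expr.split()
    if len(fields) not in (5, 6):
        raise ValueError("expected 5 fields, or 6 with <Year>")
    bounds = FIELD_BOUNDS + [YEAR_BOUNDS]
    # zip stops at len(fields), so `year` is empty for the classic 5-field format
    minute, hour, dom, month, dow, *year = [
        parse_field(text, lo, hi) for text, (lo, hi) in zip(fields, bounds)
    ]
    dom_ok = when.day in dom
    # datetime.weekday() has Monday = 0; shift so that Sunday = 0
    dow_ok = (when.weekday() + 1) % 7 in dow
    # Classic Unix cron rule: if both day fields are restricted, either may match
    if fields[2].startswith("*") or fields[4].startswith("*"):
        day_ok = dom_ok and dow_ok
    else:
        day_ok = dom_ok or dow_ok
    year_ok = not year or when.year in year[0]
    return (when.minute in minute and when.hour in hour
            and when.month in month and day_ok and year_ok)


print(cron_matches("15 9 31 * *", datetime(2024, 1, 31, 9, 15)))     # True
print(cron_matches("0 15 * * * 2025", datetime(2024, 6, 1, 15, 0)))  # False: wrong year
```

The second call shows what the extra <Year> field buys: the same minute, hour, and day pattern is confined to a single year, which the five-field format cannot express.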
