Showing posts from February, 2019

Spark User Defined Functions

·  Write UDFs that take a single DataFrame column as input
·  Transform data with UDFs through both the DataFrame and SQL APIs
·  Analyze the performance trade-offs between built-in functions and UDFs

How to register a UDF?

Python user-defined function (UDF) example:

    def squared(s):
        return s * s

    # Without an explicit return type, the registered UDF returns strings.
    spark.udf.register("squaredWithPython", squared)

How to invoke a UDF?

    from pyspark.sql.functions import udf
    from pyspark.sql.types import LongType

    # createOrReplaceTempView replaces the deprecated registerTempTable.
    spark.range(1, 20).createOrReplaceTempView("test")

    squared_udf = udf(squared, LongType())
    df = spark.table("test")
    # display() is a Databricks notebook helper; outside Databricks use .show().
    display(df.select("id", squared_udf("id").alias("id_squared")))
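The objectives above mention the SQL API and the trade-off against built-in functions, so here is a minimal sketch of both. It assumes an existing SparkSession is passed in as `spark`; the helper names `run_sql_udf` and `run_sql_builtin` are hypothetical and used only for illustration.

```python
def squared(s):
    """Plain Python function; as a UDF, Spark calls it once per row."""
    return s * s


def run_sql_udf(spark):
    """Register the UDF with an explicit return type and call it from SQL.

    `spark` is assumed to be an existing SparkSession.
    """
    # The third argument is the return type, given here as a DDL type string.
    spark.udf.register("squaredWithPython", squared, "long")
    spark.range(1, 20).createOrReplaceTempView("test")
    return spark.sql("SELECT id, squaredWithPython(id) AS id_squared FROM test")


def run_sql_builtin(spark):
    """Compute the same column with built-in arithmetic instead of a UDF."""
    spark.range(1, 20).createOrReplaceTempView("test")
    return spark.sql("SELECT id, id * id AS id_squared FROM test")
```

Calling `run_sql_udf(spark).show()` and `run_sql_builtin(spark).show()` should give the same rows. The built-in version stays inside Spark's own execution engine, while the Python UDF has to ship each value to a Python worker and back, which is usually where the performance difference comes from.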

Job Scheduling in the Cronjob

nnCron makes active use of the cron format in both classic and extended modes. There are two formats:

1) The traditional cron format (inherited from Unix) consists of five fields separated by white space:

    <Minute> <Hour> <Day_of_the_Month> <Month_of_the_Year> <Day_of_the_Week>

   Ex: 15 9 31 * * (09:15 on the 31st day of the month)

2) nnCron can use both the traditional format and an "enhanced" version of the cron format, which has an additional (6th) field, <Year>:

    <Minute> <Hour> <Day_of_the_Month> <Month_of_the_Year> <Day_of_the_Week> <Year>

A user can choose the format by selecting or clearing the Year field checkbox on the General tab of the Options dialog (opened by double-clicking the nnCron icon in the system tray). By default, nnCron uses the enhanced format.

The following graph sh...
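The two field layouts described above can be illustrated with a small sketch that labels each field of an expression by position. This is not nnCron code; `split_cron` and the field names are hypothetical, and the sketch only splits the expression without validating ranges or special characters.

```python
CLASSIC_FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]
ENHANCED_FIELDS = CLASSIC_FIELDS + ["year"]


def split_cron(expression):
    """Map each whitespace-separated field of a cron expression to its name."""
    parts = expression.split()
    if len(parts) == len(CLASSIC_FIELDS):
        names = CLASSIC_FIELDS
    elif len(parts) == len(ENHANCED_FIELDS):
        names = ENHANCED_FIELDS
    else:
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}")
    return dict(zip(names, parts))


# Five fields: classic format. Six fields: enhanced format with a year.
print(split_cron("15 9 31 * *"))
print(split_cron("15 9 31 * * 2019"))
```

Counting fields is enough to tell the two formats apart here, because the enhanced format only appends the year at the end and leaves the first five positions unchanged.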