
Date Features

Date features are values such as 2023-06-25 15:00:00, 2023-06-25, 6-25-2023, or 6/25/2023.

Preprocessing

Ludwig will try to infer the date format automatically, but a specific format can be provided. The format string uses the same format codes as Python's datetime module (the codes accepted by strftime and strptime).

preprocessing:
    missing_value_strategy: fill_with_const
    fill_value: ''
    datetime_format: null

Example of a date feature with an explicit datetime_format:

name: date_feature_name
type: date
preprocessing:
  missing_value_strategy: fill_with_const
  fill_value: ''
  datetime_format: "%d %b %Y"
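Because datetime_format uses Python's format codes, the "%d %b %Y" example above matches strings such as "25 Jun 2023" (%b matches abbreviated English month names in the default C locale). The sketch below parses with an explicit format when one is given and otherwise tries a few fallback formats. The helper parse_date and the FALLBACK_FORMATS list are hypothetical illustrations; Ludwig's own format inference may work differently.

```python
from datetime import datetime
from typing import Optional

# Hypothetical fallback formats, tried only when no datetime_format is given.
# This is NOT Ludwig's inference logic, just an illustration of the idea.
FALLBACK_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m-%d-%Y", "%m/%d/%Y"]


def parse_date(value: str, datetime_format: Optional[str] = None) -> datetime:
    """Parse with an explicit format, or try each fallback format in order."""
    if datetime_format is not None:
        return datetime.strptime(value, datetime_format)
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"could not infer a format for {value!r}")


print(parse_date("25 Jun 2023", "%d %b %Y"))  # 2023-06-25 00:00:00
print(parse_date("6/25/2023"))                # 2023-06-25 00:00:00
```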

Parameters:

  • missing_value_strategy (default: fill_with_const): What strategy to follow when there's a missing value in a date column. Options: fill_with_const, bfill, ffill, drop_row. See Missing Value Strategy for details.
  • fill_value (default: ''): The value to replace missing values with when missing_value_strategy is fill_with_const.
  • datetime_format (default: null): This parameter can either be a datetime format string, or null, in which case the datetime format will be inferred automatically.
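To make the four missing_value_strategy options concrete, here is a minimal pure-Python sketch over a list of date strings in which None marks a missing value. The function fill_missing is hypothetical; real preprocessing works on dataframe columns, and edge cases (for example a gap at the start of the column under ffill) may be handled differently there.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[str]], strategy: str,
                 fill_value: str = "") -> List[Optional[str]]:
    """Illustrative sketch of fill_with_const, ffill, bfill and drop_row."""
    if strategy == "fill_with_const":
        return [fill_value if v is None else v for v in values]
    if strategy == "drop_row":
        return [v for v in values if v is not None]
    if strategy == "ffill":
        # Carry the last seen value forward; leading gaps stay None.
        out, last = [], None
        for v in values:
            last = v if v is not None else last
            out.append(last)
        return out
    if strategy == "bfill":
        # Backward fill is a forward fill over the reversed column.
        return fill_missing(values[::-1], "ffill")[::-1]
    raise ValueError(f"unknown strategy: {strategy}")


dates = ["2023-06-25", None, "2023-06-27", None]
print(fill_missing(dates, "ffill"))  # ['2023-06-25', '2023-06-25', '2023-06-27', '2023-06-27']
print(fill_missing(dates, "bfill"))  # ['2023-06-25', '2023-06-27', '2023-06-27', None]
```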

Preprocessing parameters can also be defined once and applied to all date input features using the Type-Global Preprocessing section.

Input Features

Input date features are transformed into int tensors of size N x 9 (where N is the size of the dataset and the 9 dimensions contain year, month, day, weekday, yearday, hour, minute, second, and second of day).

For example, the date 2022-06-25 09:30:59 would be deconstructed into:

[
  2022,   # Year
  6,      # June
  25,     # 25th day of the month
  5,      # Weekday: Saturday (Monday = 0)
  176,    # 176th day of the year
  9,      # Hour
  30,     # Minute
  59,     # Second
  34259,  # Seconds since midnight: 9*3600 + 30*60 + 59
]
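The same decomposition can be reproduced with Python's standard datetime module. This is a sketch rather than Ludwig's code: the helper date_to_vector is hypothetical, and it assumes the weekday follows datetime.weekday() (Monday = 0), which agrees with Saturday mapping to 5 in the example above.

```python
from datetime import datetime
from typing import List


def date_to_vector(dt: datetime) -> List[int]:
    """Split a datetime into the 9 integer components described above."""
    yearday = dt.timetuple().tm_yday  # 1-based day of the year
    second_of_day = dt.hour * 3600 + dt.minute * 60 + dt.second
    return [
        dt.year,
        dt.month,
        dt.day,
        dt.weekday(),  # Monday = 0 ... Sunday = 6
        yearday,
        dt.hour,
        dt.minute,
        dt.second,
        second_of_day,
    ]


print(date_to_vector(datetime(2022, 6, 25, 9, 30, 59)))
# [2022, 6, 25, 5, 176, 9, 30, 59, 34259]
```

Applying it to every row of a column and stacking the results gives the N x 9 integer matrix.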

The encoder parameters specified at the feature level are:

  • tied (default: null): name of another input feature to tie the weights of the encoder with. It needs to be the name of a feature of the same type and with the same encoder parameters.

Currently two encoders are supported for dates: DateEmbed (default) and DateWave. Select one by setting type to embed or wave in the encoder section of the input feature's configuration.

Example date feature entry in the input features list:

name: date_feature_name
type: date
encoder: 
    type: embed

Encoder type and encoder parameters can also be defined once and applied to all date input features using the Type-Global Encoder section.

Encoders

Embed Encoder

This encoder passes the year through a fully connected layer of one neuron, embeds each of the other date components, concatenates the results, and passes the concatenated representation through fully connected layers.

encoder:
    type: embed
    dropout: 0.0
    embedding_size: 10
    output_size: 10
    activation: relu
    norm: null
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    embeddings_on_cpu: false
    norm_params: null
    num_fc_layers: 0
    fc_layers: null

Parameters:

  • dropout (default: 0.0): Dropout probability for the embedding.
  • embedding_size (default: 10): The maximum embedding size adopted.
  • output_size (default: 10): If an output_size is not already specified in fc_layers, this is the default output_size that will be used for each layer. It indicates the size of the output of a fully connected layer.
  • activation (default: relu): The default activation function that will be used for each layer. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
  • norm (default: null): The default norm that will be used for each layer. Options: batch, layer, null. See Normalization for details.
  • use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
  • bias_initializer (default: zeros): Initializer to use for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
  • weights_initializer (default: xavier_uniform): Initializer to use for the weights matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
  • embeddings_on_cpu (default: false): Whether to force the placement of the embedding matrix in regular memory and have the CPU resolve it. Options: true, false.
  • norm_params (default: null): Parameters used if norm is either batch or layer. See Normalization for details.
  • num_fc_layers (default: 0): The number of stacked fully connected layers.
  • fc_layers (default: null): List of dictionaries containing the parameters for each fully connected layer.
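The data flow of the embed encoder can be sketched in pure Python. This is an illustration under stated assumptions, not the library's implementation: weights are random, the eight non-year components follow the order of the 9-element vector, each component gets its own lookup table, biases in the fully connected layers are omitted, and the class names DateEmbedSketch and LazyEmbedding are hypothetical. The real encoder's details (vocabulary sizes, dropout, normalization) may differ.

```python
import random
from typing import Dict, List

# Assumed order of the non-year components, matching the 9-element vector.
COMPONENTS = ["month", "day", "weekday", "yearday", "hour",
              "minute", "second", "second_of_day"]


class LazyEmbedding:
    """Lookup table whose rows are created on first use (illustrative only),
    so the large second_of_day table is never fully materialized."""

    def __init__(self, size: int, rng: random.Random):
        self.size, self.rng = size, rng
        self.rows: Dict[int, List[float]] = {}

    def __call__(self, index: int) -> List[float]:
        if index not in self.rows:
            self.rows[index] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.size)]
        return self.rows[index]


def relu(xs: List[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


class DateEmbedSketch:
    def __init__(self, embedding_size: int = 10, output_size: int = 10,
                 num_fc_layers: int = 0, seed: int = 0):
        rng = random.Random(seed)
        # Year goes through a single neuron (weight and bias are placeholders).
        self.year_w, self.year_b = rng.uniform(-1, 1), 0.0
        self.tables = {c: LazyEmbedding(embedding_size, rng) for c in COMPONENTS}
        in_size = 1 + len(COMPONENTS) * embedding_size
        self.fc: List[List[List[float]]] = []
        for _ in range(num_fc_layers):
            self.fc.append([[rng.uniform(-0.1, 0.1) for _ in range(in_size)]
                            for _ in range(output_size)])
            in_size = output_size  # later layers take the previous output

    def __call__(self, vec: List[int]) -> List[float]:
        year, rest = vec[0], vec[1:]
        hidden = [self.year_w * year + self.year_b]
        for name, value in zip(COMPONENTS, rest):
            hidden.extend(self.tables[name](value))  # concatenate embeddings
        for weights in self.fc:
            hidden = relu([sum(w * h for w, h in zip(row, hidden)) for row in weights])
        return hidden


vec = [2022, 6, 25, 5, 176, 9, 30, 59, 34259]
print(len(DateEmbedSketch()(vec)))                 # 81
print(len(DateEmbedSketch(num_fc_layers=1)(vec)))  # 10
```

With num_fc_layers set to 0, as in the defaults above, this sketch returns the concatenation itself (1 + 8 x embedding_size values); each added layer projects to output_size.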

Wave Encoder

This encoder passes the year through a fully connected layer of one neuron, represents each of the other date components by taking the cosine of its value with a component-specific period (12 for months, 31 for days, etc.), concatenates the results, and passes the concatenated representation through fully connected layers.

encoder:
    type: wave
    dropout: 0.0
    output_size: 10
    activation: relu
    norm: null
    use_bias: true
    bias_initializer: zeros
    weights_initializer: xavier_uniform
    norm_params: null
    num_fc_layers: 1
    fc_layers: null

Parameters:

  • dropout (default: 0.0): Dropout probability for the embedding.
  • output_size (default: 10): If an output_size is not already specified in fc_layers, this is the default output_size that will be used for each layer. It indicates the size of the output of a fully connected layer.
  • activation (default: relu): The default activation function that will be used for each layer. Options: elu, leakyRelu, logSigmoid, relu, sigmoid, tanh, softmax, null.
  • norm (default: null): The default norm that will be used for each layer. Options: batch, layer, null. See Normalization for details.
  • use_bias (default: true): Whether the layer uses a bias vector. Options: true, false.
  • bias_initializer (default: zeros): Initializer to use for the bias vector. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
  • weights_initializer (default: xavier_uniform): Initializer to use for the weights matrix. Options: uniform, normal, constant, ones, zeros, eye, dirac, xavier_uniform, xavier_normal, kaiming_uniform, kaiming_normal, orthogonal, sparse, identity.
  • norm_params (default: null): Parameters used if norm is either batch or layer. See Normalization for details.
  • num_fc_layers (default: 1): The number of stacked fully connected layers.
  • fc_layers (default: null): List of dictionaries containing the parameters for each fully connected layer.
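The periodic representation used by the wave encoder can be sketched in a few lines of Python. Only the periods 12 (months) and 31 (days) come from the description above; the remaining periods are assumed natural ranges, the year neuron's weight and bias are placeholders, and the function wave_features is hypothetical. The cosine follows the text; the library's own periodic function may differ in detail.

```python
import math
from typing import List

# 12 and 31 come from the docs; the others are ASSUMED natural ranges:
# 7 weekdays, 366 year days, 24 hours, 60 minutes, 60 seconds, 86400 s/day.
PERIODS = [12, 31, 7, 366, 24, 60, 60, 86400]


def wave_features(vec: List[int], year_weight: float = 1.0,
                  year_bias: float = 0.0) -> List[float]:
    """Year through one linear neuron; other components as cos(2*pi*v/period)."""
    year, rest = vec[0], vec[1:]
    features = [year_weight * year + year_bias]
    for value, period in zip(rest, PERIODS):
        features.append(math.cos(2 * math.pi * value / period))
    return features


feats = wave_features([2022, 6, 25, 5, 176, 9, 30, 59, 34259])
print(len(feats))          # 9
print(round(feats[1], 6))  # -1.0 (month 6 is half of the 12-month period)
```

A periodic function places the end of one cycle next to the start of the next: hour 23 and hour 0 map to nearby values, whereas their raw integers are far apart.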

Output Features

There is currently no support for date as an output feature. Consider using the TEXT type.