Feature Column or Keras Preprocessing Layer

brightcoder01 edited this page Feb 18, 2020 · 8 revisions

Problem

TensorFlow offers two options for feature engineering: the feature column API and Keras preprocessing layers (for numeric inputs and categorical inputs).

In the data analysis and transform design, we proposed some transform functions to extend the COLUMN syntax. We will generate the Python code for feature engineering from the COLUMN clause. This page discusses which API the generated code should be built upon: feature columns or Keras preprocessing layers.

Long Term Trend From Open Source Community

The motivation section of the Keras Categorical Inputs RFC shows that the community plans to develop Keras preprocessing layers to replace the feature column API. These layers are planned for release in TF 2.2.

The RFC mentions three pain points of feature columns. The following points are copied from it:

* Users have to define both feature columns and Keras Inputs for the model, resulting in code duplication and deviation from DRY (Do not repeat yourself) principle. See this [Github issue](https://github.com/tensorflow/tensorflow/issues/27416).
* Users with large dimension categorical inputs will incur large memory footprint and computation cost, if wrapped with indicator column through `tf.keras.layers.DenseFeatures`.
* Currently there is no way to correctly feed Keras linear model or dense layer with multivalent categorical inputs or weighted categorical inputs.
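To make the second point concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes the indicator column materializes a dense float32 one-hot row per example, and it compares that with carrying only the int64 category ids. The batch size, vocabulary size, and helper names are hypothetical and chosen only for illustration.

```python
def dense_indicator_bytes(batch_size, vocab_size, bytes_per_value=4):
    """Bytes for a dense one-hot batch (assumes float32 values)."""
    return batch_size * vocab_size * bytes_per_value


def sparse_id_bytes(batch_size, ids_per_example=1, bytes_per_id=8):
    """Bytes for keeping only the category ids (assumes int64 ids)."""
    return batch_size * ids_per_example * bytes_per_id


# Hypothetical workload: 512 examples, one categorical feature
# with one million distinct values.
batch_size = 512
vocab_size = 1_000_000

dense = dense_indicator_bytes(batch_size, vocab_size)
sparse = sparse_id_bytes(batch_size)

print(f"dense one-hot batch: {dense:,} bytes")
print(f"ids only:            {sparse:,} bytes")
print(f"ratio:               {dense // sparse:,}x")
```

Under these assumptions the dense representation is about 2 GB per batch, while the ids alone take a few kilobytes, which is why a large-vocabulary feature is costly when expanded through an indicator column.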

TODO: Add examples to explain these three points

How to develop common models with feature columns and Keras preprocessing layers

  1. DNN
  2. Wide and Deep
  3. DeepFM

What should we do to cover the transform functions in SQLFlow

Feature Column

  1. Add a new concat_column for the CONCAT transform function.
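This page does not define the semantics of CONCAT. One plausible reading, stated here as an assumption, is that it merges several categorical id columns into a single id space by shifting each column's ids by the total bucket count of the columns before it. The sketch below shows that idea in plain Python; `concat_category_ids` is a hypothetical helper, not an existing TensorFlow or SQLFlow API.

```python
from itertools import accumulate


def concat_category_ids(ids_per_column, num_buckets):
    """Map per-column category ids into one shared id space.

    ids_per_column: one id per column for a single example.
    num_buckets: bucket count of each column, in the same order.
    (Hypothetical helper; the offsetting scheme is an assumption.)
    """
    if len(ids_per_column) != len(num_buckets):
        raise ValueError("need exactly one bucket count per column")

    # Offset of column i is the sum of bucket counts of columns 0..i-1.
    offsets = [0] + list(accumulate(num_buckets))[:-1]

    merged = []
    for col_id, buckets, offset in zip(ids_per_column, num_buckets, offsets):
        if not 0 <= col_id < buckets:
            raise ValueError(f"id {col_id} is outside [0, {buckets})")
        merged.append(col_id + offset)
    return merged


# Three columns with 3, 2 and 5 buckets share one space of 10 ids.
print(concat_category_ids([2, 0, 4], [3, 2, 5]))  # [2, 3, 9]
```

With this scheme the merged column has `sum(num_buckets)` ids, so a single embedding table could serve all the concatenated columns.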

Keras Preprocessing Layer