
Simplified Explanation of Feature Engineering



Feature Engineering


What is Feature Engineering?


Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work better. Good feature engineering can improve the accuracy of a machine learning model by creating new features that better represent the underlying patterns in the data.


Here are a few examples of feature engineering (a short code sketch follows the list):

  • Creating new features by combining existing features

  • Transforming existing features to better represent the data

  • Creating features to capture patterns in the data

  • Creating features to better capture the relationships between features
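As a minimal sketch of the first two ideas, here is what combining and transforming features can look like in pandas. The dataset and column names are invented purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "height_m": [1.70, 1.60, 1.85],
    "weight_kg": [70, 55, 95],
    "income": [30_000, 120_000, 55_000],
})

# Combine existing features into a new one (body mass index)
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Transform an existing, skewed feature to better represent the data
df["log_income"] = np.log1p(df["income"])

print(df)
```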


Scenario:


For instance, imagine you are a credit card loan officer and your job is to find out which customers are most likely to close their accounts and stop using their credit cards. You start with raw customer information and transaction information. From this raw data, you try to create features that describe each customer's credit profile: the number of credit cards they own, the balance in their credit account, and their salary. The frequency of credit card use can also be a good feature, as can the debt payment ratio and whether the customer made a payment and settled their credit card bill in the last 15, 30 or 60 days. In this way you are creating multiple features to understand the credit standing of your customer.


In general, these features are fed into the model during training so that it can learn whether a customer is likely to close their account or not. This is how feature engineering transforms raw data into meaningful insights.
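To make the scenario concrete, here is a minimal sketch in pandas, assuming a hypothetical customer table; the column names and values are invented for illustration:

```python
import pandas as pd

# Hypothetical raw customer/transaction data (columns invented for illustration)
customers = pd.DataFrame({
    "num_credit_cards": [2, 1, 4],
    "balance": [1200.0, 300.0, 8500.0],
    "salary": [48_000, 36_000, 95_000],
    "total_debt": [2000.0, 500.0, 9000.0],
    "payments_last_30d": [3, 1, 0],
})

features = pd.DataFrame()

# How burdened is the customer relative to income?
features["debt_to_salary"] = customers["total_debt"] / customers["salary"]

# Average balance carried per card
features["balance_per_card"] = customers["balance"] / customers["num_credit_cards"]

# Did the customer make any payment in the last 30 days?
features["paid_last_30d"] = (customers["payments_last_30d"] > 0).astype(int)

print(features)
```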


It is worth noting that the right transformations can enhance model performance and make it much easier for the model to understand the meaning of the data.


Now let’s take a look at some feature engineering techniques.


 

Feature Engineering Techniques:


1. Imputation

Imputation is a method used to fill in missing data. There are many ways to do imputation, but the most common method is to use the mean or median of the data to fill in the missing values.
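For example, here is a minimal sketch using scikit-learn's SimpleImputer on an invented numeric array:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Invented data with missing values (np.nan)
X = np.array([[25.0, 50_000.0],
              [np.nan, 62_000.0],
              [40.0, np.nan]])

# Fill each missing value with its column mean (strategy="median" also works)
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```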


2. Handling outliers

An outlier is an observation that lies far from the rest of the data. Outliers can be caused by errors in the data, or they can be legitimate observations that are simply different from the rest. Because outliers can skew your results, it is important to identify and deal with them appropriately. A few methods for handling outliers are removal, replacing values, capping and discretization (see the sketch after this list).

  • Removal: A common method is to simply discard any data points that fall outside of a certain range.

  • Replacing Values: Outliers can be treated like missing data and replaced with suitable imputed values; common choices are the mean or the median.

  • Capping: Replace the maximum and minimum values with an arbitrary value or with a value drawn from the variable's distribution (for example, the 5th and 95th percentiles).

  • Discretization: Discretization is the process of converting a continuous variable into a discrete one. In machine learning, this is often done to turn data that is too difficult or expensive to process into a more manageable form, either by binning the data (grouping together values that are close to each other) or by splitting the range into a fixed number of intervals.
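Here is a minimal sketch of removal, capping and discretization using the interquartile-range (IQR) rule, one common (but not the only) way to define "a certain range":

```python
import pandas as pd

s = pd.Series([12, 14, 13, 15, 14, 13, 120])  # 120 is an outlier

# IQR fences: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

removed = s[(s >= lower) & (s <= upper)]   # removal: drop outliers
capped = s.clip(lower=lower, upper=upper)  # capping: pull extremes to the fences
binned = pd.cut(s, bins=3, labels=False)   # discretization: 3 fixed intervals

print(removed, capped, binned, sep="\n")
```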


3. One Hot Encoding

One-hot encoding is a process by which categorical variables are converted into a form that can be used by machine learning algorithms. The one-hot encoding process transforms categorical variables into a vector of zeros and ones, where each vector has only one element set to one and the rest set to zero. The element that is set to one corresponds to the category that the observation belongs to.
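For example, with pandas (the column name and categories are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each category becomes its own 0/1 column; exactly one is 1 per row
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```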


4. Scaling with Normalization and Standardization

Scaling with normalization involves re-scaling data so that it fits within a specific range, like 0 to 1. Standardization, on the other hand, involves re-scaling data so that it has a mean of 0 and a standard deviation of 1.
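A minimal sketch of both approaches with scikit-learn, on an invented single-column dataset:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0]])

# Normalization: rescale to the [0, 1] range
print(MinMaxScaler().fit_transform(X))

# Standardization: rescale to zero mean and unit standard deviation
print(StandardScaler().fit_transform(X))
```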

 

Conclusion

Good feature engineering can often lead to improved model performance, while feature selection can help to avoid overfitting and improve computational efficiency. In many cases, both feature engineering and feature selection are necessary in order to build an effective model.


 

About Ever AI


Have a lot of data but don't know how to get the most out of it?

Need AI solutions for your business?

Have a Machine Learning model but don't know how to deploy it? Sign up here: Ever AI Web Apps https://ever-ai.app/

Join our Telegram Channel for more information - https://t.me/aitechforeveryone



We provide a NO CODE End-to-end data science platform for you.


Would you like to understand the theory of AI better?

Contact us to have our trainers organise a workshop for you and your team.

