It’s been REALLY a long time since I last updated…

This is an NLP project I did for my deep learning class in school.

I was mainly responsible for the data preprocessing and augmentation, and I used my own PC for all the training and testing.

Dish Names Are All You Need: Predicting Ingredients with Dish Names

Introduction:

In most cases, dish names are straightforward, like “garlic chicken” or “roast pork”. However, when browsing the menu at a fancy restaurant, the names of the dishes can be confusing or even misleading, leaving people wondering what’s really inside the food. What are “sloppy joes”? What is “bubble and squeak”? Does “egg cream” actually contain egg and cream?

Our project is an attempt to solve this problem by predicting a dish’s ingredients from its name.

Our model is very different from traditional rule-based approaches. We want it to work not only on existing dishes that were never fed into the model during training, but also on novel dishes that users creatively come up with.

Methodology

The dataset we are using is from Kaggle (https://www.kaggle.com/datasets/shuyangli94/foodcom-recipes-with-search-terms-and-tags). It consists of 494,963 data points and 10 columns.

For preprocessing, we built a pipeline that includes tokenization, padding and truncation, and computing a token frequency distribution for the vocabulary.
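As a rough illustration, a minimal version of that pipeline could look like the sketch below; the 20,000-word vocabulary cap and the max length of 30 are placeholder numbers, not the values we actually used.

```python
# Minimal preprocessing sketch (vocab cap and max length are illustrative only).
from collections import Counter
import tensorflow as tf

def build_vocab(token_lists, max_words=20000):
    """Count token frequencies and keep only the most common words."""
    freq = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}  # reserve ids for padding and unknown tokens
    for word, _ in freq.most_common(max_words):
        vocab[word] = len(vocab)
    return vocab

def encode(token_lists, vocab, max_len=30):
    """Map tokens to ids, then pad/truncate every sequence to max_len."""
    ids = [[vocab.get(tok, vocab["<unk>"]) for tok in toks] for toks in token_lists]
    return tf.keras.preprocessing.sequence.pad_sequences(
        ids, maxlen=max_len, padding="post", truncating="post")

# Toy usage:
names = [["garlic", "chicken"], ["roast", "pork"]]
vocab = build_vocab(names)
padded = encode(names, vocab)   # int array of shape (2, 30)
```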

Seq2Seq (encoder-decoder) model.
We built two models: the first is an RNN with GRU units, and the second is a Transformer with multi-headed attention.

During training, we have the ground-truth ingredient sequence as labels, so we can directly compute the loss and apply gradients for backpropagation. During testing, there are no labels, so we rely on prediction sampling: each predicted token is fed back into the decoder, one at a time, to generate the next prediction.
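To make the difference concrete, here is a rough sketch of the two regimes, assuming Keras-style encoder/decoder callables; the function names and the start/end token handling are illustrative, not the exact project code.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(encoder, decoder, dish_ids, ingredient_ids, optimizer):
    """Training: teacher forcing -- the ground-truth sequence is fed to the decoder."""
    decoder_inputs = ingredient_ids[:, :-1]   # <start> w1 w2 ...
    labels = ingredient_ids[:, 1:]            # w1 w2 ... <end>
    with tf.GradientTape() as tape:
        enc_out = encoder(dish_ids)
        logits = decoder(decoder_inputs, enc_out)   # (batch, len, vocab)
        loss = loss_fn(labels, logits)
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

def predict(encoder, decoder, dish_ids, start_id, end_id, max_len=20):
    """Testing: no labels, so each prediction is fed back in as the next input."""
    enc_out = encoder(dish_ids)
    generated = [start_id]
    for _ in range(max_len):
        logits = decoder(tf.constant([generated]), enc_out)
        next_id = int(tf.argmax(logits[0, -1]))   # greedy choice at each step
        if next_id == end_id:
            break
        generated.append(next_id)
    return generated[1:]
```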

The following is an RNN model:
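A minimal Keras sketch of such a GRU encoder-decoder; the layer sizes here are placeholders, not the tuned values.

```python
import tensorflow as tf

class GRUEncoder(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim=256, units=256):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(units, return_state=True)

    def call(self, dish_ids):
        # Encode the dish name into a single state vector.
        _, state = self.gru(self.embed(dish_ids))
        return state

class GRUDecoder(tf.keras.Model):
    def __init__(self, vocab_size, embed_dim=256, units=256):
        super().__init__()
        self.embed = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True)
        self.out = tf.keras.layers.Dense(vocab_size)  # logits over ingredient vocab

    def call(self, ingredient_ids, encoder_state):
        # Condition the decoder on the encoder's final state.
        x = self.gru(self.embed(ingredient_ids), initial_state=encoder_state)
        return self.out(x)
```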

The following is a Transformer Model:
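And a minimal sketch of one Transformer decoder block built on Keras’s MultiHeadAttention; again, the sizes and head count are placeholders, and use_causal_mask needs a reasonably recent TensorFlow.

```python
import tensorflow as tf

class TransformerDecoderBlock(tf.keras.layers.Layer):
    """One decoder block: masked self-attention over the ingredients generated so
    far, cross-attention over the encoded dish name, then a feed-forward layer."""
    def __init__(self, embed_dim=256, num_heads=3, ff_dim=512):
        super().__init__()
        self.self_attn = tf.keras.layers.MultiHeadAttention(num_heads, embed_dim)
        self.cross_attn = tf.keras.layers.MultiHeadAttention(num_heads, embed_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ff_dim, activation="relu"),
            tf.keras.layers.Dense(embed_dim),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization()
        self.norm2 = tf.keras.layers.LayerNormalization()
        self.norm3 = tf.keras.layers.LayerNormalization()

    def call(self, x, enc_out):
        # The causal mask keeps position i from attending to later positions.
        x = self.norm1(x + self.self_attn(x, x, use_causal_mask=True))
        x = self.norm2(x + self.cross_attn(x, enc_out))
        x = self.norm3(x + self.ffn(x))
        return x
```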

Metrics

Loss: sparse categorical cross-entropy loss.
Accuracy: we use Jaccard similarity to measure how good a prediction is. During training, we have both the prediction and the ground truth.
We convert both the prediction and the ground truth into sets of words, then calculate the Jaccard similarity between these two sets:
J = |G ∩ P| / |G ∪ P|,
where G is the set of ground-truth words and P is the set of predicted words.
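In code, this is just set intersection over union:

```python
def jaccard_similarity(ground_truth, prediction):
    """J = |G ∩ P| / |G ∪ P| over sets of ingredient words."""
    G, P = set(ground_truth), set(prediction)
    if not G | P:
        return 1.0  # both empty: treat as a perfect match
    return len(G & P) / len(G | P)

# Example: 3 shared ingredients out of 5 distinct ones -> J = 0.6
jaccard_similarity(["butter", "sugar", "egg", "salt"],
                   ["butter", "sugar", "egg", "flour"])
```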

Challenges

Some challenges we encountered during the project included preprocessing, exploring different evaluation metrics, and deciding how to generate outputs from the probability distribution predicted by the trained model.

For preprocessing, we used the spaCy package to tokenize and lemmatize the text, which decreases the vocab size. After this step, pairs like “onion” and “onions” are reduced to the same token and share the same word embedding.
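A minimal version of that normalization step, assuming the en_core_web_sm model; the exact spaCy setup shown here is illustrative.

```python
import spacy

# Lemmatization maps inflected forms to a common base form, shrinking the vocab.
nlp = spacy.load("en_core_web_sm")

def normalize(text):
    return [tok.lemma_.lower() for tok in nlp(text) if not tok.is_punct]

normalize("onions")   # -> ['onion']
normalize("onion")    # -> ['onion']  (both map to the same token and embedding)
```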

Our current approach to making predictions is greedy decoding over the predicted vocabulary distribution: at each step we select the ingredient with the highest probability. This sometimes results in duplicate ingredients. One alternative we attempted was to first take, say, the 20 most probable ingredients from the distribution, and then select the most probable option among those 20 that has not already been predicted, so as to avoid duplicate outputs. However, this method produced suboptimal results, performing worse than the original greedy decoding.
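For reference, here is a sketch of that duplicate-avoiding variant; the cutoff of 20 comes from the description above, everything else is illustrative.

```python
import numpy as np

def pick_next(probs, already_predicted, k=20):
    """From the top-k most probable ingredients, pick the best one not yet used."""
    top_k = np.argsort(probs)[::-1][:k]   # indices of the k highest probabilities
    for idx in top_k:
        if idx not in already_predicted:
            return int(idx)
    return int(top_k[0])                  # all top-k already used: fall back to greedy
```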

Results

Quantitative:

The graph on the right shows the performance (in terms of perplexity, which is the exponentiated loss) of models with different sets of (hyper)parameters, with respect to training epochs.

We tuned the embedding size (hidden size), the batch size, and the number of attention heads.

So far, the best set of parameters for the Transformer is:

  • embedding size: 256
  • batch size: 100
  • attention heads: 3

After 10 epochs of training, the perplexity dropped to around 14.
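(Since perplexity is the exponentiated loss, a perplexity of about 14 corresponds to a mean per-token cross-entropy of roughly ln 14 ≈ 2.64 nats.)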

Qualitative:

Input: [‘chocolate’, ‘cookies’]

Prediction: [‘butter’, ‘sugar’, ‘egg’, ‘vanilla extract’, ‘all-purpose flour’, ‘bake powder’, ‘salt’, ‘walnut’]

Input: [‘pepper’, ‘steak’]

Prediction: [‘flank steak’, ‘salt and pepper’, ‘spanish onion’, ‘button mushroom’, ‘pimento stuff olive’, ‘butter’, ‘dry white wine’, ‘beef broth’, ‘cornstarch’, ‘cold water’]

Input: [‘orange’, ‘chicken’]

Prediction: [‘boneless skinless chicken breast half’, ‘orange marmalade’, ‘soy sauce’, ‘lemon juice’, ‘fresh ginger’, ‘garlic’]

Input: [‘apple’, ‘chicken’]

Prediction: [‘boneless skinless chicken breast half’, ‘salt’, ‘dry mustard’, ‘chicken’, ‘granny smith apple’, ‘onion’, ‘granny smith apple’, ‘brown sugar’, ‘cornstarch’, ‘cinnamon’, ‘nutmeg’, ‘cinnamon’, ‘nutmeg’, ‘cinnamon’, ‘nutmeg’]

Future Work

  • Explore and implement evaluation metrics better suited to our use case
  • Research ways of handling out-of-vocabulary inputs
  • Search for more datasets covering a wider range of cuisines to reduce model bias
  • Train on larger datasets if we gain access to more powerful computational resources

Related Works

Food Ingredients and Recipes Dataset with Images:

https://www.kaggle.com/datasets/pes12017000148/food-ingredients-and-recipe-dataset-with-images

Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images:

http://pic2recipe.csail.mit.edu/

Recibrew! Predicting Food Ingredients with Deep Learning:

https://pub.towardsai.net/recibrew-find-out-the-foods-ingredients-dbc2a4e37383
