An Introduction to Neural Style Transfer for Data Scientists

Text-based neural style transfer can alter the style of your text

Photo by h heyerlein on Unsplash

If 2Pac had only been allowed to release music on the condition that his style matched the Queen’s English, the world would be a significantly worse place.

The advent of style transfer (the ability to project the style of one text onto another) means that it’s now possible for a Neural Network to change the feel of a text.

As you can probably guess, this technology could be useful in a number of different settings. A simple first application would be making an article sound more formal:

Informal : I’d say it is punk though.

Formal : However, I do believe it to be punk.

Beyond that, this technology could also be used to help people with conditions like dyslexia.

More recently, the news that Microsoft was laying off journalists wasn’t groundbreaking: advertising revenues are down across the board and newspapers are generally struggling to be as profitable as they were before (which was already a bit of a struggle). However, the news that the team was to be replaced with AI is what startled people.

I’ve always loved writing but I’ve always sucked. My English teacher refused to let me answer questions because, undoubtedly, my answer would be wrong.

Fast forward 15 years and I’m building machine learning tools to solve just about any problem I can think of. More importantly, Neural Networks have recently found a new domain to improve. Microsoft Word now incorporates an AI that can offer to rewrite a sentence in full, rather than making simple spelling and grammatical fixes.

Have you ever been unable to express something in a given way?

Being unable to phrase something in a certain tone, or to give off a certain impression, is something that many writers struggle with. To save time, focus and energy, a tool that tilts the wording in the right direction helps writers captivate their audience more effectively. That’s what Microsoft aimed to fix here, and in what follows I’ll explain how. Microsoft have said:

“In internal evaluations, it was nearly 15 percent more effective than previous approaches in catching mistakes commonly made by people who have dyslexia.”

Neural Style Transfer

The updates that Microsoft have recently incorporated are broadly similar to the product that Grammarly is well known for. Both sets of researchers are taking advantage of recent developments in the field of Style Transfer.

Neural Style Transfer was initially used between images, whereby the style of one image could be projected onto the content of another.

Neural Style Transfer between Images [Source]

However, this technique has recently been adapted for the use case of text style transfer. To do this, researchers took advantage of neural machine translation models to serve the purpose of style transfer. Think about it: a certain ‘tone’ or ‘style’ could be seen as another language, and therefore:

“We create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work”

The baseline model in neural machine translation is based on Yoshua Bengio’s paper here, building upon Sutskever’s work on Sequence to Sequence learning. The neural network is formed as an RNN Encoder-Decoder, which works as follows.

[Source]

Here, a phrase is passed into the encoder, which converts the string into a vector. This vector effectively contains a latent representation of the phrase, which is then translated by a decoder. This is called an ‘encoder-decoder architecture’, and in this manner, Neural Machine Translation (NMT) can tackle translation problems end to end.
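As a rough illustration (a toy NumPy sketch with made-up dimensions and randomly initialised weights, not the actual NMT code), the encoder folds the source tokens into one latent vector, which the decoder then unrolls into output tokens:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 8, 10                      # made-up sizes
W_h = rng.normal(size=(hidden, hidden))    # recurrent weights
W_x = rng.normal(size=(hidden, vocab))     # input weights
W_out = rng.normal(size=(vocab, hidden))   # decoder output weights

def one_hot(i):
    v = np.zeros(vocab)
    v[i] = 1.0
    return v

# Encoder: fold the source tokens into a single latent vector h
h = np.zeros(hidden)
for token in [3, 1, 4]:                    # source sentence as token ids
    h = np.tanh(W_h @ h + W_x @ one_hot(token))

# Decoder: start from h and emit target tokens one step at a time
# (real decoders also feed back the previously emitted token)
outputs = []
for _ in range(3):
    h = np.tanh(W_h @ h)
    outputs.append(int(np.argmax(W_out @ h)))

print(outputs)  # three predicted token ids
```

The key point is that the whole source sentence is squeezed through the single vector h before any target word is produced.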

An example of Neural Machine Translation from Sutskever et al (2014). We can see that the Japanese text is encoded into h values, which are then decoded into English.

Neural machine translation uses a bidirectional RNN to process the source sentence into vectors (encoding), along with a second RNN to predict words in the target language (decoding). This process, while differing from phrase-based models in method, proves to be comparable in speed and accuracy.

Creating a model

To create a neural style transfer model, we generally have 3 key steps that we have to take:

1) Embedding

Words are categorical in nature, so the model must first embed them, finding an alternative representation that can be used in the network. A vocabulary (of size V) is selected in which only frequent words are treated as unique; all other words are converted to an “unknown” token and get the same embedding. The embedding weights, one set per language, are usually learned during training.

# Embedding
embedding_encoder = variable_scope.get_variable(
    "embedding_encoder", [src_vocab_size, embedding_size], ...)
# Look up the embedding vectors for the input token ids
encoder_emb_inp = embedding_ops.embedding_lookup(
    embedding_encoder, encoder_inputs)
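Conceptually, embedding_lookup just selects rows of the embedding matrix by token id. A toy NumPy sketch of the same operation (sizes and values are made up):

```python
import numpy as np

src_vocab_size, embedding_size = 6, 4      # toy sizes
rng = np.random.default_rng(0)
# One learnable row per vocabulary entry
embedding_encoder = rng.normal(size=(src_vocab_size, embedding_size))

encoder_inputs = np.array([2, 0, 5])       # a sentence as token ids
encoder_emb_inp = embedding_encoder[encoder_inputs]  # row lookup

print(encoder_emb_inp.shape)  # (3, 4): one embedding vector per token
```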

2) Encoding

Once the word embeddings are retrieved, they are fed as input into the main model, which consists of two multi-layer RNNs: an encoder for the source language and a decoder for the target language. In practice, these two RNNs are trained with different parameters (such models do a better job of fitting large training datasets).

# Build RNN cell
encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# Run Dynamic RNN
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp,
    sequence_length=source_sequence_length, time_major=True)

The attentive reader will notice that sentences can have different lengths. To avoid wasting computation, we tell dynamic_rnn the exact source sentence lengths through source_sequence_length, and since our input is time-major, we set time_major=True.
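To see why the lengths matter, consider how a batch is typically prepared: sentences are padded to a common length, and the true lengths tell the RNN where each real sentence ends. A toy NumPy sketch (token ids are made up):

```python
import numpy as np

sentences = [[4, 2, 7, 1], [3, 9]]         # token ids, different lengths
source_sequence_length = np.array([len(s) for s in sentences])
max_len = source_sequence_length.max()

# Pad with 0 so the batch is rectangular; time-major means shape (time, batch)
batch = np.zeros((max_len, len(sentences)), dtype=int)
for j, s in enumerate(sentences):
    batch[:len(s), j] = s

print(batch)
print(source_sequence_length)  # [4 2]: steps beyond these can be skipped
```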

3) Decoding

The decoder needs to have access to source information. A simple way to achieve this is to initialise it with the last hidden state of the encoder, encoder_state.

# Build RNN cell
decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
# Helper
helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp, decoder_lengths, time_major=True)
# Decoder
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper, encoder_state,
    output_layer=projection_layer)
# Dynamic decoding
outputs, _ = tf.contrib.seq2seq.dynamic_decode(decoder, ...)
logits = outputs.rnn_output

Lastly, we haven’t yet mentioned projection_layer: a dense matrix that turns the top hidden states into logit vectors of dimension V. We illustrate this process at the top of Figure 2.

projection_layer = layers_core.Dense(tgt_vocab_size, use_bias=False)

and finally, given the logits above, we are now ready to compute our training loss:

crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=decoder_outputs, logits=logits)
train_loss = tf.reduce_sum(crossent * target_weights) / batch_size
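The loss above is ordinary softmax cross-entropy per target position, masked by target_weights (zero for padding) and averaged over the batch. A toy NumPy sketch of the same computation (all shapes and values are made up):

```python
import numpy as np

vocab, batch_size = 5, 2
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, batch_size, vocab))      # (time, batch, vocab)
decoder_outputs = np.array([[1, 4], [0, 2], [3, 0]])  # target token ids
target_weights = np.array([[1., 1.], [1., 1.], [1., 0.]])  # 0 masks padding

# Softmax cross-entropy per (time, batch) position
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
log_probs = np.log(exp / exp.sum(axis=-1, keepdims=True))
crossent = -np.take_along_axis(
    log_probs, decoder_outputs[..., None], axis=-1)[..., 0]

# Mask the padded positions, then average over the batch
train_loss = (crossent * target_weights).sum() / batch_size
print(train_loss)
```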

We have now defined the forward pass of our NMT model. Computing the backpropagation pass is just a matter of a few lines of code:

# Calculate and clip gradients
params = tf.trainable_variables()
gradients = tf.gradients(train_loss, params)
clipped_gradients, _ = tf.clip_by_global_norm(gradients, max_gradient_norm)
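Global-norm clipping rescales all gradients by a single factor whenever their combined norm exceeds the threshold, preserving their relative directions. A small NumPy sketch of what clip_by_global_norm computes (toy gradients):

```python
import numpy as np

gradients = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # toy grads
max_gradient_norm = 5.0

# Norm of all gradients concatenated: sqrt(9 + 16 + 0 + 144) = 13
global_norm = np.sqrt(sum((g ** 2).sum() for g in gradients))
# Rescale only if the global norm exceeds the threshold
scale = min(1.0, max_gradient_norm / global_norm)
clipped_gradients = [g * scale for g in gradients]

clipped_norm = np.sqrt(sum((g ** 2).sum() for g in clipped_gradients))
print(global_norm, clipped_norm)  # 13.0 before, 5.0 after clipping
```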

Now from here, you’re ready to begin the optimisation procedures behind creating your own neural style transfer model!

Note: the code above was largely taken from the TensorFlow GitHub documentation, and more information about this procedure can be found online.


The theory behind neural style transfer for text has quite an extensive history, and it has taken a while for academia to reach the perch it currently sits upon. Translation is a notoriously difficult task, not only because of grammatical problems, but because the output needs to sound somewhat human: somewhat colloquial.

The progress that’s been made is fantastic, and it will be great to see it keep developing.


Thanks for reading! If you have any messages, please let me know!

Keep up to date with my latest articles here!


Code Support:

  1. https://www.tensorflow.org/tutorials/generative/style_transfer
  2. https://github.com/fuzhenxin/Style-Transfer-in-Text
  3. https://www.groundai.com/project/what-is-wrong-with-style-transfer-for-texts/1

Using Docker for Deep Learning

Struggling to recreate your results? So was I

Photo by Maria Teneva on Unsplash

Deep Learning projects are always pretty big. They involve huge datasets and huge amounts of processing power, but annoyingly, they also require a huge amount of investigation as, more often than not, they just don’t work in the beginning. Only after a bunch of feature selection, model fitting and parameter tuning do you get something that looks half decent.

In saying that though, running this all in the form of a container can be a bit daunting, because not only do you need to deploy a server somewhere, you need to have tests in place to make sure that what’s happening inside your cordoned-off container is as expected. You need to make sure that your machine learning algorithm is converging in the right way, and you need to be sure that any errors are picked up.

Despite these (and many more) difficulties, using containers for Deep Learning comes with great rewards. The below are but a few of the main difficulties that are overcome by using containers.


Reason 1: Isolation

Just like how multiple containers can coexist on a ship, software containers can exist in harmony on a server. The benefit is that resources are predefined, so machine learning algorithms can run in parallel on the same machine without competing for resources.

The only thing more demanding than one machine learning project is two, so if these are set up correctly, isolated projects can co-exist pretty well in a highly distributed environment.

Reason 2: Avoiding Dependency Conflicts

Machine Learning projects are often pretty large and use a wide variety of underlying tech. However, the problem comes when your classification algorithm uses an older version of Numpy and your regression algorithm uses a more recent one: which version should live on your system?

In reality, these libraries can exist in parallel, but they require isolated systems. Since your deep learning system may rely on conflicting libraries, containers provide the isolation that makes you less likely to run into these really annoying problems over time.

Reason 3: Portability

In the process of creating a container, you generally predefine everything that goes into it (code and all). Given that, you can save the ‘image’ of the container and share it easily. It’s so useful that whole companies exist in this space (see Docker Hub).

Reason 4: Microservices

The awesome part of containers is that you’re able to expose certain parts of them. This means that if you want to create something like an API, then you can expose parts of the containers that allow you to run an API in a more controlled way. Specifically, this is great if your API involves a large computational process, or, something with a lot of moving parts. By separating workflows, you can identify where systems are failing much quicker.

Check the following reference to learn a bit more:
3 reasons to always use containers for microservices-based applications
Microservices are the emerging application platform: It is the architecture that will serve as the basis for many… techbeacon.com

Reason 5: Reproducibility

I think the ultimate reason containers are useful for Deep Learning is that they aid reproduction. It’s pretty common for researchers to make ground-breaking discoveries while in research mode. However, once the algorithm is re-trained, they may find that a bug had caused the amazing results, or that they simply can’t recreate them!

This is obviously an issue: if you don’t know what gave you the great results, it makes sense to have some framework that allows you to robustly reproduce them. Code tracking tools (like GitHub) obviously help, but it’s worth going one step further to really push forward.


Code for Dockerising Deep Learning

If you want some simple hands on code to implement a Dockerised Deep Learning project, then this link here is great for just that.

If you find it a bit confusing, then this following project (using an example from the world of fashion) may be a little bit easier to digest.


The above were 5 quick reasons why containers are great for Deep Learning, and two references were given to help you learn how to implement it.

Docker is pretty powerful, and despite being a good few years old now (with a number of great iterations/improvements along the way), it still delivers the functionality one desires: isolation and reproducibility.

Thanks for reading! If you have any messages, please let me know!

Keep up to date with my latest articles here!

R vs Python: 8 Reasons Why Python Wins

Data Scientists using R should have switched a while ago

Photo by Kvistholt Photography on Unsplash

Every now and then, I get into a discussion with a Data Scientist who I just disagree with.

Be it something methodological or be it something thematic — different people have different approaches and sometimes, just sometimes, I’m definitely right.

R is a fantastic piece of software, there’s no denying that. However, it’s limited by a lack of features, and things that don’t evolve generally get replaced over time. Where it does improve, the libraries involved are often specialised and fragmented, which can also result in dependency issues.

In what follows, I’ll go over 8 reasons why I chose to learn Python in the beginning, how it’s helped me in my career and smaller technical differences that can lead to larger problems down the line.


1: Python Is Everywhere

If you’re thinking about a programming language to learn, then something that is widely used should be your starting point. There’s no escaping this, and it’s a good thing. Python is being used by everyone: Data Scientists and non-Data-Scientists alike.

If you look at Google or StackOverflow trends, Python is super hot. More jobs also require Python coders than R coders.

If you want to be in with the crowd, Python is a good place to start.

2: Massive Online Community

If you’re ever stuck on a problem or you can’t figure something out, the online community of coders has to be one of the nicest places out there. Knowledge sharing in the coding community is altruistic and frequent, with many blogging and coding websites specifically targeting coders who are learning or are struggling to complete a task.

This community makes it so much easier to tackle harder problems and therefore, a bigger community allows coders to solve their problems quicker. I’ve relied heavily on the community and you will too.

3: Easy Deployment

When you make a piece of software or a nifty little tool, what do you do with it then? Do you want to keep it locally to look at or do you want to deploy it to fix the problem?

As a coder you should always be looking for problems and moreover, you should always be thinking about solutions. Is my code too slow? Is my code too clunky? Are my data tasks too bulky?

Regardless of what it is, you should be able to come up with a solution and integrate it easily. As Python can cover essentially the whole vertical when it comes to coding, if you’ve written some statistical code, it’s easier to integrate it with Python than with R.

4: Python is the Whole Tech Stack

You have to wear multiple hats when you’re a data scientist. You have to be able to manage large data sets, you have to be able to clean them. Then you have to be able to analyse and deduce something important. After which you have to be able to integrate your findings into the business to improve an object (sales, efficiency etc).

Being able to do all of this in R just isn’t possible. With Python, however, you can cover the whole stack in one language.

5: Not as hard to learn

With a larger online community meaning more resources to learn from, Python has one other advantage over R. Actually, on Udacity, they say that R is easier to pick up if you’ve learned something like C++ or Java before.

Whilst I’m not fully convinced by the full extent of this logic, I can say for sure that R is less obvious in its coding syntax. Having more commands and more libraries is part of the problem, but so is having operators like <- and if statements like:

if (test_expression) {statement}

It definitely isn’t as easy as Python.
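For comparison, here is a hypothetical example of the same conditional written in both languages, with the Python version reading almost like English:

```python
# R:      if (x > 0) { y <- 1 } else { y <- 0 }
# Python:
x = 2
if x > 0:
    y = 1
else:
    y = 0

print(y)  # 1
```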

6: Better Documentation

Python has a few key libraries that really do the bulk of the work:

  • Numpy
  • Pandas
  • SKLearn
  • Scipy

Now, these four libraries are open-source, but they’ve been developed by a solid group of volunteers. They’ve been developed in a professional way, so the documentation is complete, most functions come with tests (to ensure functionality), and the documentation links to the academic references the implementations follow.

On the other hand, R packages are generally made by academics/specialists who have little time, and who therefore may not be able to produce documentation of as high a quality as we’d like.

You can read the following discussion on Reddit for more information:

Or you can look at common R libraries and see what you think.

7: Python is Faster than R

Quick answer => Python!
Is Python faster than R?
R vs Python Speed Benchmark on a simple Machine Learning Pipeline towardsdatascience.com

But in reality, you shouldn’t use either of these languages for speed.

8: More Jobs

Empirical data is often the most informative and looking at the website indeed.com, as of right now (04/02/2020), there are 4,100 Python Developer jobs and only 291 jobs under the category of R Developer.

That means there are roughly 14x more Python jobs going than R jobs on this one website.

In light of that, a statistician would tell you that you’re also more likely to land a job with Python, though you could argue that R is more specialised and so commands a higher salary: according to Business Insider, R has a slightly higher average salary.

From my personal experience though, every company I’ve worked with has generally opted for a Python based infrastructure and if they haven’t got it, they’re working towards it.


R is a great language, but it’s just limited. From a Data Science perspective, it does match up to what Python has to offer, but it’s all very fragmented. Python libraries like Numpy, Pandas and Sklearn carry so much responsibility, so much so that they allow developers to be lazy and live within a small bubble of possibility.

Maybe that’s a bad thing, maybe it’s a good thing. I’ve always said that if you want to be a good Data Scientist, you have to know your scope from first principles: if you can’t write out the formulae behind your models, you shouldn’t be using them in the first place.


Thanks for reading! If you have any messages, please let me know!

Keep up to date with my latest articles here!

The 4 Top AI/ML Python GitHub Repos in February 2021

Inspirational Open Sourced Projects

Photo by Florian Olivo on Unsplash

The online community for coders is one of the warmest communities out there. The advocacy for open-source projects, along with the support that goes with it, is jaw-dropping, and for that, the people who work on these public repos are real gems.

Coding standards and requirements have changed so much over the past 20 years. Earlier languages (like C++ and HTML) were notoriously difficult and painful to learn, but recent languages have made development so much easier, lowering the barrier to entry for great projects.

We see below that some insane tech is now just… open… and free… with no catch. From advanced visual recognition to advanced NLP, it’s nothing short of insane the kind of tech you can access for free.

In what follows, I share 4 really cool projects that I absolutely love, and also, are some of the top trending projects on GitHub. Hope you enjoy, and let me know if you have any questions!


Project 1: Sense

Action recognition is an incredibly difficult task to automate because every human is different, so for a camera to be able to generalise and pick up your actions requires a lot of data.

Sense is an engine that uses neural networks to recognise actions. Moreover, they’ve made the model efficient, so it’s pretty light on its feet and can easily be deployed. The engines here are trained on millions of videos of humans performing a variety of actions.

If you want, you can even play with the additional features to count repetitions of a given action. I mean, it’s pretty cool.


Project 2: Tensortrade

If you know anything about reinforcement learning, you’ll know that using it for trading seems logical. However, it’s a bit more complicated than that, because state dynamics in finance change through time, so it’s hard to use the past to predict the future.

Tensortrade is still in beta, but it looks to help users get to grips with the theory of building, training and deploying automated trading algorithms. It’s been made pretty extensible, which means that if you want to include your own features, or incorporate this software into your own engine, it should be possible.

The main thing that Tensortrade tries to do is make it easy and fast to test algo strategies, but I feel that sometimes a small barrier may protect a lot of people from using something potentially flawed. Anyway, try it out and form your own opinion!

Project 3: MLFlow

Deploying a big machine learning project is no fun, but MLflow is here to help. The ‘platform’ assists users in packaging machine learning projects into reproducible code and helps users share their models as well. A lot of the hard stuff comes included too, like logging, Conda/Docker environments and a centralised model store.

For those of us who build and deploy numerous machine learning models (I’m looking at you in the NLP space), then this is good.

Project 4: spaCy

My NLP Brothers!

And what have we got here?

spaCy is awesome. Like it’s really, really good. It essentially provides a lot of the groundwork for state of the art NLP modelling in Python. It’s been built with the latest research so if you’re trying to replicate results from a paper, this is a very good place to start.

Advanced problems can be sucky to start off with, but spaCy makes light work of tokenization, parsing, language handling, entity recognition and all the other problems we’d otherwise have to think intently about. Because the API removes much of the complexity, you’re able to spend more time focusing on the difficult problems like prediction or inference. Moreover, it’s pretty quick.
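As a tiny taste of that groundwork (assuming spaCy is installed; spacy.blank gives a bare pipeline with just the tokenizer, so no pretrained model download is needed):

```python
import spacy

# A blank English pipeline: tokenizer only, no pretrained components
nlp = spacy.blank("en")
doc = nlp("spaCy makes light work of tokenization!")
tokens = [token.text for token in doc]

print(tokens)  # punctuation is split into its own token
```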

spaCy is awesome. No ifs, no buts.


There you have it. I’ve offered 4 of my recent favourite github projects and I hope you enjoy them as much as I do!

Thanks for reading! If you have any questions, please let me know!

Keep up to date with my latest articles here!
