{"id":176,"date":"2019-03-12T14:01:00","date_gmt":"2019-03-12T13:01:00","guid":{"rendered":"https:\/\/daniel.liljeberg.io\/?p=176"},"modified":"2021-04-01T22:02:08","modified_gmt":"2021-04-01T20:02:08","slug":"language-classification-using-machine-learning-in-php","status":"publish","type":"post","link":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/","title":{"rendered":"Language classification using Machine Learning in PHP"},"content":{"rendered":"\n<p class=\"has-drop-cap wp-block-paragraph\">So, like many of us I decided to dabble a bit in Machine Learning (ML) and took a short course in the subject. One of the parts of the course was to create a small ML project. I wanted to try and make something practical out of it though and looked around at work to see what potential issues I could try to solve using ML.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Like any large company, we do business across basically the entire globe. Business generates support requests, and my first idea was to estimate the expected time a case would take to solve based on different parameters of the case. This could then be used to allocate staffing, identify specific types of cases that would benefit from better solutions, etc. With limited time and without access to enough data about the cases, though, I was forced to abandon this, since the only things I could easily get at the time were the subject line, the email address of the sender and a few other pieces. That was not enough to build a model robust enough for me to trust the estimates it would produce.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So I looked at it again: what other &#8220;problems&#8221; do our support tickets present? Well, since many of the support requests are sent to the initial email address, that address receives support requests in all manner of languages. 
Sure, we can handle that manually and re-assign requests to support personnel in the country in question, or ask everyone who submits a request to re-submit it in English. But the first option takes time and adds the risk of requests sitting for a while before being re-assigned, and the second had already been tested without much success, partly because people were used to receiving support in their native language, and it made little sense for that to go away just because our company grew.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, what if we could identify the language and then use that to redirect the request to a group of people who know the language in question and can provide support in that language?<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"400\" height=\"360\" src=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/PHP-ML.png\" alt=\"PHP-ML\" class=\"wp-image-169\" srcset=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/PHP-ML.png 400w, https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/PHP-ML-300x270.png 300w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Many ML examples are written in Python, and so was a major part of the course. But I decided to see if I could write my project in PHP. I do most of my development in C++, but when it comes to web development and scripting I have used PHP for a long time. So I wanted to see how easy it would be to use PHP, and it would also let me easily use what I created in my own PHP projects. A quick look around and I found&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/php-ml.readthedocs.io\/en\/latest\/\" target=\"_blank\">PHP-ML<\/a>, a Machine Learning library for PHP. 
It lacks some of the stuff you find in Scikit-learn, for instance, but it seems to be in active development and adding features, so things like&nbsp;<em>Convolutional Neural Networks<\/em>&nbsp;that are not in the library today will probably make it in with time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Choosing a classifier<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Since I would be working with text and the goal was to&nbsp;<em>classify&nbsp;<\/em>which language a given text belonged to, a classifier like Naive Bayes or SVC felt like the best option.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To help newcomers heading down the ML rabbit hole, Scikit-learn.org has a good map for choosing the correct classifier, and that affirmed my initial pick.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"638\" src=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-Classification-1024x638.png\" alt=\"scikit-learn cheat-sheet\" class=\"wp-image-170\" srcset=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-Classification-1024x638.png 1024w, https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-Classification-300x187.png 300w, https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-Classification-768x479.png 768w, https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-Classification-1536x958.png 1536w, https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-Classification-2048x1277.png 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">I decided to test both Naive Bayes and SVC since I felt Naive Bayes would be faster to train, but SVC might result in better accuracy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data collection and pre-processing<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The first obstacle in creating any ML model is to get the data to train on. 
If I wanted to classify a set of languages, I clearly needed sample sentences from these languages to train on. But collecting a large set of sentences in each language would be quite time-consuming, and time was something I didn&#8217;t have, not to mention I didn&#8217;t speak the majority of these languages. I decided to go another, more rudimentary route. Instead of manually translating a list of sentences or finding unique lists of sentences, I decided to let Google do the job for me.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I started by collecting a set of sentences in English, using that as my base sample set. I acquired these basically by Googling&nbsp;<em>\u201cEnglish sentences for kids\u201d, \u201cEnglish sentences for adults\u201d<\/em>&nbsp;etc. and compiling a list of sentences that I could use as my base. The first issue I found was that many of the sentences included uniquely English sayings, which could cause issues since they are not to be taken literally. But I cleaned up the most extreme ones and ended up with 1635 English sentences.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If this were going into an actual production environment, I would vet the base dataset much more thoroughly, since it is the foundation for everything that follows. 
But for my initial proof-of-concept it would have to do.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After this, I looked up which languages Google supported.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$languages = [\n    'af' =&gt; 'Afrikaans',\n    'sq' =&gt; 'Albanian',\n    'ar' =&gt; 'Arabic',\n    'az' =&gt; 'Azerbaijani',\n    'eu' =&gt; 'Basque',\n    'bn' =&gt; 'Bengali',\n    'be' =&gt; 'Belarusian',\n    'bg' =&gt; 'Bulgarian',\n    'ca' =&gt; 'Catalan',\n    'zh-CN' =&gt; 'Chinese Simplified',\n    'zh-TW' =&gt; 'Chinese Traditional',\n    'hr' =&gt; 'Croatian',\n    'cs' =&gt; 'Czech',\n    'da' =&gt; 'Danish',\n    'nl' =&gt; 'Dutch',\n    'en' =&gt; 'English',\n    'eo' =&gt; 'Esperanto',\n    'et' =&gt; 'Estonian',\n    'tl' =&gt; 'Filipino',\n    'fi' =&gt; 'Finnish',\n    'fr' =&gt; 'French',\n    'gl' =&gt; 'Galician',\n    'ka' =&gt; 'Georgian',\n    'de' =&gt; 'German',\n    'el' =&gt; 'Greek',\n    'gu' =&gt; 'Gujarati',\n    'ht' =&gt; 'Haitian Creole',\n    'iw' =&gt; 'Hebrew',\n    'hi' =&gt; 'Hindi',\n    'hu' =&gt; 'Hungarian',\n    'is' =&gt; 'Icelandic',\n    'id' =&gt; 'Indonesian',\n    'ga' =&gt; 'Irish',\n    'it' =&gt; 'Italian',\n    'ja' =&gt; 'Japanese',\n    'kn' =&gt; 'Kannada',\n    'ko' =&gt; 'Korean',\n    'la' =&gt; 'Latin',\n    'lv' =&gt; 'Latvian',\n    'lt' =&gt; 'Lithuanian',\n    'mk' =&gt; 'Macedonian',\n    'ms' =&gt; 'Malay',\n    'mt' =&gt; 'Maltese',\n    'no' =&gt; 'Norwegian',\n    'fa' =&gt; 'Persian',\n    'pl' =&gt; 'Polish',\n    'pt' =&gt; 'Portuguese',\n    'ro' =&gt; 'Romanian',\n    'ru' =&gt; 'Russian',\n    'sr' =&gt; 'Serbian',\n    'sk' =&gt; 'Slovak',\n    'sl' =&gt; 'Slovenian',\n    'es' =&gt; 'Spanish',\n    'sw' =&gt; 'Swahili',\n    'sv' =&gt; 'Swedish',\n    'ta' =&gt; 'Tamil',\n    'te' =&gt; 'Telugu',\n    'th' =&gt; 'Thai',\n    'tr' =&gt; 'Turkish',\n    'uk' =&gt; 'Ukrainian',\n    'ur' =&gt; 'Urdu',\n    'vi' =&gt; 'Vietnamese',\n    'cy' =&gt; 'Welsh',\n    'yi' =&gt; 
'Yiddish'\n];\n<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">My idea was to make HTTP requests to&nbsp;<em>translate.google.com<\/em>&nbsp;to translate the English sentences into each of the other languages. A quick search showed that a small library doing exactly what I had in mind already existed, so instead of writing it from scratch I decided to use it:&nbsp;<a href=\"https:\/\/github.com\/Stichoza\/google-translate-php\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/Stichoza\/google-translate-php<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With it you can simply call something like<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">$gt = new GoogleTranslate('en', 'sv');\n$translatedString = $gt-&gt;translate(\"Hello World\");\n<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Making it all come together<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">If this were going into a system or library, I would have made the LanguageClassifier a class that encapsulated all the functionality and was easy to use through its public interface. 
But for my demonstration I decided to go with a simple standalone script run from a terminal, printing information about what it is doing during execution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I had one file,&nbsp;<em>sentences.txt<\/em>,&nbsp;that held the English sentences to use as a base, and another,&nbsp;<em>languagedataset.ser<\/em>,&nbsp;which contained a serialized array that looked something like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"php\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">[\n\t'en' => ['xxxxxx' => 'Hello World', 'yyyyyy' => 'I like coffee in the morning'],\n\t'sv' => ['xxxxxx' => 'Goddag v\u00e4rlden', 'yyyyyy' => 'Jag tycker om kaffe p\u00e5 morgonen'],\n\t...\n]<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">where&nbsp;<em>xxxxxx<\/em>,&nbsp;<em>yyyyyy&nbsp;<\/em>etc. were checksums of the original English sentence, making it possible to map a sentence to its translation in each of the languages. 
The model trained on the dataset would then be stored in a file called&nbsp;<em>model.dat<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The workflow of the script was something like this:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\/*\n * - If sentences.txt holding base sentences in English exists\n *      - If languagedataset.ser does not exist\n *          - Set up initial English sentences from sentences.txt\n * - If sentences.txt has new or removed sentences\n *      - Update English sentences in dataset\n * - For each language\n *      - Check if any English sentence is missing for the language\n *          - Translate each missing sentence using Google Translate\n *      - Store updated languagedataset.ser\n *\n * - If model.dat does not exist or we have to retrain due to changed dataset\n *      - Transform format from languagedataset.ser to an ArrayDataset, train\n *        and check accuracy\n *      - Save model.dat\n * - Else load model.dat\n * - Classify language of sentences passed\n *\/<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The result is a script that can take a list of strings and spit out predictions regarding which language each string is written in.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">C:\\tools\\php72\\php.exe E:\\Dropbox\\Projects\\php-ml\\languageClassification.php \"vad heter du?\" \"El toro\" \"Wie viel kostet dieser Computer?\" \"Jutro b\u0119dzie pi\u0119kna pogoda\" \"Domani sar\u00e0 bel tempo\" \"Morgen zal het prachtig weer zijn\" \"Huomenna tulee kaunis s\u00e4\u00e4\" \"Mon ordinateur ne 
d\u00e9marre pas\" \"min computer starter ikke, og det g\u00f8r mig sk\u00f8r\" \"Goedemorgen, Graag ontvang ik de licentiefiles voor de meters zoals in de bijlage genoemd. User name: foo@bar.com Company: Foo Bar klimaattechniek Customer number: 0123456\"\nDataset up to date\nLoading model... Done\narray(10) {\n  [\"vad heter du?\"]=>\n  string(2) \"sv\"\n  [\"El toro\"]=>\n  string(2) \"es\"\n  [\"Wie viel kostet dieser Computer?\"]=>\n  string(2) \"de\"\n  [\"Jutro b\u0119dzie pi\u0119kna pogoda\"]=>\n  string(2) \"pl\"\n  [\"Domani sar\u00e0 bel tempo\"]=>\n  string(2) \"it\"\n  [\"Morgen zal het prachtig weer zijn\"]=>\n  string(2) \"nl\"\n  [\"Huomenna tulee kaunis s\u00e4\u00e4\"]=>\n  string(2) \"fi\"\n  [\"Mon ordinateur ne d\u00e9marre pas\"]=>\n  string(2) \"fr\"\n  [\"min computer starter ikke, og det g\u00f8r mig sk\u00f8r\"]=>\n  string(2) \"da\"\n  [\"Goedemorgen, Graag ontvang ik de licentiefiles voor de meters zoals in de bijlage genoemd. User name: foo@bar.com Company: Foo Bar klimaattechniek Customer number: 0123456\"]=>\n  string(2) \"nl\"\n}\n<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Test results<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Due to time constraints, instead of using all the languages that my program supported I decided to go with a subset. 
This allowed me to run several tests with different training-set sizes to see how that affected performance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I decided to go with eleven different languages and wanted to have a few that were somewhat \u201csimilar\u201d to make it a bit harder for the classifier, so I included Swedish, Norwegian and Danish.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The complete list of tested languages was:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Spanish and Swedish.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The two learning algorithms were run using sample sizes ranging from 100 to 600 sentences per language. These sentences were all translated into the different languages, so the sample set of 100 includes 100 sentences for each language, and the actual sample sizes ranged from 1100 (11 &times; 100) to 6600 (11 &times; 600) sentences.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Of these, a random selection of 90% was used as the training set and the remaining 10% as the test set.&nbsp;<strong>Naive Bayes<\/strong>, as expected, was much faster to train. Time elapsed of course depends on the hardware used during training, but it\u2019s fair to say that&nbsp;<strong>Naive Bayes<\/strong>&nbsp;using 600 sentences per language was almost as fast as&nbsp;<strong>SVC<\/strong>&nbsp;at 100 sentences per language. 
Where training&nbsp;<strong>Naive Bayes<\/strong>&nbsp;was always a matter of seconds,&nbsp;<strong>SVC<\/strong>&nbsp;quickly moved into minutes.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"602\" height=\"743\" src=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-TestResult.png\" alt=\"\" class=\"wp-image-171\" srcset=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-TestResult.png 602w, https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/ML-TestResult-243x300.png 243w\" sizes=\"(max-width: 602px) 100vw, 602px\" \/><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">In the table we can see the accuracy results for different numbers of sentences per language in the training data and different parameters used for the&nbsp;<strong>SVC<\/strong>&nbsp;classifier. Variants of&nbsp;<strong>Naive Bayes<\/strong>&nbsp;aren\u2019t currently supported by PHP-ML, so only the basic version was tested.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since the sentences are selected at random each time, and the quality of each sentence and its corresponding translations will be better for some than others, variance in reported accuracy is to be expected even for runs with identical settings.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">From the table we can see that&nbsp;<strong>Naive Bayes<\/strong>&nbsp;gives a result similar to&nbsp;<strong>SVC<\/strong>&nbsp;at a much higher training speed, but&nbsp;<strong>SVC<\/strong>&nbsp;edges ahead if one tweaks the parameters and takes the win with a maximum score of ~94% accuracy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The more sentences used, the longer it takes to train and the larger the resulting model ends up being. For production purposes I would aim for a reasonable middle ground. 
A large model takes longer to load, can result in more time being needed for predictions etc.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Source code<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A script implementing the above ideas can be found here:&nbsp;<a href=\"https:\/\/github.com\/inquam\/php-ml-language-classification\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/inquam\/php-ml-language-classification<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>So, like many of us I decided to dabble a bit in Machine Learning (ML) and took a short course in the subject. One of the parts of the course was to create a small ML project. I wanted to try and make something practical out of it though and looked around at work to&hellip;&nbsp;<a href=\"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/\" rel=\"bookmark\">Read More &raquo;<span class=\"screen-reader-text\">Language classification using Machine Learning in PHP<\/span><\/a><\/p>","protected":false},"author":1,"featured_media":178,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"neve_meta_sidebar":"","neve_meta_container":"","neve_meta_enable_content_width":"","neve_meta_content_width":0,"neve_meta_title_alignment":"","neve_meta_author_avatar":"","neve_post_elements_order":"","neve_meta_disable_header":"","neve_meta_disable_footer":"","neve_meta_disable_title":"","neve_meta_reading_time":"","footnotes":""},"categories":[20,6],"tags":[],"class_list":["post-176","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","category-php"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Language classification using Machine Learning in PHP - Daniel Liljeberg<\/title>\n<meta name=\"description\" content=\"Many machine learning examples are written in Python. 
I decided to see if I could write my project in PHP.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/\" \/>\n<meta property=\"og:locale\" content=\"sv_SE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Language classification using Machine Learning in PHP - Daniel Liljeberg\" \/>\n<meta property=\"og:description\" content=\"Many machine learning examples are written in Python. I decided to see if I could write my project in PHP.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/\" \/>\n<meta property=\"og:site_name\" content=\"Daniel Liljeberg\" \/>\n<meta property=\"article:published_time\" content=\"2019-03-12T13:01:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-04-01T20:02:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2019\/03\/coding-bg.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1192\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Daniel Liljeberg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Skriven av\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daniel Liljeberg\" \/>\n\t<meta name=\"twitter:label2\" content=\"Ber\u00e4knad l\u00e4stid\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minuter\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/\"},\"author\":{\"name\":\"Daniel Liljeberg\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/#\\\/schema\\\/person\\\/e2c3fe10971c37cff2669f5688834cd7\"},\"headline\":\"Language classification using Machine Learning in PHP\",\"datePublished\":\"2019-03-12T13:01:00+00:00\",\"dateModified\":\"2021-04-01T20:02:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/\"},\"wordCount\":1503,\"publisher\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/#\\\/schema\\\/person\\\/e2c3fe10971c37cff2669f5688834cd7\"},\"image\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2019\\\/03\\\/coding-bg.png\",\"articleSection\":[\"Machine Learning\",\"PHP\"],\"inLanguage\":\"sv-SE\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/\",\"url\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/\",\"name\":\"Language classification using Machine Learning in PHP - Daniel 
Liljeberg\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2019\\\/03\\\/coding-bg.png\",\"datePublished\":\"2019-03-12T13:01:00+00:00\",\"dateModified\":\"2021-04-01T20:02:08+00:00\",\"description\":\"Many machine learning examples are written in Python. I decided to see if I could write my project in PHP.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#breadcrumb\"},\"inLanguage\":\"sv-SE\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#primaryimage\",\"url\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2019\\\/03\\\/coding-bg.png\",\"contentUrl\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2019\\\/03\\\/coding-bg.png\",\"width\":1192,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/sv\\\/2019\\\/03\\\/12\\\/language-classification-using-machine-learning-in-php\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/daniel.liljeberg.io\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Language classification using Machine Learning in 
PHP\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/#website\",\"url\":\"https:\\\/\\\/daniel.liljeberg.io\\\/\",\"name\":\"Daniel Liljeberg\",\"description\":\"The is no place like 127.0.0.1\",\"publisher\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/#\\\/schema\\\/person\\\/e2c3fe10971c37cff2669f5688834cd7\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/daniel.liljeberg.io\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sv-SE\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/#\\\/schema\\\/person\\\/e2c3fe10971c37cff2669f5688834cd7\",\"name\":\"Daniel Liljeberg\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"sv-SE\",\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2020\\\/12\\\/DanielLiljeberg.png\",\"url\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2020\\\/12\\\/DanielLiljeberg.png\",\"contentUrl\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2020\\\/12\\\/DanielLiljeberg.png\",\"width\":424,\"height\":440,\"caption\":\"Daniel Liljeberg\"},\"logo\":{\"@id\":\"https:\\\/\\\/daniel.liljeberg.io\\\/wp-content\\\/uploads\\\/2020\\\/12\\\/DanielLiljeberg.png\"},\"description\":\"Agile practitioner and advocate. Strong believer in the future of agile organizations, businesses and teams. Got my first computer, a C64, at age 7 and computers has been part of my life since then. Working professionally with development since the early 2000\u2019s in a vast array of technologies and roles. Social, easy going, fun loving guy with an appetite for new challenges and new knowledge who has been \u201cthere\u201d and done \u201cthat\u201d. That\u2019s a good way to sum it all up. Married and father of three kids. 
All true blessings ;)\",\"sameAs\":[\"https:\\\/\\\/daniel.liljeberg.io\",\"https:\\\/\\\/www.linkedin.com\\\/in\\\/danielliljeberg\\\/\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Language classification using Machine Learning in PHP - Daniel Liljeberg","description":"Many machine learning examples are written in Python. I decided to see if I could write my project in PHP.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/","og_locale":"sv_SE","og_type":"article","og_title":"Language classification using Machine Learning in PHP - Daniel Liljeberg","og_description":"Many machine learning examples are written in Python. I decided to see if I could write my project in PHP.","og_url":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/","og_site_name":"Daniel Liljeberg","article_published_time":"2019-03-12T13:01:00+00:00","article_modified_time":"2021-04-01T20:02:08+00:00","og_image":[{"width":1192,"height":720,"url":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2019\/03\/coding-bg.png","type":"image\/png"}],"author":"Daniel Liljeberg","twitter_card":"summary_large_image","twitter_misc":{"Skriven av":"Daniel Liljeberg","Ber\u00e4knad l\u00e4stid":"10 minuter"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#article","isPartOf":{"@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/"},"author":{"name":"Daniel Liljeberg","@id":"https:\/\/daniel.liljeberg.io\/#\/schema\/person\/e2c3fe10971c37cff2669f5688834cd7"},"headline":"Language classification using Machine 
Learning in PHP","datePublished":"2019-03-12T13:01:00+00:00","dateModified":"2021-04-01T20:02:08+00:00","mainEntityOfPage":{"@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/"},"wordCount":1503,"publisher":{"@id":"https:\/\/daniel.liljeberg.io\/#\/schema\/person\/e2c3fe10971c37cff2669f5688834cd7"},"image":{"@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#primaryimage"},"thumbnailUrl":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2019\/03\/coding-bg.png","articleSection":["Machine Learning","PHP"],"inLanguage":"sv-SE"},{"@type":"WebPage","@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/","url":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/","name":"Language classification using Machine Learning in PHP - Daniel Liljeberg","isPartOf":{"@id":"https:\/\/daniel.liljeberg.io\/#website"},"primaryImageOfPage":{"@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#primaryimage"},"image":{"@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#primaryimage"},"thumbnailUrl":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2019\/03\/coding-bg.png","datePublished":"2019-03-12T13:01:00+00:00","dateModified":"2021-04-01T20:02:08+00:00","description":"Many machine learning examples are written in Python. 
I decided to see if I could write my project in PHP.","breadcrumb":{"@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#breadcrumb"},"inLanguage":"sv-SE","potentialAction":[{"@type":"ReadAction","target":["https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/"]}]},{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#primaryimage","url":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2019\/03\/coding-bg.png","contentUrl":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2019\/03\/coding-bg.png","width":1192,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/daniel.liljeberg.io\/sv\/2019\/03\/12\/language-classification-using-machine-learning-in-php\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/daniel.liljeberg.io\/"},{"@type":"ListItem","position":2,"name":"Language classification using Machine Learning in PHP"}]},{"@type":"WebSite","@id":"https:\/\/daniel.liljeberg.io\/#website","url":"https:\/\/daniel.liljeberg.io\/","name":"Daniel Liljeberg","description":"There is no place like 127.0.0.1","publisher":{"@id":"https:\/\/daniel.liljeberg.io\/#\/schema\/person\/e2c3fe10971c37cff2669f5688834cd7"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/daniel.liljeberg.io\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sv-SE"},{"@type":["Person","Organization"],"@id":"https:\/\/daniel.liljeberg.io\/#\/schema\/person\/e2c3fe10971c37cff2669f5688834cd7","name":"Daniel 
Liljeberg","image":{"@type":"ImageObject","inLanguage":"sv-SE","@id":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/DanielLiljeberg.png","url":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/DanielLiljeberg.png","contentUrl":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/DanielLiljeberg.png","width":424,"height":440,"caption":"Daniel Liljeberg"},"logo":{"@id":"https:\/\/daniel.liljeberg.io\/wp-content\/uploads\/2020\/12\/DanielLiljeberg.png"},"description":"Agile practitioner and advocate. Strong believer in the future of agile organizations, businesses and teams. Got my first computer, a C64, at age 7 and computers have been part of my life since then. Working professionally with development since the early 2000\u2019s in a vast array of technologies and roles. Social, easygoing, fun-loving guy with an appetite for new challenges and new knowledge who has been \u201cthere\u201d and done \u201cthat\u201d. That\u2019s a good way to sum it all up. Married and father of three kids. 
All true blessings ;)","sameAs":["https:\/\/daniel.liljeberg.io","https:\/\/www.linkedin.com\/in\/danielliljeberg\/"]}]}},"_links":{"self":[{"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/posts\/176","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/comments?post=176"}],"version-history":[{"count":3,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/posts\/176\/revisions"}],"predecessor-version":[{"id":451,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/posts\/176\/revisions\/451"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/media\/178"}],"wp:attachment":[{"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/media?parent=176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/categories?post=176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daniel.liljeberg.io\/sv\/wp-json\/wp\/v2\/tags?post=176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}