OpenAI o1 Is Actually Bad at Writing Code

So OpenAI just released a brand new model, and it's not like any other model. It's meant specifically for reasoning and for tackling very complicated tasks, for example coding, which needs very thorough, human-like thinking. That's what the new o1 models are about.

If you look at the benchmarks right over here, they are killing it. You have GPT-4o here, the orange and the sky blue bars, and here is o1, and it's literally excelling. When it comes to math, GPT-4o is at about 60 and o1 is at about 90, and physics is the same story. Anything that needs very complicated reasoning and a lot of thought going into the process, it excels at. But if you look at it from a language perspective, English language for example, the two are basically the same. There isn't much improvement in that regard, because this model is all about reasoning.

The most important part for us as developers is how well this model does compared to previous models, specifically GPT-4o or Claude 3.5 Sonnet, when it comes to coding, solving complicated tasks, and building real-world applications. Building an application needs reasoning: you give the model instructions to follow, and it has to go through a chain of thought. That's what this model is all about, going through a very human-like chain of thought in order to give you the right solution.

So look at Codeforces, which is a platform for competitive programming challenges. GPT-4o is at the 11th percentile, o1 preview is at the 62nd percentile, and the o1 model is at the 89th percentile. That o1 model is yet to be released. It's not public just yet; we only have the o1
preview and the o1 mini, which we're going to talk about in a second. Still, none of this is public: the benchmarks were run by OpenAI, not in the open, so we can't make sure the results are correct and valid. Maybe the results are fake, we never know. What we can do is put the models to a real test.

The Codeforces platform is already putting restrictions on the usage of AI, just one day after the release of the o1 model. They put up a whole thread about how to properly use AI, when not to use it, AI cheating and this kind of stuff, because apparently o1 is too dangerous for competitive challenges and problem solving in general.

Oh yeah, I noticed a couple of people complaining that the o1 model is still struggling with the strawberry problem that GPT-4o and the rest of the models struggled with too. For me it works perfectly. Let's test it. I'm going to use o1 mini, not o1 preview, because apparently o1 mini is better than preview (more on that in a second). I'm going to ask it: how many R's in strawberry? Hit enter. The first thing you'll notice about these two models is that they take some time, even though it didn't take long here. And o1 mini knows that there are exactly three R's. Most of the time models get that mistaken for two instead of three, but for me it's working perfectly with no issues, so that's pretty good.

Now, the reason I say o1 mini is better than o1 preview, even though o1 preview is the trial version of the official regular o1, which is yet to be released: I put out a tweet about this a couple of hours ago. I was going through this Codeforces website and I was
looking at their benchmarks for the new o1. Apparently, when they ran this coding benchmark between o1 mini and o1 preview, they found that o1 mini has a 1650 Elo rating compared to o1 preview at 1258. The strongest model is o1, at 1673, and there's another one fine-tuned on competitive programming problems, which reached an Elo rating of about 1807, which is enormous.

From my perspective as well, when I tried this on OpenRouter, using o1 mini versus o1 preview, I always found that o1 mini is better. For some reason o1 preview is just very bad, and it takes forever to return something. Maybe it's just me, but I've noticed a lot of people complaining about the same issue.

If you don't have access to o1 mini, o1 preview, or any o1 model inside your ChatGPT dashboard or playground, what I personally love to use is the OpenRouter website, which gives you all the models in one place. You just choose the model you want and use it, every single model from Claude to GPT-4o to Llama 3.1 405B, anything you want. The cool thing is that you can add credits. I put in $5, I think two months ago (it says here two months ago), I've been using it for a lot of testing with different models, and I've still got almost $3 left. Those $3 are still pretty good for running a lot of o1 prompts, even though o1 is ridiculously overpriced.

All right, from a developer perspective, I want to put the capabilities of o1 mini or preview, whichever you want to use, to the test by asking it to build real-world apps and projects, for
example a landing page, or maybe authentication. We tested pretty much the same thing before using Cursor AI, but that was before the release of the o1 models, so we were using Claude 3.5 Sonnet inside Cursor. Now Cursor actually supports o1 mini. If you go to Cursor settings and then to Models, you'll find o1 mini and o1 preview both available. If you have a Cursor subscription you get them right away, which is amazing. I disabled preview because it sucks, but you can still use whichever you want. I'm going to choose o1 mini over the others and put it to the test.

Before, we did a quick test using Claude 3.5 Sonnet with this prompt. In the composer I already have here, I put this instruction: please go ahead and create a dark-themed, cool landing page using shadcn/ui with Tailwind CSS. The landing page is for a new AI SaaS that enhances images and applies filters, and I'm giving it a bunch of constraints and instructions on how to create the landing page.

The results from Claude 3.5 Sonnet, using Cursor and the new composer mode, were actually decent. Not fascinating, because UI is relatively hard and it leans on the creativity side. This is what it yielded for me. Everything was put together by the AI, by Claude 3.5 Sonnet. It's decent enough to use as a starting point, and you can update it as you move forward.

Right now I want to try it with o1 mini. If you look down here, and you probably can't see it on camera, it's selecting o1 mini. This should be good enough: you can see o1 mini right over here. So I made sure to go back and reset the
whole project. I'm starting with a brand new Next.js project, I can use the new composer mode, put in the same instructions as before, click submit, and start the o1 mini chain of thought and let it think about how to create this. Hopefully it yields something cool for us. I know the o1 family of models is compute-intensive and they take quite some time to return something, but apparently it didn't take that long this time.

I'm not sure if it's going to look good or bad. It's asking me to install react-icons, I think because it's using a couple of icons from there, which is completely fine. I'm going to click accept all, so it should go ahead and add the files. I have src, landing page... so it has src, then pages. It's not using the new app directory, the new Next.js app directory. That's a very important point: it's using the old one, and it named the file as a landing page, not as the homepage. Oh yeah, and the file is empty. I don't know what happened there. I'm going to do reapply and accept, just to make sure this gets applied. Well, there's clearly something wrong here, or maybe I'm wrong. No, wait a second, I'm not wrong. It's putting the file in a completely different, very weird spot.

So I'm going to run this from scratch. Second attempt, and it looks like it's doing something completely different, which is good. It's changing page.tsx, the app page.tsx, which is the homepage, and it's adding a new component, the landing page, and including that landing page in there. That's good. I'm going to accept all, and hopefully this applies all the changes to page.tsx. I don't know why it's still putting this in src/pages. It just doesn't make sense to me. And it's trying, for some reason, to bring icons from here, which is very stupid. No, this is very wrong. When I tried Claude 3.5 Sonnet previously, it worked flawlessly, without any issue, on the first try. So yeah, cool, o1, very cool. I love you already. I think I'm going to go back to Claude, or maybe just wait for the official regular o1 model, or wait for improvements.

I made a modification: instead of saying make a dark-themed cool landing page, I put homepage, using shadcn/ui and Tailwind CSS. I'm literally keeping everything else the same, and I'm going to give it a third try, using o1 mini. Hopefully this time it works. I'm not sure whether I should just choose the preview, maybe that would be better, but anyway, I'm going to give mini one more try and see how it goes.

First impression: it's still putting the landing page inside src/pages, even though that directory doesn't exist. It shouldn't be putting anything in there, but that's fine, we can fix that manually. It's getting the icons right this time, it's just using react-icons. So maybe, I don't know. I just want to look into that weird homepage that didn't want to work. So I'm going to move this out of src/pages. I'm going to create a components folder and put the landing page inside that. Yeah, it's not AI anymore if I have to do all the work, but anyway. I'm going to delete src, keep the landing page, and run pnpm add react-icons to install the missing library. All right, cool, that's installed. Hopefully that gets these errors resolved. Now we've got this. This is
not how we need it. I don't know if this is going to work. All right, it does work, so that's good. The only problem is that it still says the background, gallery and bokeh icons do not exist in the library, and I think they really don't. So even after installing, Next.js is failing miserably here. So yeah, o1 is miserable.

You know what, I'm going to go into the composer again, put in the same instructions as before, but this time switch from mini to preview. And o1 preview apparently doesn't want to switch, for some reason. So thank you, Cursor, I love you already. A lot of things are going wrong. If you're worried about AI replacing you any time soon, I do not think so. Well, correction: it's definitely not doing that.

All right, I reset everything and I'm choosing o1 preview now; before, it wasn't working. I'm going to click submit and hopefully it gets me something. But: "usage-based pricing is required". Okay, so the limits for preview are low. I literally didn't use it. Well, this o1 thing is just a mess.

Anyway, the second test I want to run is one I ran before using Claude 3.5 Sonnet and Cursor. I'm not going to go into details about those results; if you want to check them out, you can watch the previous video on my channel, and there's a link in the description below as well. The test is simply creating register and login pages, with a description of which fields need to be included, like full name, email, password, yada yada yada. The most important part is that it goes into databases: it has to create a SQLite database using Prisma, and you give it specific instructions on how to do authentication, for example use NextAuth, how to set up the database, when to use secure
cookies and JWT, a lot of details. It needs a lot of reasoning and a lot of information, and it needs the model to think like a real human being. I'm going to use o1 mini because preview is apparently out of quota or something. I haven't used it at all, but whatever. I'm going to click submit and see if this compares to the previous Claude 3.5 Sonnet experiment or not.

It looks like it's providing all the required pieces. It gives you the Prisma setup and the NextAuth handler. It gives you the register page and a register TypeScript file for handling the HTTP request, the POST request. There's login, and there's a providers file for the session provider, which is good, and it puts that inside the root layout, which is also pretty good.

As for the description on the right-hand side, where it explains what's happening: unlike Claude 3.5 Sonnet, it's not giving you instructions on how to set these things up. We're using new things here, like Prisma and shadcn, maybe some shadcn components (I don't even know if this package exists), and it doesn't tell you which packages or libraries need to be installed first, or how to run, for example, the Prisma initialization to create the database and migrate the initial schema. A lot of stuff is missing, so it's not as good as the previous run. You never know until you run it and make sure it works, but as far as I can tell it's not going to run at all, and it's still missing a couple of things. With a few adjustments and manual modifications it probably would. This package does not exist, because as far as I know shadcn does not operate
like this. shadcn is more of a copy-paste component library. There's no official component library package that you import from. Maybe there's something I don't know about, but I'm pretty sure this is not how it works, and the fact that it doesn't tell you any of this means it's just not working well. The TypeScript types aren't populated either, so the AI hasn't figured out the TypeScript types correctly.

Doing a very quick Google search for "shadcn components" plus "npm" to see if this exists as a package: apparently, and obviously, it does not. This is not how shadcn/ui works. So this is a big brain fart right out of the gate, that's all I can say. This is definitely not working as expected, and I'm definitely going back to Claude 3.5 Sonnet right after this. Coding-wise, it doesn't look or feel good to me. I haven't tried a lot of reasoning tasks with it, but from a developer's coding perspective this is not great. Even the autocomplete that Cursor uses feels a little degraded to me. I'll keep you guys posted on Twitter if you want more details about which one is doing better.

Anyway guys, thank you for watching. This was a quick video about how things are actually going with the new o1 models. Is it going to replace you soon, or maybe next year? I don't know, but keep an eye on this and keep an eye on my Twitter. Maybe you get replaced very soon, don't tell anybody. Anyway, thank you guys for watching, and see you hopefully in the next ones.
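
A side note on the strawberry question from the start of the video: the right answer is easy to confirm with a couple of lines of code. This is only a minimal sketch, and the function name `countLetter` is made up for illustration; it is not something shown in the video.

```typescript
// Count how many times a single letter appears in a word, ignoring case.
// `countLetter` is a hypothetical helper name chosen for this example.
function countLetter(word: string, letter: string): number {
  const target = letter.toLowerCase();
  return word
    .toLowerCase()
    .split("")
    .filter((ch) => ch === target).length;
}

// s-t-R-a-w-b-e-R-R-y: one "r" after the "t", then a double "r" near the end.
console.log(countLetter("strawberry", "r")); // 3
```

So three is the answer the model should give, and two is the common miss, most likely because of the double "r" near the end of the word.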
