Yeah, that's looking a lot better. Too high a learning rate is something I'd usually expect to show up as an erratic training curve rather than (or I guess in addition to) an erratic validation curve, though.
The learning rate is basically a measure of how big a step you take when updating your weights. If it's large, you approach solutions quickly but are likely to overshoot; if it's small, you approach slowly and can get stuck in a local minimum, because the step is too small to escape it and every attempt gets undone by further training.

IIRC, you had a decay schedule that starts high and ramps down, which tries to get the best of both worlds, but it may have started too high and/or not reduced quickly enough. The "steps" parameter there counts batches of images, not epochs, so the rate probably isn't dropping much over 6 epochs. It looks like changing the initial rate solved your problem, though, so there's not much reason to keep tweaking the schedule. Something to keep in mind for future efforts.
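For reference, a decaying schedule like that might look something like this in tf.keras. Just a sketch; the optimizer choice and the actual numbers here are placeholders, not your config:

```python
import tensorflow as tf

# Placeholder values: tune initial_learning_rate and decay_steps
# for your own dataset; these are not the settings from this thread.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,  # start high to move quickly
    decay_steps=1000,            # counted in batches, not epochs
    decay_rate=0.9,              # rate shrinks by a factor of 0.9 every 1000 steps
)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```

If decay_steps is large relative to how many batches you actually run, the rate barely changes before training ends, which is the "didn't reduce quickly enough" failure mode.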
And yeah, I wasn't looking at the unsmoothed data. That is quite a lot of variation.
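For what it's worth, the smoothing in tools like TensorBoard is just an exponential moving average over the raw points. A quick numpy sketch, with an arbitrary 0.9 weight:

```python
import numpy as np

def smooth(values, weight=0.9):
    """Exponential moving average, like TensorBoard's smoothing slider."""
    smoothed = []
    last = values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return np.array(smoothed)

# Example: a noisy, decaying validation-loss curve
noisy = 1.0 / np.arange(1, 101) + np.random.normal(0, 0.05, 100)
print(smooth(noisy)[:5])
```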
Got it. Thanks so much for your help!! Still a lot to learn here.
Coming from a world of building software where things are pretty binary (it works or it doesn't), it's also really tough to judge what counts as "good enough". There's a point of diminishing returns, and I'm not sure when to call it good enough versus continuing to train and improve it.
Really appreciate your help here tho.
No problem, happy to help. In a lot of cases, even direct methods couldn't reach 100%. Sometimes the problem definition, combined with plain old noise in your input, means you can end up with examples that have basically the same input but different classes.
In the blur domain, for example, if one of your original "unblurred" images was already blurred (or just out of focus), it might be pretty much indistinguishable from a "blurred" image. Then the only way for the net to "learn" to solve that case is by overfitting to some unique value in that image.
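As an aside, one cheap way to catch those already-blurry "unblurred" originals before training is the classic variance-of-Laplacian focus measure. A sketch assuming OpenCV, with a threshold you'd have to tune for your own data:

```python
import cv2

def looks_blurry(path, threshold=100.0):
    """Flag images with low Laplacian variance (i.e. few sharp edges).

    The threshold is dataset-dependent; 100.0 is just a common
    starting point, not a universal constant.
    """
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise ValueError(f"could not read image: {path}")
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold
```

Anything it flags in your "unblurred" set is worth a manual look before you trust the label.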
A lot of machine learning is just making sure the nets are actually solving your problem rather than figuring out a way to cheat.
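On the "good enough" question: one common way to make that call automatically is early stopping: train until the validation metric stops improving for a while, then roll back to the best weights. A minimal sketch, assuming tf.keras, with an arbitrary patience value:

```python
import tensorflow as tf

# Stop when val_loss hasn't improved for 5 epochs in a row,
# then restore the weights from the best epoch seen.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Hypothetical usage; model, train_ds, and val_ds are your own objects:
# model.fit(train_ds, validation_data=val_ds,
#           epochs=100, callbacks=[early_stop])
```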