I built a piece of code to generate imaginary French words. The idea is simple:
I used a database of words drawn from French books and movie subtitles, available here (142,362 words).
The only parameter is N. As N increases, the generation becomes more constrained; words start to sound more French, but a higher proportion of them end up being existing French words.
French-sounding words start to appear at N=3: sairions, bouvirents, talpottes, musiant, plamusse.
At N=5, about half of the generated words are actual French words, but the rest are invented: paraignes, lamistes, embalisme, racinations, paraphie, fallotaient, amarcoussiens.
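The trade-off around N can be illustrated with a character-level Markov chain, which is one common way to get this behaviour. The sketch below is not the post's actual code. It assumes that N is the number of preceding characters used to pick the next one, and the names `build_model`, `generate_word`, `START` and `END` are hypothetical, as is the toy word list.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers; assumed not to occur in the word list.
START, END = "^", "$"

def build_model(words, n):
    """Map each n-character context to the characters observed after it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    model = defaultdict(list)
    for word in words:
        padded = START * n + word + END
        for i in range(len(padded) - n):
            model[padded[i:i + n]].append(padded[i + n])
    return model

def generate_word(model, n, rng=random, max_len=30):
    """Sample one character at a time until the end marker is drawn."""
    context = START * n
    letters = []
    while len(letters) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = (context + nxt)[-n:]  # slide the context window forward
    return "".join(letters)

# Toy word list standing in for the real dataset (assumption).
vocabulary = {"chanter", "chantier", "planter", "plante", "manteau", "marteau"}
model = build_model(vocabulary, n=3)
candidates = {generate_word(model, n=3) for _ in range(200)}
# Keep only the words that are not already in the source list.
invented = sorted(candidates - vocabulary)
print(invented)
```

With a longer context, each window of N characters has fewer observed continuations, so the sampler tends to retrace words from the list. That matches the rising share of real words reported at N=5.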
The code can be easily extended to other languages provided that you have a dataset of words from that language (the code’s readme lists the steps).