In the works…

Submitted by David Snopek on October 30, 2007 - 7:33pm

I've had a number of language-learning software projects in the works for a long time. Unfortunately, given my limited time, I've struggled to get them "out of the works" and into a place where other people can use them. Until then, I'd like to describe what they are, or really, what they will be!

Memorati™

Memorati™ is my main focus right now. It is an online flash card system based on spaced repetition. It uses modern Web 2.0 techniques to be rich and flexible, and it is designed to be easily embedded inside other applications. In fact, when you go to www.Memorati.org, you are actually seeing Memorati (the application) embedded into Memorati (the website).
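The core idea of spaced repetition fits in a few lines. Below is a minimal Leitner-box scheduler in Python. It is not Memorati's actual algorithm; it only illustrates the general technique. A correct answer moves a card into a box with a longer review interval, and a miss sends it back to the first box. The interval values and the names `Card`, `review`, `due_cards` and `INTERVALS` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical review intervals (in days), one per Leitner "box".
# Illustrative values only, not Memorati's real schedule.
INTERVALS = [1, 2, 4, 8, 16, 32]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0
    due: date = field(default_factory=date.today)


def review(card: Card, correct: bool, today: date) -> Card:
    """Promote the card one box on a correct answer (capped at the last box),
    demote it to box 0 on a miss, then schedule its next review."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Return the cards whose next review date has arrived."""
    return [c for c in cards if c.due <= today]
```

For example, a new card answered correctly on 2007-10-30 moves to box 1 and comes back on 2007-11-01. If it is then missed, it drops to box 0 and is due again the next day. Cards the learner knows well drift toward longer and longer intervals.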

Many features are still missing from Memorati, so at this point it should be considered beta. In the future, I plan to include:

Plugins for domain-specific cards, e.g. geography cards could include maps.
Media! With a player for audio and video files.
Offline support, so you don't always need an internet connection.
User sharing and collaboration.
A cross-domain JavaScript API, to allow interesting mash-ups.
And much more!

This is only the beginning! But development has been slow, given my day job. One of my near-term focuses is making the project more accessible to the community in order to attract new developers. If that succeeds, the pace of development could increase.

Lingwo™

Lingwo™ is a piece of code that I have poured hundreds of hours into, writing and rewriting it several times over the past four years. Yet I've never managed to release it in a form that end users can actually use.

It's a translating, morphological dictionary. It is a translating dictionary in the sense that it doesn't include full definitions, just glosses in other languages, and a morphological dictionary in that it understands the morphology of the language in question. I've spent far more time on the morphological aspect than on the dictionary aspect.

Basically, it can be given an XML definition of the morphology of the target language. In highly inflected languages (e.g. Russian, Polish), this can be very complex, but also very useful: each word has many forms, and you need to be aware of them when doing things like analyzing text.

Then, in the entry for each individual word, you need only point out the exceptions: the places where it differs from the regular morphology of the language as defined in the XML file (e.g. the past tense of write isn't *writed, it's wrote). For languages like English, this functionality is only marginally useful, but as a student of Slavic languages, I've focused more on the needs of those languages.
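Here is a minimal Python sketch of the rules-plus-exceptions idea. It uses a drastically simplified model. Plain Python functions stand in for the XML morphology definition, and the English rules are rough. The names `REGULAR_RULES`, `EXCEPTIONS` and `inflect` are hypothetical and not part of Lingwo.

```python
from typing import Callable


def regular_past(lemma: str) -> str:
    # Very rough English rule: "bake" -> "baked", "walk" -> "walked".
    return lemma + "d" if lemma.endswith("e") else lemma + "ed"


# Hypothetical stand-in for the language-wide morphology definition:
# form name -> function that builds that form from the dictionary form.
REGULAR_RULES: dict[str, Callable[[str], str]] = {
    "past": regular_past,
    "3sg": lambda lemma: lemma + "s",
}

# Individual entries list only where a word departs from the regular rules.
EXCEPTIONS: dict[str, dict[str, str]] = {
    "write": {"past": "wrote"},
    "go": {"past": "went", "3sg": "goes"},
}


def inflect(lemma: str, form: str) -> str:
    """Return the requested form, preferring a listed exception over the rule."""
    override = EXCEPTIONS.get(lemma, {}).get(form)
    if override is not None:
        return override
    return REGULAR_RULES[form](lemma)
```

With this, `inflect("write", "past")` returns `"wrote"` from the exception table. `inflect("walk", "past")` falls through to the regular rule and returns `"walked"`. Each entry stays small because it only records what is irregular about the word.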

One day this software will be put on a community website that will be like the Wikipedia of translating dictionaries. Users will be able to enter words they know, with translations into their own language. This information will be entered into the database immediately for others to use, although, unlike Wikipedia, it will be marked "unverified" until a trusted native speaker has reviewed it. I consider this a good compromise. Errors in language can be deadly in the wrong context (e.g. military use), but the information is still immediately available, with a warning.

An API will be provided to allow any other application to use the Lingwo data. I can foresee many uses for this: automatic translation, automatic generation of grammar exercises, etc. Of course, it will be integrated into Memorati too! I envision being able to enter a word and have the flash card created automatically from a user-designed template. For example, say the user wants the "key forms" of each verb and the gender of each noun. Bam! They're looked up automatically in Lingwo. Or maybe just the exceptional properties of each word! Since Lingwo knows the grammar of the language, it can point those out specifically.
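Here is a rough Python sketch of the template idea. A tiny hypothetical in-memory lexicon with two Polish entries stands in for the real Lingwo API. `LEXICON`, `TEMPLATE`, `Flashcard` and `make_card` are illustrative names only. The user's template decides which grammatical properties go on the back of the card for each part of speech.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a Lingwo lookup: a tiny in-memory table.
LEXICON = {
    "pisać": {"pos": "verb", "gloss": "to write",
              "forms": {"1sg": "piszę", "3sg": "pisze", "past_m": "pisał"}},
    "książka": {"pos": "noun", "gloss": "book", "gender": "feminine"},
}

# A user-designed template: part of speech -> extra info for the card's back.
TEMPLATE = {
    "verb": lambda entry: ", ".join(entry["forms"].values()),  # key forms
    "noun": lambda entry: entry["gender"],                      # gender
}


@dataclass
class Flashcard:
    front: str
    back: str


def make_card(word: str) -> Flashcard:
    """Look the word up and fill in the card according to the user's template."""
    entry = LEXICON[word]
    extra = TEMPLATE[entry["pos"]](entry)
    return Flashcard(front=word, back=f"{entry['gloss']} ({extra})")
```

For example, `make_card("pisać").back` is `"to write (piszę, pisze, pisał)"` and `make_card("książka").back` is `"book (feminine)"`. Changing the template changes every generated card without touching the dictionary data.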

Anyway, that's what's in the works! I'm excited. Here's hoping I can keep that excitement up through to finishing these projects!

Comments

This is fantastic idea,

Submitted by Dan Vanderboom on March 16, 2008 - 1:54pm.

This is a fantastic idea, David, similar to one I've entertained for many years but have never gotten around to implementing. I've always wanted a repository like this to explore, and I've done extensive linguistics research over the years that I'd be happy to share with you if you're interested.

There's a language-learning application I've used called Before You Know It (http://www.byki.com/byki_descr.html) that is very similar to what you're doing, but it lacks strong community support (though people can create and share flash card sets). Its greatest strength is that it remembers how well you know each word individually, and it adapts with weighted values, drilling you more on the words or phrases you're weakest on. (In my opinion, the weighted value ranges are too short and the word-list shuffling options aren't extensive enough, but it's still worth paying attention to.) It also provides four different types of flash card drills based on different levels of mastery (recognition vs. reproduction, etc.). There's a normal desktop version as well as a Windows Mobile version, which is nice and convenient to run off a smartphone so you always have it with you.

If you haven't seen the Princeton project WordNet (http://wordnet.princeton.edu/), that's worth checking out, as it provides rich semantic mapping among words. You may be able to extract and use that data.

I *highly* recommend a book called The Loom of Language by Frederick Bodmer (http://www.amazon.com/Loom-Language-Frederick-Bodmer/dp/039330034X), not only for its deep insight into the crucial aspects of language learning, morphology, etc., but also for its "Language Museum", in which a core set of the most important vocabulary is listed side by side in 10 different languages (from the Romance and Teutonic language families). The ambitious goal of the book is to teach many languages simultaneously.

Are you implementing anything in regard to Noam Chomsky's transformational grammar to aid in understanding or producing statements? If so, I have this feeling that a functional programming language would be excellent to model these transformations clearly and efficiently. It would be fun to experiment with that in any case. All of the natural language processing techniques I've used in the past have been imperative, so it would be interesting to see how much farther one could go with other methods.

Are you focusing only on individual words, or are you also identifying phrase "chunks" to recognize idiomatic expressions?

I'm looking forward to following your progress with this venture.

@Dan: Thanks for the

Submitted by dsnopek on March 17, 2008 - 3:42pm.

@Dan: Thanks for the comment!

I checked out byki briefly, just to see what it was, but never actually used it for real. I'd be interested to learn more about its spaced repetition algorithm, so maybe I'll take another look. The algorithm in Memorati is based largely on jMemorize, which I used for a while before deciding to create Memorati. I constantly feel like the space between repetitions needs to be longer, although maybe that's just because I have ~2500 flash cards. ;-)

The Lingwo dictionary will also have an accompanying project, the Lingwo korpus. My main goals for these are (probably obviously) language learning and corpus linguistics.

I am focusing on individual words, but I'd like to include some "syntax detection" to go along with the corpus tools, so that the tagging isn't entirely manual. I haven't even begun to imagine how I'd implement that, though!

The morphology transformations in the current implementation of the Lingwo dictionary use a functional language I made up, whose syntax vaguely resembles Haskell's (in previous implementations, I used XML and Python). I have a Python interpreter for it, but I plan to generate JavaScript code so that it can run in the web browser on the fly.

WordNet looks wicked awesome! I'm going to have to allocate some time to dive into that.

I'd love to talk with you more about your projects and all the rest of this some time!

Writing this comment also makes me realize that I should post more details about my projects; everything posted so far is terribly general. Once I get The Lingwo Project website online, I'll start filling it up with details.
