Geeky thought about an Irish-English/English-Irish dictionary

Posted on February 7, 2007, by Conor O'Neill, under Commentary, Technology.

Our start-up LouderVoice got a nice mention in the Irish-language daily newspaper. We were at an un-conference called BarCamp Ireland South-East in Waterford and Conn Ó Muíneacháin did a nice write-up about it. Due to my desperately rusty Irish, I had to make use of this online dictionary to translate some of the words. Whilst it is a great resource, it seems to be lacking a huge number of words and also only seems to handle the roots of words (or whatever they are called), i.e. it’ll find léirmheastóir but not léirmheastóireachta.

So the thought struck me this morning that a dictionary is the ideal target for a community effort, like Wikipedia is for encyclopedias. In fact, some sort of custom wiki might be ideal as a base for generating such a dictionary.

It could start very simply with one page per letter of the alphabet and, as it grows, new pages would be generated to sub-divide it. A wiki is, very simply, a web-site with pages that anyone can edit. The idea is that if you went to the wiki-dictionary to find the translation of a word and you spotted other ones that were missing, you could add them in a few simple clicks.
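A minimal Python sketch of that "one page per letter, sub-divide as it grows" idea. The threshold `MAX_ENTRIES_PER_PAGE` and the function name `build_pages` are made up for illustration; a real wiki would split pages through its own tooling.

```python
from collections import defaultdict

MAX_ENTRIES_PER_PAGE = 3  # hypothetical limit, kept tiny for the demo


def build_pages(words, max_entries=MAX_ENTRIES_PER_PAGE):
    """Group words into pages keyed by prefix, splitting overfull pages."""
    pages = {}
    # Start with every word on a one-letter page (prefix length 1).
    pending = [(1, sorted(set(words)))]
    while pending:
        depth, group = pending.pop()
        buckets = defaultdict(list)
        for word in group:
            buckets[word[:depth].lower()].append(word)
        for prefix, bucket in buckets.items():
            can_split = any(len(word) > depth for word in bucket)
            if len(bucket) > max_entries and can_split:
                # Too many entries: sub-divide using a longer prefix.
                pending.append((depth + 1, bucket))
            else:
                pages[prefix] = bucket
    return pages


pages = build_pages(["cos", "cosa", "coise", "cois", "bord", "focal", "foclóir"])
for prefix in sorted(pages):
    print(prefix, pages[prefix])
```

With the tiny limit above, the "c" page overflows and is replaced by "coi" and "cos" pages, while "b" and "f" stay as single-letter pages.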

Of course it would need an agreed style of entry so it looked consistent but I’m sure there are plenty of language experts who could come up with a few simple rules for that. It would also need moderators to keep it clean and delete spam etc. And the search function would have to be very good.

For the laugh, I’ve just created a sample of what that might look like over at pbWiki. It took all of ten minutes to do and is called focloir.pbwiki.com. When Google makes JotSpot free, that might be the ideal location for such a resource since JotSpot is much easier to use than most wikis.

So what do people think? A bit too techie? A bit too idealistic? No need?

32 Replies to "Geeky thought about an Irish-English/English-Irish dictionary"

Rambling Man  on February 7, 2007

hi conor - i think that’s a great idea - here in work we have already created such a dictionary and have got an eGovernment Award for it. It’s done in .net etc. but we’d be willing to give you a dump of the 8,000+ records. It contains not just words and phrases but sentences and frequently used terms, signage etc. some of it has a local government flavour but it might be a start?
one thing we found was it had to be strictly controlled vis-a-vis adding stuff or categorising things.

conor  on February 7, 2007

That sounds like a wonderful resource and what a great offer!

Let’s see if we get any bites and if people like the idea, we should be able to take those records and do some scripting to pull them into a wiki (or similar).

I think it would be an awesome start if it did get support.

Interesting point about “strictly-controlled”. The idea with a wiki is generally it is very uncontrolled and just requires some monitoring for flame-wars or spamming. But I guess if the layout was put in ahead of time then it would just require people to add content based on their personal knowledge.

Let’s see what other feedback the idea gets.

Brendan  on February 7, 2007

Conor - I was just about to reply to say that you’d need it kick-started with initial content, and then I saw Rambling Man’s comment. This idea has legs (and if I knew the plural for cos I might even have been able to say that in Irish. Doh!)

James Corbett  on February 7, 2007

Great idea and I’m sure my Irish is even rustier than yours! Just wondering though if this should/could be part of the Wiktionary.org project?

conor  on February 7, 2007

I’ll check that out James. I literally had the idea and blogged it immediately without checking what else was out there. Bloody Bloggers eh?

James Corbett  on February 7, 2007

Heheh… bloody know-it-alls (or actually Google-it-alls ;-)

To be honest I’d only heard about Wiktionary before but never looked it up until today. It seems to be lacking a Gaelic entry at the moment or else I just can’t figure out how to find it.

conor  on February 7, 2007

Interesting that none of the Gaelic languages are in Wiktionary.

However it looks fantastic. See this example in French

They’ve obviously got the structure in there already that I was talking about.

My two main concerns are [a] it is MediaWiki, which is really, really un-user-friendly for non-techies, and [b] would it require all of those details for every word, or could you start with a basic translation and let others build the details?

conor  on February 7, 2007

Just to give an idea about MediaWiki, here is the markup for “cow” in Wiktionary that someone would have to edit:

[[Image:Koe in weiland bij Gorssel.JPG|thumb|A cow (sense 1)]]
==English==

===Pronunciation===
*{{AHD|kou}}, {{IPA|/kaʊ/}}, {{SAMPA|/kaU/}}
*{{audio|en-us-cow.ogg|Audio (US)}}
*{{rhymes|aʊ}}

===Etymology 1===
{{OE.}} [[cu#Old English|cū]], from {{Ger.}} *''kūz'', from {{IE.}} *''gʷōus''. Cognate with Dutch ''[[koe]]'', German ''[[Kuh]]'', Swedish ''[[ko#Swedish|ko]]''; and, from Indo-European, with Greek ''[[βοῦς]]'', Latin ''[[bos]]'', Armenian ''[[կով]]'', archaic Russian ''[[говядо]]'' and Latvian ''[[govs#Latvian|govs]]''.

====Noun====
{{wikipedia}}
{{en-noun|'''[[cows]]''' ''or'' '''[[kine]]''' (''archaic'')}}

# A female domesticated [[ox]] or other [[bovine]], especially an adult.
# More generally, any domestic bovine regardless of sex or age.
# The [[female]] of several species of [[mammal]], including bovines, [[whale]]s, [[seal]]s, [[manatee]]s, and [[elephant]]s.
# {{context|derogatory|informal}} A [[woman]] who is considered despicable in some way, especially one considered to be fat, lazy, ugly or spiteful.
# {{informal}} Anything that is annoyingly difficult.
#:''That website is a real '''cow''' to navigate.''

James Corbett  on February 7, 2007

OMG! Point taken… let’s stick with the Peanut Butter!

Damien Mulley  on February 7, 2007

You’ve seen Focail.ie? Wouldn’t it be great if we could have an API for that so we could submit words and phrases? They also have a limited amount of audio clips covering some words. Now that’s something that could really be expanded on.

A simple project to go through what is on the site and add in audio clips for all the words not covered.

conor  on February 7, 2007

That’s why I’m thinking a WYSIWYG wiki like Jotspot with some simple templates would be far more likely to get traction in the non-tech community.

I wonder why Google does not give as high a billing to Wiktionary as it does to Wikipedia in search results? If I ever type “define blah” in Google it offers me lots of online dictionaries but I don’t remember ever seeing Wiktionary.

The way they have done it is very impressive and it’d be fantastic if they could wrap it in something more editor-friendly.

Rebecca  on February 7, 2007

Great idea. Our daughter started Gaelscoil last Sept, and with my rusty Irish and my hubby’s non-existent Irish, this would be a fantastic resource.

conor  on February 7, 2007

Wasn’t aware of focail.ie! Doesn’t come up when you put “irish english dictionary” in Google. Therefore it doesn’t exist ;-)

These State and semi-State efforts are extremely important but my concern is that it immediately limits the resources available to generate the content whereas something like a wiki enables any Irish speaker with an internet connection to contribute.

I’d also doubt that any official effort would allow a free-for-all. I suppose a reasonable intermediate step would be that anyone can contribute but every entry is vetted before accepting. Then the people running a site like focail could focus on being moderators rather than creators?

conor  on February 7, 2007

Just tried focail. It is far better than the one I was using. It was able to translate léirmheastóireacht.

RamblingMan  on February 7, 2007

interesting post.

the database of stuff we have contains the following on each entry.

English / Irish word
Grammatical category eg. noun, verb
Context, sometimes giving an example
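As a rough illustration of how records with those three fields could be scripted into wiki-style lines, here is a Python sketch. The CSV column names, the `Entry` class and the output line format are all assumptions; the actual dump format of the .net database is not described here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Entry:
    english: str
    irish: str
    category: str      # grammatical category, e.g. "noun", "verb"
    context: str = ""  # optional context or example


def load_entries(csv_text):
    """Parse a CSV dump (hypothetical column names) into Entry records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Entry(row["english"], row["irish"], row["category"], row.get("context") or "")
        for row in reader
    ]


def to_wiki_line(entry):
    """Render one entry as a single bullet line (made-up layout)."""
    line = f"* {entry.english} = {entry.irish} ({entry.category})"
    if entry.context:
        line += f"; e.g. {entry.context}"
    return line


SAMPLE_DUMP = (
    "english,irish,category,context\n"
    "leg,cos,noun,cos boird\n"
    "dictionary,foclóir,noun,\n"
)

for entry in load_entries(SAMPLE_DUMP):
    print(to_wiki_line(entry))
```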

great idea conor - drop me an email (above) and let me know if it kicks off - glad to be of assistance

Conn Ó Muíneacháin  on February 7, 2007

Wow! This is really cool! (fionnuar?)

Kevin Scannell is someone who might have valuable input into this kind of project. He’s done a lot of open source Gaeilge work and even developed a search engine optimised for Irish.

conor  on February 7, 2007

Thanks for the links Conn. Looks like he has done some great stuff.

Are there any stats for internet or broadband penetration in Ireland amongst strong Irish speakers or Gaeltachts? I wonder how big a pool of people you could potentially expect to get editors from?

aonghus  on February 7, 2007

Focail.ie is invaluable for new terminology, but it is not intended to be a general purpose dictionary.

The “technical resource” behind it, Michal Boleslav Mechura, has started his own pet dictionary, but it’s still a work in progress (and he probably won’t be too happy I’ve blown his cover)

http://mbm.dotnet11.hostbasket.com/potafocal/

There is also An Foclóir Beag online at ul:

http://www.csis.ul.ie/focloir/

(It has an annoying sort order bug - áóúíé come last in the sort order, so if you leave out a fada, its closest word algorithm fails)
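One common way to avoid that kind of problem, sketched in Python: compare words with the fada stripped, so that accented vowels sort beside their plain counterparts and a query typed without a fada still matches. This is only an illustration using the standard `unicodedata` module, not a description of how An Foclóir Beag works; the function names are made up.

```python
import unicodedata


def strip_fada(text):
    """Remove combining accents, so 'á' becomes 'a', 'ó' becomes 'o', etc."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(word):
    """Sort on the accent-free form first, then on the original spelling."""
    return (strip_fada(word).casefold(), word)


def find_ignoring_fada(query, headwords):
    """Return headwords that match the query once accents are ignored."""
    wanted = strip_fada(query).casefold()
    return [w for w in headwords if strip_fada(w).casefold() == wanted]


words = ["cos", "úll", "cás", "bó", "cat"]
print(sorted(words))                # plain code-point order puts á and ú after z
print(sorted(words, key=sort_key))  # accent-insensitive order
print(find_ignoring_fada("focloir", ["focal", "foclóir"]))
```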

cos [ainmfhocal baininscneach den dara díochlaonadh]
(cois in abairtí áirithe) ceann de na géaga a úsáidtear chun siúil; aon ní cosúil leis seo (cos boird, cos pota); clúdach coise (cos treabhsair); lámh, feac, sáfach (cos spáide, cos scine, cos casúir, cos fuipe); an chuid íochtarach (cos leapa).

ar cois (ar siúl, ar bun (tá rud éigin ar cois)).
cois (taobh le (cois na habhann)).
cos ar bolg (leatrom, éagóir, foréigean).
le cois (in éineacht le (tar le mo chois; cúpla punt le cois)).
Foirmeacha
cos - ainmfhocal cos [ainmneach uatha]
coise [ginideach uatha]
cosa [ainmneach iolra]
cos [ginideach iolra]

Collins Pocket dictionary is also online, and I think handles wildcards

http://xreferplus.unext.com/letter_picker.jsp?vol=365

And when the millennium breaks out, Foras na Gaeilge will eventually have its dictionary online.

http://www.focloir.ie

RamblingMan  on February 7, 2007

The revolution starts here!

Kevin Scannell  on February 7, 2007

Great idea Conor.

Here’s what I would suggest — I’d do it myself if I could clone myself or else manage to work 200 hours a week :)

You could try putting together a kind of dictionary aggregator, a thin search interface that will pass queries on to englishirishdictionary.com, focal.ie, etc. Kind of like dogpile.com. You’d probably want to ask permission from the different sites first, but I think setting it up would be pretty easy. This way you’d avoid duplicating effort, and focal.ie wouldn’t have to worry about quality issues with users trying to add data to their site.

Then add on to this front end your wiki idea so users can add words when they’re not found in any of the dictionaries. You could even automatically keep track of all queries that returned no results, and post these as “most wanted” terms somewhere (based on my experience browsing the aimsigh.com logs, I’d advise that you filter this for four-letter words).
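A rough Python sketch of the aggregator plus “most wanted” list. The backends here are plain in-memory stand-ins rather than real calls to any of the sites mentioned, and the class name `Aggregator`, the `blocked` word list and the sample data are all hypothetical.

```python
from collections import Counter


class Aggregator:
    """Thin front end that passes a query to several dictionary backends."""

    def __init__(self, backends, blocked=()):
        # backends: mapping of name -> callable(word) returning a list of translations
        self.backends = backends
        self.blocked = set(blocked)  # words never shown in the "most wanted" list
        self.misses = Counter()      # queries that no backend could answer

    def lookup(self, word):
        results = {}
        for name, search in self.backends.items():
            hits = search(word)
            if hits:
                results[name] = hits
        if not results and word not in self.blocked:
            self.misses[word] += 1
        return results

    def most_wanted(self, n=10):
        """Most frequently requested words that returned no results."""
        return [word for word, _count in self.misses.most_common(n)]


# Stand-in data; a real backend would query an actual site (with permission).
site_a = {"cos": ["leg", "foot"]}
site_b = {"foclóir": ["dictionary"]}

aggregator = Aggregator(
    backends={
        "site_a": lambda w: site_a.get(w, []),
        "site_b": lambda w: site_b.get(w, []),
    },
    blocked={"xxxx"},
)

print(aggregator.lookup("cos"))
aggregator.lookup("blag")
aggregator.lookup("blag")
aggregator.lookup("xxxx")
print(aggregator.most_wanted())
```

Words that come back empty could then be offered to users as wiki entries to fill in, which keeps the community effort focused on the gaps.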

I have open source software for “stemming” Irish words (the léirmheas(tóir)(eacht)(a) problem) that I’d be willing to provide to solve that problem.
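To make the stemming problem concrete, here is a deliberately crude Python toy that peels the suffixes from the léirmheas(tóir)(eacht)(a) example one at a time and looks up each resulting form. It is not Kevin’s software, and real Irish morphology (initial mutations, irregular forms and so on) needs far more than suffix stripping; the suffix list, function names and glosses are assumptions for illustration only.

```python
# Suffixes taken only from the léirmheas(tóir)(eacht)(a) example above.
SUFFIXES = ("a", "eacht", "tóir")


def candidate_forms(word, suffixes=SUFFIXES, min_stem=3):
    """Return the word followed by progressively shorter suffix-stripped forms."""
    forms = [word]
    current = word
    stripped = True
    while stripped:
        stripped = False
        for suffix in suffixes:
            if current.endswith(suffix) and len(current) - len(suffix) >= min_stem:
                current = current[: -len(suffix)]
                forms.append(current)
                stripped = True
                break
    return forms


def lookup_with_stemming(word, headwords):
    """Try the word itself, then each stripped form, against a headword dict."""
    for form in candidate_forms(word):
        if form in headwords:
            return form, headwords[form]
    return None


# Illustrative glosses only.
headwords = {"léirmheastóir": "critic, reviewer"}

print(candidate_forms("léirmheastóireachta"))
print(lookup_with_stemming("léirmheastóireachta", headwords))
```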

Conn Ó Muíneacháin  on February 7, 2007

I’d imagine Gaeltacht penetration/usage is the same as with any rural part of Ireland.

Irish speakers are everywhere. There’s lots of it on Bebo, for example. If there is a trend among people who use it online, it’s that a disproportionate number are outside the Gaeltacht and, even more so, outside Ireland. The farther you are from other speakers, the more likely you are to be a Net-Gaeilgeoir.

The sad thing is that you could share an office or live near someone with fluent Irish and never hear it. You could both be fluent speakers and not know it.

conor  on February 7, 2007

Thanks for the info Aonghus. It looks like there are tons of resources and, as Kevin says, maybe an aggregator plus a “hole-filler” would be the best approach.

The chances of me doing anything on this in the next few months are minimal as I’m flat out building up the léirmheasanna in LouderVoice :-) BTW, we have our first non-English review in the Beta Test site - it’s of No Béarla.

aonghus  on February 7, 2007

Kevin also has a clever algorithm for recognising “old spellings” which aimsigh used well. Pity he’s moved on from there..

Kevin Scannell  on February 7, 2007

Aonghus, my friend,
It will be back before long - I lost my main hard disk in December :(
Kevin

aonghus  on February 7, 2007

Hmm. You’ll improve your backup routine from now on, no doubt!

I’m glad to hear that. I thought you had given it up in favour of other projects.

Jim Normoyle  on February 8, 2007

Con - Not too techie, not too idealistic and a big need. A great idea. Think we should have audio clips, as I think there is a big need in the area of pronunciation and how words are pronounced in the three primary Gaeltacht regions. Another resource might be “Gleacht”, which was basically the O’Donnell Dictionary. It was supposed to be updated to Windows - but it never happened! Inputs and editing would have to be closely monitored.
Séamus

elly parker  on February 8, 2007

How about trying to get some schools involved, see if they could add 200 words each as a class project or something? That way it would (hopefully) be supervised by a teacher and should be accurate?

marcas  on February 10, 2007

Hey Conor, this is a great idea, I’ve read through all the comments above and there’s a lot of suggestions there… have you had any off-line contact with Conn or Kevin or Aonghus or Rambling Man, did you come up with a definite plan of attack? Whatever, I’d love to lend a hand.

Where I work in Frontier, there are four fluent Irish speakers, and 2 or 3 others (like myself) who make a stab and/or have Gaeilge interfaces on their PCs/Firefox/etc.

conor  on February 13, 2007

Hmm, I think I might try that Gaeilge UI idea as a way of easing back in. Is it per PC or per user?

We’ve had no offline contact on this. Until I get a block of spare time (I’m figuring early 2008), I doubt I’ll get to do a thing on it. Of course if you wanted to run with it………

aonghus  on February 14, 2007

http://gaeilge.mozdev.org/

http://ga.openoffice.org/

http://audacity.sourceforge.net/


http://www.microsoft.com/ireland/gaeilge/

You’ll have to do your own homework after that, Conor!

conor  on February 14, 2007

Yeah I’m a lazy swine. I’ll check out MS later.

I keep downloading Audacity but never use it! Thinking of doing a podcast interview with an old relative talking about my grand-aunt who just died aged 95. Will need it then.

John  on March 7, 2007

There was mention above of an Irish wiktionary…
http://ga.wiktionary.org/