April 2nd, 2014

Open Source Project Brings 11th Century Kannada Verses Online

Vachana sahitya is a form of rhythmic writing in Kannada poetry that evolved in the 11th century C.E. and flourished in the 12th century as part of the Lingayatha movement. More than 259 Vachanakaras (Vachana writers) have compiled over 11,000 vachanas. 21,000 of these verses, which were published in a 15-volume set, “Samagra Vachana Samputa,” by the Government of Karnataka, a state in southwest India, have been digitized. Two Wikimedians, along with Kannada linguist and author O. L. Nagabhushana Swamy, are involved in the Unicode conversion, the corrections and the writing of the preface for these verses. The entire work is now available as a standalone project called Vachana Sanchaya and is ready to enrich Kannada WikiSource.

Palm Leaf Vachanas

Palm leaf of 11th and 12th Century with Vachana poems in Kannada language

This project started a year ago, when Kannada Wikimedian Omshivaprakash was trying to help Professor O. L. Nagabhushana Swamy and Kannada author and publisher Vasudhendra easily access the vachanas (verses) of Vachana Sanchaya. Swamy had trouble using the publicly available vachana content because the data was encoded in ASCII, which made searching the text a huge problem. Pavithra Hanchagaiah started helping to collect information about vachanas and to bring them into Unicode, writing scripts that customized open source software to convert Kannada text from ASCII font encodings into Unicode.
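To give a feel for what such a conversion script involves, here is a minimal sketch in Python. It is not the project's actual code: the glyph table `LEGACY_TO_UNICODE` and the function `convert` are hypothetical, and the legacy codes in the table are invented for illustration and do not match any real Kannada font. The idea is a greedy longest-match replacement of legacy glyph codes with Unicode code points.

```python
# Toy glyph table. These legacy codes are INVENTED for illustration and do
# not correspond to any real Kannada font encoding.
LEGACY_TO_UNICODE = {
    "k": "\u0C95",               # KA
    "n": "\u0CA8",               # NA
    "nn": "\u0CA8\u0CCD\u0CA8",  # NA + virama + NA (conjunct "nna")
    "D": "\u0CA1",               # DDA
}


def convert(text, table=LEGACY_TO_UNICODE):
    """Replace legacy glyph codes with Unicode, trying longer codes first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # Unknown character: pass it through unchanged so that it can
            # be spotted and corrected in a later review pass.
            out.append(text[i])
            i += 1
    return "".join(out)


converted = convert("knnD")
```

A real converter would likely also have to reorder vowel signs and handle glyphs that have no one-to-one Unicode equivalent, which may be why a separate error-fixing pass was needed after the initial conversion.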

Kannada Language project

Pavithra Hanchagaiah and Omshivaprakash H L

After further discussions, it was decided to get thousands of vachanas into a database, making them easily searchable with an index. This required us to build a platform on which this could be done. The fruits of our labors will help linguistic researchers and students as well as the public at large, anybody who’s interested in reading and studying Vachana literature.
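As a rough illustration of what "searchable with an index" can mean, the Python sketch below builds an inverted index that maps each word to the vachanas containing it. The record layout, the sample data, the author names and the function `build_index` are all assumptions made for this example; the article does not describe the platform's real schema.

```python
from collections import defaultdict

# Sample words, written as escapes so the code points are unambiguous.
KARMA = "\u0C95\u0CB0\u0CCD\u0CAE"   # "karma" (work/deed)
SATHYA = "\u0CB8\u0CA4\u0CCD\u0CAF"  # "sathya" (truthfulness)
NADI = "\u0CA8\u0CA6\u0CBF"          # "nadi" (river)

# Hypothetical records: vachana id -> (author, text). Not real vachanas.
SAMPLE = {
    1: ("author_a", f"{SATHYA} {NADI}"),
    2: ("author_b", KARMA),
    3: ("author_a", f"{NADI} {KARMA}"),
}


def build_index(vachanas):
    """Map each whitespace-separated word to the set of vachana ids using it."""
    index = defaultdict(set)
    for vachana_id, (_author, text) in vachanas.items():
        for word in text.split():
            index[word].add(vachana_id)
    return index


index = build_index(SAMPLE)
river_ids = sorted(index[NADI])
```

Looking up a word is then a single dictionary access rather than a scan of every verse, which matters once the collection holds thousands of vachanas.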

With this idea, Omshivaprakash started designing the model and his colleague Devaraju started building it. In the meantime, Pavithra was running various scripts to fix errors in the conversion of the ASCII text to Unicode, confirming that the data was ready to be consumed by the modules developed for the concordance. We spent weekends and holidays executing this project from home and would sync up once in a while online.

With constant feedback and guidance from Mr. Swamy and Vasudhendra, we learned how researchers use a concordance of a text and what would make their research easier. Omshivaprakash worked on the architecture of the platform, decided the infrastructure requirements and managed the entire project. Free and open source software technologies are used to keep the platform running. Pavithra provided critical hacks for the digitization and offered valuable input through suggestions, feedback and Q&A.

Working system

At present, the system has around 200,000 unique words in the repository. It was an extensive learning process, as we used our free time to solve real-world problems. Moreover, this was a body of work in the Kannada language that needed prompt attention. Vachana Sanchaya is meant to be more than just an online repository of the text; it's meant to be a tool for researchers.

For example, when a user searches for a word on our system, he or she can see who has used the word and in which vachanas. To improve readability, the searched text string is highlighted in each vachana that is displayed. To repeat the search for a specific Vachanakaara, the user needs only to click on his or her name on the graph provided on the result page. We have used the MediaWiki jquery-ime input tool architecture, which lets the user enter Kannada text directly in Unicode for a search.
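The search behavior described here (matching vachanas, highlighted hits and a per-author breakdown) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the site's implementation: the `(author, text)` record shape, the sample records, the square-bracket highlight markers and the function `search` are all hypothetical.

```python
from collections import Counter

KARMA = "\u0C95\u0CB0\u0CCD\u0CAE"   # "karma" (work/deed)
SATHYA = "\u0CB8\u0CA4\u0CCD\u0CAF"  # "sathya" (truthfulness)
NADI = "\u0CA8\u0CA6\u0CBF"          # "nadi" (river)

# Hypothetical (author, text) records. Not real vachanas.
RECORDS = [
    ("author_a", f"{SATHYA} {NADI}"),
    ("author_b", KARMA),
    ("author_a", f"{NADI} {KARMA} {NADI}"),
]


def search(records, word):
    """Return highlighted matches plus a count of matching vachanas per author."""
    hits = []
    per_author = Counter()
    for author, text in records:
        tokens = text.split()
        if word in tokens:
            per_author[author] += 1
            # Wrap whole-word matches in brackets; a web page would use
            # markup for the highlight instead.
            shown = " ".join(f"[{t}]" if t == word else t for t in tokens)
            hits.append((author, shown))
    return hits, per_author


hits, per_author = search(RECORDS, NADI)
```

The `per_author` counts are the kind of data that could feed a graph of Vachanakaaras on a result page, and filtering `hits` by one author corresponds to clicking that author's name.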

Public Response

We are glad to see people accessing vachanas through our Facebook, Twitter and Google+ channels. Thousands read them every day, and it has become a part of many people's daily routine. There have been more than 50,000 page views on social networks and 500,000 page views on our site in the first few months after our platform's public launch. Some of the most commonly searched Kannada words are “ಕರ್ಮ” (Karma en: Work/Deed), “ಸತ್ಯ” (Sathya en: Truthfulness) and “ನದಿ” (Nadi en: River).

ಆಂಗೀರಸ, ಪುಲಸ್ತ್ಯ, ಪುಲಹ, ಶಾಂತ,
ದಕ್ಷ, ವಸಿಷ್ಠ, ವಾಮದೇವ, ನವಬ್ರಹ್ಮ, ಕೌಶಿಕ, ಶೌನಕ, ಸ್ವಯಂಭು, ಸ್ವಾರೋಚಿಷ, ಉತ್ತಮ, ತಾಮಸ, ರೈವತ, ಚಾಕ್ಷಷ, ವೈವಸ್ವತ, ಸೂರ್ಯಸಾವರ್ಣಿ, ಚಂದ್ರಸಾವರ್ಣಿ, ಬ್ರಹ್ಮಸಾವರ್ಣಿ, ಇಂದ್ರ ಸಾವರ್ಣಿ ಇವರು ಇಪ್ಪತ್ತು ಮಂದಿ ಪ್ರಪಂಚ ನಿರ್ಮಾಣ ಸಹಾಯ[ದ]ವರು. ಹತ್ತೊಂಬತ್ತು ಎಂದರೆ ಪುಣ್ಯನದಿಗಳು. ಅದು ಎಂತೆಂದಡೆ: ಗ್ರಂಥ

— An example of a vachana from the Vachana Sanchaya project.

Plans for the future

Our system is extensible, so new features can be added. We have a review desk where researchers can help review the content. Later we will add the required references to vachanas from various research works on this literature. The content is available to the public through an OpenData API and will be distributed in the public domain through WikiSource once the review work is complete. This will open up the system for students, developers, researchers and anyone interested in working to build linguistic tools for Kannada and other Indic languages.

This system will evolve so it can be used for other literature projects. Vachana Sahitya will further help us to initiate Natural Language Processing (NLP) projects if more researchers get together to tag the words, build a glossary, etc. We can also add various language tools, such as a spell checker and a grammar checker, through crowd-sourced development. The forthcoming projects under “Kannada Sanchaya” are Sarvagnana Vachanagalu and Dāsa Sanchaya, which are already in the pipeline. Our idea is to extend this platform to include works from antiquity (Vyasa, for example) to the early 20th century (e.g., Muddanna) and possibly even contemporary literature that's available in the public domain.
