::--ACTIVE RESEARCH PROJECTS--::
::DOCUMENT IMAGE PROCESSING::
Optical Character Recognition:
Optical Character Recognition (OCR) is the process of converting an image of printed text into editable text that can be reused and reformatted. We are developing a Bangla character recognizer that can recognize printed Bangla documents and convert them to editable text.
Project Page: http://www.bracuniversity.net/research/crblp/ocr
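The recognizer's pipeline is not described here. As an illustration of one common early stage in printed-text OCR, the sketch below splits a page into text lines with a horizontal projection profile (the count of ink pixels in each row). It assumes the page has already been binarized into rows of 0/1 pixels; `segment_lines` is a hypothetical name, and this is not necessarily how the CRBLP recognizer works.

```python
from typing import List, Tuple

def segment_lines(page: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binarized page into text lines.

    `page` is a list of pixel rows where 1 = ink and 0 = background
    (binarization is assumed to have happened already).
    Returns (first_row, end_row) pairs; end_row is exclusive.
    """
    # Horizontal projection profile: ink pixels per row.
    profile = [sum(row) for row in page]
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                      # first inked row of a new line
        elif ink == 0 and start is not None:
            lines.append((start, y))       # first blank row after a line
            start = None
    if start is not None:                  # a line that runs to the bottom edge
        lines.append((start, len(profile)))
    return lines

page = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],   # a dense row, like a headline (matra)
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],   # blank gap between lines
    [1, 1, 1, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

In practice a small noise threshold is usually used instead of exactly zero, and skewed or touching lines need extra handling.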
::SPEECH PROCESSING::
Imagine a day when you wake up, wash, and ask your personal assistant (who, unfortunately, is a machine) to make coffee and read out the newspaper headlines.
There are two basic hurdles to overcome. First, how do we make a computer understand what we say? Second, how do we make it say what we want it to say? The first problem calls for speech recognition and the second for speech synthesis.
Speech Recognition:
Speech recognition is the process of converting speech sounds into a sequence of words by means of an algorithm implemented as a computer program. We are currently working on Bangla speech recognition using Hidden Markov Models (HMMs) as the technique and HTK as the toolkit. We have tested the performance of isolated-word and continuous speech recognition with HTK on a limited vocabulary. We are currently concentrating on preprocessing issues and are also evaluating other tools for speech recognition.
Project Page: http://www.bracuniversity.net/research/crblp/speech
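A minimal sketch of the idea behind HMM-based isolated-word recognition: each vocabulary word has its own HMM, and an utterance is assigned to the word whose model gives it the highest likelihood, computed with the forward algorithm. The class and function names, the two-word vocabulary (`ek`, `dui`) and all probabilities are hypothetical, and the observations are assumed to be already vector-quantized into integer symbols. HTK typically uses continuous-density (Gaussian mixture) HMMs and Viterbi decoding in `HVite`; this toy uses discrete outputs to stay short.

```python
from typing import Dict, List, Sequence

class DiscreteHMM:
    """HMM whose states emit integer symbols (e.g. vector-quantized frames)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[int, float]]):
        self.start = start  # start[i] = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state j | current state i)
        self.emit = emit    # emit[i][o] = P(symbol o | state i); missing = 0

    def likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm: P(obs | model) summed over all state paths."""
        if not obs:
            return 0.0
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i].get(obs[0], 0.0) for i in range(n)]
        for o in obs[1:]:
            alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n))
                     * self.emit[j].get(o, 0.0)
                     for j in range(n)]
        return sum(alpha)

def recognize(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the vocabulary word whose model scores the utterance highest."""
    return max(models, key=lambda word: models[word].likelihood(obs))

# Two-state left-to-right word models over a toy 4-symbol codebook.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "ek": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [{0: 0.9, 1: 0.1}, {0: 0.1, 1: 0.9}]),
    "dui": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [{2: 0.9, 3: 0.1}, {2: 0.1, 3: 0.9}]),
}
print(recognize([0, 0, 1, 1], models))  # ek
```

Real recognizers work with log probabilities (or scaled forward variables), because the raw products underflow on utterances of realistic length.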
Speech Synthesis:
Speech synthesis is the artificial production of human speech. A text-to-speech (TTS) system converts ordinary text into speech. We are working on generating speech signals from Bangla text.
Project Page: http://www.bracuniversity.net/research/crblp/tts
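One step a Bangla text-to-speech front end needs is grapheme-to-phoneme conversion, including the inherent vowel that a consonant carries unless it is followed by a vowel sign or a hasanta. A minimal sketch with tiny hypothetical lookup tables and rough IPA-like symbols; `to_phonemes` is an illustrative name, not part of the project.

```python
from typing import List

# Tiny hypothetical tables; symbols are rough IPA.
CONSONANTS = {"ক": "k", "খ": "kʰ", "গ": "g", "ত": "t", "দ": "d", "ন": "n",
              "প": "p", "ব": "b", "ম": "m", "র": "r", "ল": "l", "স": "s"}
VOWEL_SIGNS = {"া": "a", "ি": "i", "ী": "i", "ু": "u", "ূ": "u", "ে": "e"}
INDEPENDENT_VOWELS = {"অ": "ɔ", "আ": "a", "ই": "i", "উ": "u", "এ": "e"}
HASANTA = "\u09cd"      # virama: the consonant keeps no vowel
INHERENT_VOWEL = "ɔ"    # vowel a bare consonant carries by default

def to_phonemes(word: str) -> List[str]:
    phones: List[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:       # explicit vowel sign replaces ɔ
                phones.append(VOWEL_SIGNS[nxt])
                i += 1
            elif nxt == HASANTA:         # hasanta: no vowel at all
                i += 1
            else:                        # bare consonant: inherent vowel
                phones.append(INHERENT_VOWEL)
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
        else:
            raise ValueError(f"no rule for character {ch!r}")
        i += 1
    return phones

print(to_phonemes("বাবা"))  # ['b', 'a', 'b', 'a']
```

Real Bangla pronunciation drops or changes the inherent vowel in many contexts (for example, often at the end of a word, as in কলম), and conjuncts and other marks need their own rules; this sketch ignores all of that.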
::DOCUMENT AUTHORING::
TTF to Unicode Font Converter:
There are a number of ASCII-based Bangla fonts in use. The trouble with these fonts is that if the host machine does not have the font installed, the text appears jumbled. We are working on a TTF-to-Unicode font converter that will convert ASCII-encoded text to Unicode text. That way, only a Unicode Bangla font needs to be installed for the text to display properly.
Project Page: http://www.bracuniversity.net/research/crblp/converter
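A minimal sketch of how such a converter can work, under two assumptions: the legacy font is described by a table from ASCII character sequences to Unicode Bangla, and the legacy text stores pre-base vowel signs (ি, ে, ৈ) in visual order, before the consonant, while Unicode stores them after it. The table is a tiny hypothetical one, not the layout of any real font, and `convert` is an illustrative name.

```python
import unicodedata
from typing import Dict, List

# Hypothetical legacy-font table: ASCII sequence -> Unicode Bangla.
LEGACY_MAP: Dict[str, str] = {
    "k": "\u0995",               # ক
    "kk": "\u0995\u09cd\u0995",  # ক্ক, a conjunct stored as one legacy sequence
    "b": "\u09ac",               # ব
    "l": "\u09b2",               # ল
    "a": "\u09be",               # া aa-kar (written after the consonant)
    "i": "\u09bf",               # ি i-kar (written before the consonant)
    "e": "\u09c7",               # ে e-kar (written before the consonant)
}
PRE_BASE_SIGNS = {"\u09bf", "\u09c7", "\u09c8"}  # ি ে ৈ

def convert(legacy: str, table: Dict[str, str] = LEGACY_MAP) -> str:
    longest = max(len(key) for key in table)
    # 1. Greedy longest-match lookup, passing unknown characters through.
    pieces: List[str] = []
    i = 0
    while i < len(legacy):
        for size in range(min(longest, len(legacy) - i), 0, -1):
            chunk = legacy[i:i + size]
            if chunk in table:
                pieces.append(table[chunk])
                i += size
                break
        else:
            pieces.append(legacy[i])
            i += 1
    # 2. Visual -> logical order: move each pre-base sign after the next piece.
    out: List[str] = []
    j = 0
    while j < len(pieces):
        if pieces[j] in PRE_BASE_SIGNS and j + 1 < len(pieces):
            out += [pieces[j + 1], pieces[j]]
            j += 2
        else:
            out.append(pieces[j])
            j += 1
    # 3. NFC turns e-kar + aa-kar into the single o-kar (U+09CB).
    return unicodedata.normalize("NFC", "".join(out))

print(convert("ikk"))  # ক্কি
```

Matching the longest table entry first lets a multi-character legacy sequence for a conjunct win over its single-character prefix. A pre-base sign in front of a conjunct that the legacy font spreads over several separate pieces would need smarter cluster detection than this.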
::COMPUTATIONAL LINGUISTICS::
Corpus Analysis & Corpus Collection:
We have developed a tool for extensive corpus analysis of word frequency distributions. Our corpus currently contains one year of Prothom-Alo newspaper text, together with Charjapad and Baru Chandi Das Er Kabbo. We have analyzed the corpus for regularities and anomalies in Bangla word usage.
Project Page: http://www.bracuniversity.net/research/crblp/corpus
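A minimal sketch of the word-frequency side of corpus analysis, assuming plain Unicode text and treating any run of Bengali-block characters (U+0980 to U+09FF) as one word. The function names are hypothetical, and the tokenizer is deliberately naive: it splits words at ZWJ/ZWNJ and counts Bengali digits as words.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Any run of Bengali-block characters counts as one word (naive tokenizer).
BANGLA_WORD = re.compile(r"[\u0980-\u09FF]+")

def word_frequencies(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        counts.update(BANGLA_WORD.findall(line))
    return counts

def summarize(counts: Counter, top: int = 10) -> Dict[str, object]:
    return {
        "tokens": sum(counts.values()),                      # running words
        "types": len(counts),                                # distinct words
        "hapax": sum(1 for c in counts.values() if c == 1),  # words seen once
        "top": counts.most_common(top),
    }

corpus = ["আমি ভাত খাই।", "আমি জল খাই।"]
print(summarize(word_frequencies(corpus)))
```

The danda (।, U+0964) lies outside the Bengali block, so it is dropped as punctuation rather than glued to the preceding word.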
Localized URL:
Researchers around the world are developing standards for localized domain names, so why not for Bangla? The CRBLP team is working on a standard for Bangla domain names.
Project Page: http://www.bracuniversity.net/research/crblp/localURL
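Any Bangla domain-name standard has to coexist with Internationalized Domain Names (IDN), where each non-ASCII label travels through DNS in an ASCII `xn--` (Punycode) form. As an illustration, Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490), converts between the two forms. The domain `উদাহরণ.বাংলা` and the function names are only examples.

```python
def to_ascii_domain(domain: str) -> str:
    """Unicode domain -> ASCII-compatible form (non-ASCII labels become xn--...)."""
    return domain.encode("idna").decode("ascii")

def to_unicode_domain(ascii_domain: str) -> str:
    """ASCII-compatible form -> Unicode domain."""
    return ascii_domain.encode("ascii").decode("idna")

ace = to_ascii_domain("উদাহরণ.বাংলা")
print(ace)                     # two xn--... labels
print(to_unicode_domain(ace))  # উদাহরণ.বাংলা
```

Current registries follow IDNA 2008, which the third-party `idna` package implements; the built-in codec is used here only because it needs no extra install.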
Lexicon:
Any kind of Bangla language processing needs a rich and informative lexicon. We have developed a wordlist of 160,000 words with first-stage part-of-speech tagging.
Project Page: http://www.bracuniversity.net/research/crblp/lexicon
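A first-stage tagged wordlist is most useful when each word maps to the set of candidate tags that a later tagger disambiguates. A minimal loader, assuming a hypothetical plain-text format of one `word<TAB>tag1,tag2` entry per line; the project's actual file format and tag set are not described here, and the tags below are placeholders.

```python
from typing import Dict, Iterable, Set

def load_lexicon(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse hypothetical 'word<TAB>tag1,tag2' lines into word -> candidate tags."""
    lexicon: Dict[str, Set[str]] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):    # skip blanks and comments
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'word<TAB>tags'")
        word, tags = parts
        # A word may appear on several lines; merge all its candidate tags.
        lexicon.setdefault(word, set()).update(
            t.strip() for t in tags.split(",") if t.strip())
    return lexicon

lex = load_lexicon(["# word\ttags", "ভালো\tJJ,RB", "বই\tNN"])
print(sorted(lex["ভালো"]))  # ['JJ', 'RB']
```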
WordNet:
Project Page: http://www.bracuniversity.net/research/crblp/wordnet
Spell Checker:
A Bangla phonetic spell checker. It suggests corrections for misspelled words based on similarity in pronunciation and is implemented on the basis of the Double Metaphone phonetic encoding.
Project Page: http://www.bracuniversity.net/research/crblp/speller
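Double Metaphone itself was designed for English, and the sketch below is not Double Metaphone; it only shows the underlying idea of phonetic spell checking in a much simpler form. Each word is reduced to a key in which letters that are commonly pronounced alike share one symbol, and dictionary words with the same key as a misspelling are offered as suggestions. The equivalence classes (long and short i and u, the sibilants শ/ষ/স, and ন/ণ) and the function names are assumptions chosen for illustration, not the project's encoding rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical sound classes: each letter maps to a representative of its class.
SOUND_CLASSES: Dict[str, str] = {
    "ী": "ি", "ূ": "ু",      # long vowel signs -> short vowel signs
    "ঈ": "ই", "ঊ": "উ",      # long independent vowels -> short
    "ষ": "শ", "স": "শ",      # the three sibilants
    "ণ": "ন",                # retroflex n -> dental n
}

def phonetic_key(word: str) -> str:
    return "".join(SOUND_CLASSES.get(ch, ch) for ch in word)

def build_index(words: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket dictionary words by phonetic key."""
    index: Dict[str, List[str]] = defaultdict(list)
    for w in words:
        index[phonetic_key(w)].append(w)
    return dict(index)

def suggest(word: str, index: Dict[str, List[str]]) -> List[str]:
    """Dictionary words that sound like `word` (excluding `word` itself)."""
    return [w for w in index.get(phonetic_key(word), []) if w != word]

index = build_index(["শিক্ষা", "নদী", "ভাষা"])
print(suggest("সিক্ষা", index))  # ['শিক্ষা']
```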
Parallel Corpus:
The aim of this project is to build a multilingual Bangla-English parallel corpus in electronic, aligned form. The parallel corpus can be used for translation memory, cross-language information retrieval, multilingual lexicography, terminology extraction, language study, and language-learning tools.
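Translation memory is the simplest of these uses to illustrate: given aligned English-Bangla sentence pairs, find the stored source sentence most similar to a new one and return its translation. A minimal sketch using character-level similarity from Python's `difflib`; the sample pairs, the 0.7 threshold and the name `tm_lookup` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

def tm_lookup(query: str, memory: List[Tuple[str, str]],
              threshold: float = 0.7) -> Optional[Tuple[str, str, float]]:
    """Return (source, translation, score) of the closest stored source sentence,
    or None if nothing reaches the similarity threshold."""
    best: Optional[Tuple[str, str, float]] = None
    for source, target in memory:
        score = SequenceMatcher(None, query, source).ratio()  # 0.0 .. 1.0
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best

memory = [("I eat rice.", "আমি ভাত খাই।"), ("I drink water.", "আমি জল খাই।")]
match = tm_lookup("I eat rice", memory)
print(match[1] if match else "no match")  # আমি ভাত খাই।
```

A production memory would index the corpus instead of scanning every pair, and would usually compare word tokens rather than raw characters.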