Dani's Japanese Resources

Introduction

This website is a place to put any useful tools I come across or create in the course of learning Japanese. There was a blog here once, but I never updated it, and I realised that I had little interest in maintaining a public log of my progress; nor do I expect many others would be interested in reading one. However, once or twice I did receive feedback from people who had actually made use of the things I posted, so I thought it would be a shame to take them down... hence this website.

For anybody interested, I can be found on the AJATT+ forums (danielpwright), on Twitter (@danielpwright), and occasionally on the #ajatt channel at irc.rizon.net (dpwright). I can also be contacted by email at dani@dpwright.com.

Motivation

Motivation can be difficult to maintain. Sometimes it helps just to be able to track your progress, to which end I created the following tools.

Kanji Desktops

Taking a cue from Kanji Poster, I created these desktop wallpapers to keep track of my progress through the first volume of Heisig. They contain all 2,042 characters from "Remembering the Kanji I", in the order they are presented in the book. All I did was highlight the kanji I had studied each day. Simple but effective.

The images were created in the GIMP and are provided in .xcf and .png formats.

RTK Desktop (Standard)
RTK Desktop (Marked)
RTK Desktop (Inverted)
Source XCF

WordPress Heisig Tracker

WordPress Heisig Tracker (Screenshot)

Back when this site was still a blog I wrote a little Heisig tracking plugin for it. I have no idea whether it still works with the latest version of WordPress, and the whole thing was just thrown together in a couple of hours so it's a bit rubbish, but for the sake of posterity here it is.

Spaced Repetition System (SRS)

Spaced Repetition Systems such as Anki and Mnemosyne have become increasingly popular in recent years. The basic principle is that the best time to review any fact you wish to retain is when you are just about to forget it, so the software schedules reviews so that they occur at that time. For a better explanation, take a look at this article. You can also find a lot of information at the website for Supermemo, the Granddaddy of SRS systems.
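
To make the principle concrete, here is a toy scheduler in bash. It is only a sketch under stated assumptions: the doubling rule and the function name next_interval are invented for illustration and are not the algorithm used by Anki, Mnemosyne, or Supermemo, which tune the spacing for each card.

```bash
#!/bin/bash

# next_interval DAYS RESULT
# Toy rule (an assumption, not a real SRS algorithm): a remembered card
# waits twice as long before its next review; a forgotten card starts
# again at one day.
next_interval() {
	local days="$1" result="$2"
	if [ "$result" = "pass" ]
	then
		echo $(( days * 2 ))
	else
		echo 1
	fi
}

# A card remembered four times in a row, then forgotten once
interval=1
for result in pass pass pass pass fail
do
	interval=$(next_interval "$interval" "$result")
	echo "$result: next review in $interval day(s)"
done
```

Each successful review pushes the next one further away, so well-known cards come up rarely while a forgotten card returns to daily review.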

Card Layouts

While the technology underpinning an SRS system's review spacing is certainly impressive, it is up to you to create meaningful flashcards for review. Without good card design, reviews quickly become painful and difficult.

Finding the best layouts to suit what it is you want to learn is a process of constant experimentation. At first it seems simple, but you soon realise there's a huge number of innovative ways you can approach flashcard design to make reviewing more enjoyable and learning more efficient.

Here I present any new layouts which I have found useful. There are many more examples elsewhere; take a look at lazy kanji (or the related Kendo mod) and MCDs (AJATT+ membership required) for more ideas. Some will suit the way you learn; some will seem tedious, abhorrent, or pointless. The point is to experiment and stick with the ones you like.

Single Phrase Highlight

I originally suggested this MCD-inspired format on the AJATT+ Forums, which is where I throw out a lot of my ideas if they're still in the experimentation phase. Much of the text below is reproduced from my original forum post. I believe some people have used this format with some success; others have found the original MCD format suits them better.

Abstract

When using real-life source materials such as books or magazines in order to learn, it is often difficult to find i+1 sentences -- that is, sentences in which there is only one word or grammar pattern which you do not understand. Dictionary definitions of individual words are often shorter and more comprehensible, but tend to be somewhat dry and uninteresting by comparison. Sentences from natural Japanese sources are also generally longer than is ideal.

A format was required which would allow the use of long or otherwise daunting text in study. This one does so by creating multiple cards for each extract, and testing only a single component on each card.

Format

The format is quite simple. Multiple cards are created for a particular sentence or extract, and on each card a single word or grammar point is highlighted. The task is to be able to pronounce and understand the function of the highlighted section. The following examples are taken from the book, 「燃えるジンバブウェ

Card 1
Front

ジンバブウェ問題の一つの核心は、白人大農場制とその改革をめぐる紛争である。

Back

ジンバブウェ 問題[もんだい]の 一[ひと]つの 核心[かくしん]は、 白人[はくじん] 大[だい] 農場[のうじょう] 制[せい]とその 改革[かいかく]をめぐる 紛争[ふんそう]である。

核心
物事の中心となる大切なところ。中核。


Card 2
Front

ジンバブウェ問題の一つの核心は、白人大農場制とその改革をめぐる紛争である。

Back

ジンバブウェ 問題[もんだい]の 一[ひと]つの 核心[かくしん]は、 白人[はくじん] 大[だい] 農場[のうじょう] 制[せい]とその 改革[かいかく]をめぐる 紛争[ふんそう]である。

大農場制
Large-scale industrial farming


Card 3
Front

ジンバブウェ問題の一つの核心は、白人大農場制とその改革をめぐる紛争である。

Back

ジンバブウェ 問題[もんだい]の 一[ひと]つの 核心[かくしん]は、 白人[はくじん] 大[だい] 農場[のうじょう] 制[せい]とその 改革[かいかく]をめぐる 紛争[ふんそう]である。

改革
If you want to achieve the 改 of your inner self you have to control your desires and correct your flaws; in sum, be your own self taskmaster.

Although only the highlighted word is being tested, the reading for the entire extract is included on the back of the card. In use, I found that if I couldn't remember another word in the extract I would want to check the reading to satisfy my curiosity. However, it is important that the card is scored based on knowledge of the highlighted section and nothing else.

What to put on the back of the card is up to the user. I use monolingual definitions where I can, but if that's difficult I will just use an English translation. Sometimes I understand the word from the kanji alone, in which case I put in the Heisig mnemonic for that kanji. Examples of all three are given above.
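
Producing the fronts for one extract is mechanical, so it can be scripted. The following is a minimal sketch, not part of the original forum post: make_cards is a hypothetical helper, and the <b> markup is an assumption standing in for whatever highlight your SRS renders.

```bash
#!/bin/bash

# make_cards SENTENCE WORD...
# Prints one card front per WORD: the sentence with that word highlighted.
# The <b> tags are an assumption; substitute your own highlight markup.
make_cards() {
	local sentence="$1" word marked
	shift
	for word in "$@"
	do
		marked="<b>$word</b>"
		# Replace the first occurrence of the word with its highlighted copy
		printf '%s\n' "${sentence/"$word"/$marked}"
	done
}

make_cards "ジンバブウェ問題の一つの核心は、白人大農場制とその改革をめぐる紛争である。" 核心 大農場制 改革
```

This prints three lines, one per card, each containing the full sentence with a different word highlighted. The backs still need to be written by hand, since the choice of definition, translation, or mnemonic depends on the word.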

Advantages
Potential Disadvantages

Vocabulary Oriented Kanji

Having completed Heisig's Remembering the Kanji I, I am now faced with the task of maintaining the kanji knowledge I gained through his system. Of course, one approach is just to continue reviewing the SRS deck which I created while I was working through it (which used the vanilla Heisig format -- keyword on the front; story and kanji on the back). I haven't phased this format out, but I am finding that as I move away from studying individual kanji Heisig-style and toward studying actual Japanese vocabulary my needs are shifting. This format is one attempt to meet those needs.

Abstract

Heisig's method is an extremely effective way to learn the meanings and writing of individual kanji as they are used in Japanese. There is a missing link, however, between being able to recall a kanji from its Heisig keyword, and being able to recall it when required in actual Japanese writing. Synonymous keywords cause problems, and remembering the order in which a two-kanji compound is written is sometimes difficult. This format attempts to bridge this gap by testing individual kanji as components in actual Japanese vocabulary, while maintaining the link to Heisig's mnemonic system on the back of the card.

Format

The format is a simple cloze-deletion format. The front of the card contains a word with one or more kanji blanked out, together with the reading for the entire word. The task is to write out the missing kanji. The back of the card contains the whole word, plus the keyword and Heisig mnemonic for each of the tested kanji.

Card 1
Front

[...]献
こうけん

Back

Taxes used to be called tributes; a crafty way to collect good men's shells if ever I heard one!


Card 2
Front

貢[...]
こうけん

Back

A South Korean street vendor is offering chihuahua snacks.


Card 3
Front

有限[...]
ゆうげんがいしゃ

Back

有限会社

The only thing created in most meetings is a lot of hot air. Imagine a cloud of it rising to the umbrella which serves as a roof.

When you enter a Japanese company it's like you join a cult... your workdesk becomes your altar and you worship the very soil the office is built on.

While these may look like simple vocabulary cards, which are generally frowned upon because of their lack of context, there is a crucial difference: these cards don't require you to display any understanding of the word being tested in order to mark them correct. In reality, of course, remembering which kanji should apply without understanding the word being tested would be difficult and inefficient; however, the understanding required is general, not specific. This format is designed to test the ability to write kanji, not knowledge of vocabulary.
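
The card fronts shown above can be generated from a word and the kanji to hide. This is a minimal sketch: cloze_front is a hypothetical helper name, and the reading line and the back of the card are left to be filled in separately.

```bash
#!/bin/bash

# cloze_front WORD KANJI
# Prints WORD with the first occurrence of KANJI replaced by the [...] marker.
cloze_front() {
	local word="$1" kanji="$2" blank='[...]'
	printf '%s\n' "${word/"$kanji"/$blank}"
}

cloze_front 貢献 貢        # prints [...]献
cloze_front 貢献 献        # prints 貢[...]
cloze_front 有限会社 会社  # prints 有限[...]
```

Passing the hidden kanji explicitly, rather than splitting the word into characters, keeps the function independent of the shell's locale settings and allows more than one kanji to be blanked at once.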

Advantages
Potential Disadvantages

Miscellaneous Tools

Tools are useful, but it's all too easy to fall into the trap of searching for or building tools in order to avoid actually studying! I am certainly guilty of this, so if you find yourself falling into this trap, why not see whether I've already made the tool for you?

Useful Scripts

I make a lot of use of scripts. Since I use a Mac at home and work in a UNIX environment at the office, I generally have access to reasonably capable scripting languages. As a result, the scripts below tend to be written in bash or similar languages. Windows users can run them through UNIX emulation layers such as Cygwin.

bash: Batch Rip Audio from Video

This script was originally posted on the AJATT+ Forums. It rips audio from a folder of video files, so that I could listen to my favourite dramas on the go. The script itself is written with my own system's directory layout in mind, so it may well require tweaking before you can use it. It requires FFmpeg.

#!/bin/bash
# Rips the audio track from every video under ../ドラマ/<drama> into
# ./<drama>/, mirroring the source folder layout. Requires FFmpeg.

DRAMA="$1"
DIR="../ドラマ/$DRAMA"
EXT=avi

find "$DIR" -name "*.$EXT" | while IFS= read -r FILE
do
	# Path relative to $DIR, with the extension swapped for .mp3
	RELATIVE="${FILE#"$DIR"/}"
	LOCALFILE="$DRAMA/${RELATIVE%."$EXT"}.mp3"
	mkdir -p "$(dirname "$LOCALFILE")"
	# Redirecting stdin stops ffmpeg from swallowing the rest of the file list
	ffmpeg -i "$FILE" -acodec libmp3lame -ab 128k "$LOCALFILE" < /dev/null
done

bash: Word Frequency Analysis

Performs word frequency analysis on some text. Requires MeCab and iconv. Original file. Sample output. TODO More detailed writeup.

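#!/bin/bash
# Counts how often each word (in dictionary form) occurs in UTF-8 text
# read from stdin. Requires MeCab with an EUC-JP dictionary, and iconv.

WORDS="$(mktemp)"
UNIQ="$(mktemp)"
COUNTS="$(mktemp)"
trap 'rm -f "$WORDS" "$UNIQ" "$COUNTS"' EXIT

# MeCab prints "surface<TAB>feature,feature,..."; the 7th comma-separated
# field is taken as the dictionary form (assuming the IPA dictionary layout)
iconv -c -f UTF-8 -t EUCJP | mecab | iconv -c -f EUCJP -t UTF-8 \
               | sed '/EOS/d' | sed '/^$/d' | awk -F ',' '{ print $7 }' > "$WORDS"
sort -u "$WORDS" > "$UNIQ"

while IFS= read -r LINE
do
	# -F -x: match the word literally and as a whole line, not as a regex
	FIRST=$(grep -n -x -F -e "$LINE" "$WORDS" | head -n 1 | sed 's/:.*//')
	COUNT=$(grep -c -x -F -e "$LINE" "$WORDS")
	printf '%s\t%s\t%s\n' "$COUNT" "$FIRST" "$LINE" >> "$COUNTS"
done < "$UNIQ"

# Most frequent first; drop the first-occurrence column from the output
sort -n -r "$COUNTS" | cut -f1,3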
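
For comparison, the counting stage on its own can be written with the standard sort | uniq -c idiom. This is a sketch of an alternative, not what the script above does: it assumes the input already has one word per line (for example, the dictionary forms produced by the awk step), it does not record where each word first appears, and count_words is a hypothetical name.

```bash
#!/bin/bash

# count_words: reads one word per line on stdin and prints "COUNT<TAB>WORD",
# most frequent first. LC_ALL=C makes sort and uniq compare raw bytes, so
# the result does not depend on the locale's collation rules.
count_words() {
	LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -k1,1nr -k2 \
		| awk '{ print $1 "\t" $2 }'
}

printf '%s\n' 猫 犬 猫 鳥 猫 犬 | count_words
```

Because the input is sorted once and counted in a single pass, this scales better on long texts than running grep once per unique word.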

bash: Vocabulary/Sentence Deck Generator

Takes in a vocabulary list (such as the one generated by the Word Frequency Analysis script above) and outputs a tab-separated value file containing the words in the list along with their readings, definitions, and example sentences from a source supplied by the user. The vocabulary list is read from standard input and the sentence source is given as the first argument. Requires iconv, MeCab, kakasi, and jmdict. The script parses the escape characters in grep's colour output, and these do not always survive copying and pasting, so it might be worth downloading the original file from here. Also, it's not very portable at all; the scary-looking sed line at the end of the DEFINITION variable and the colour handling will probably need to be tweaked on BSD/OSX systems.

#!/bin/bash

SENTENCE_SOURCE=$1
WORDS="$(mktemp)"
UNIQ="$(mktemp)"
SENTENCES="$(mktemp)"
EXAMPLES="$(mktemp)"
TMP="$(mktemp)"

cat "$SENTENCE_SOURCE" | iconv -c -f UTF-8 -t EUCJP | mecab | iconv -c -f EUCJP -t UTF-8 \
               | sed '/EOS/d' | sed '/^$/d' > $WORDS
cat $WORDS | sort | uniq  > $UNIQ

cat "$SENTENCE_SOURCE" | sed 's/。/\n/g' > $SENTENCES

# Read the vocabulary list from stdin, one word per line
while read -r WORD
do
	if [ "$WORD" != "*" ]
	then
		READING=$(echo "$WORD" | iconv -c -f UTF-8 -t EUCJP | kakasi -JH | iconv -c -f EUCJP -t UTF-8)
		DEFINITION=$(jmdict -j "$WORD" | grep -v "match(es) found" | sed "s:) (.*$:):" | sed -e '{:q;N;s:\n:<br />:g;t q}')
		MEANING=$(echo "$DEFINITION" | sed 's:.*1) ::' | sed 's:<br.*::')
		echo -e -n "$WORD\t$READING\t$MEANING\t$DEFINITION"

		# Collect every sentence containing any inflected form of the word
		: > "$EXAMPLES"
		grep -F -e ",$WORD," "$UNIQ" | sed 's/\t.*$//' | while read -r CONJUGATION
		do
			grep --color=always -F -e "$CONJUGATION" "$SENTENCES" >> "$EXAMPLES"
		done

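		# Prefer examples with a measured length of at least 50, shortest first; keep five
		sort -u "$EXAMPLES" | awk '{print length, $0}' | sort -n > "$TMP"
		awk '$1 >= 50' "$TMP" > "$EXAMPLES"
		awk '$1 < 50' "$TMP" >> "$EXAMPLES"
		awk '{$1=""; print $0 }' "$EXAMPLES" | head -n 5 | while read -r EXAMPLE
		do
			# Turn GNU grep's default colour escapes (ESC[01;31m ESC[K ... ESC[m ESC[K) into HTML;
			# \x1b is GNU sed's notation for the ESC character
			printf '\t%s' "$EXAMPLE" | sed 's:\x1b\[01;31m\x1b\[K:<font color="red">:g' | sed 's:\x1b\[m\x1b\[K:</font>:g'
		done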

		echo
	fi
done
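
The way the script orders its example sentences can be isolated into a small function. This is a sketch of the same preference (sentences at or above a minimum length first, shortest first, then shorter ones as a fallback) rather than a drop-in replacement; pick_examples and its two parameters are hypothetical names. Note that awk's length may count bytes rather than characters depending on the awk implementation and locale, so the threshold is approximate for Japanese text.

```bash
#!/bin/bash

# pick_examples MIN MAX_COUNT
# Reads candidate sentences on stdin, one per line. Prints up to MAX_COUNT
# of them, preferring sentences at least MIN long (shortest first) and
# falling back to shorter ones only when there are not enough.
pick_examples() {
	local min="$1" max="$2"
	awk -v min="$min" '{
		n = length($0)
		rank = (n >= min) ? 0 : 1     # 0 = long enough, 1 = fallback
		print rank "\t" n "\t" $0
	}' | sort -t $'\t' -k1,1n -k2,2n | cut -f3- | head -n "$max"
}

printf '%s\n' "a fairly long example sentence" "tiny" "a medium one" | pick_examples 10 2
```

With a minimum of 10 and a limit of 2, the call above prints "a medium one" and then "a fairly long example sentence"; "tiny" would only be used if fewer than two longer sentences were available.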