
Data mining the Asita-Subhadra connection: Progress Report 1

Out of 126 clusters, I have gone through the first 20 and found that 7 are chiasmically meaningful. That is a much higher proportion than I expected.

Even more surprisingly, the clusters brought up connections I had previously found by reading and manually comparing those parts of the text.

Here’s a preview of what I have gotten thus far.

[Image: Progress Report 1, clusters]

I realize that may not be very intelligible on its own, so here’s what I gleaned from those clusters.

[Image: Progress Report 1, gleanings]

In terms of providing different perspectives on what is chiasmic, I think this method is holding up very well so far: it has generated quite a few matches that I did not pick up on while reading these two sections (what I did pick up on is briefly presented here).

The most interesting is probably the one found in cluster 4: the Buddha was born to end birth. How strange that seems! Have you ever heard of someone doing something in order to stop doing that very same thing?

A discussion with Eric brought to mind how eating can be considered an example of this. We eat to stave off hunger so that we do not have to eat further, but, as Eric quickly pointed out, that is only a temporary relief. Perhaps that is the importance of the Buddha’s endeavour: it is said to be the ultimate relief.


State-of-my-data-mining-exploits report

I haven’t devoted much time to trying to data-mine my way into some new understanding of the Buddhacarita, but I have to talk about it and present my work to date to my class tomorrow, so here’s a brief review.

1. My data comes from the Chinese version of the Buddhacarita found in the digital edition of the Taisho Shinshu Daizokyo produced by CBETA. This text is called 佛所行讚 (Fosuoxing zan).

2. Arthur Chen, a Master’s candidate in the Department of Religious Studies at Hsi Lai University, did me a great favor by converting the Chinese text into a tab-delimited file that records every character occurrence and its location in the entire 60,000-character text. The text contains 1,997 unique characters.

3. In preparation for a clustering analysis of the file, I have removed about 100 unique characters because they carry little meaning in and of themselves. These include prepositions, conjunctions, and similar function words.

4. Clustering analysis is used because supervised learning is not possible here: the characters cannot be classified into known categories. In this case, I am trying to explore whether there are unknown, and so far unexplored, connections between the characters that my current manual reading and interpretation of the text have missed.

5. There are two possible directions to take from here. One is to add an attribute giving the location of each character (for example, the chapter it appears in) and use clustering to show which locations (chapters) are more closely related to one another. The other is to use my current understanding of the text’s structure to select the material to be clustered, zooming in on particular sections and using the results of the analysis to inform my manual reading.
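
To make steps 2 to 5 more concrete, here is a minimal Python sketch of the first direction. Everything in it is an assumption rather than a description of the actual setup: the three-column row layout (character, chapter, position) for the tab-delimited file, the small illustrative stop list, the function names, and the choice of plain k-means with a deterministic farthest-point start. The post does not say which clustering algorithm is used.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical row layout: (character, chapter, position).
# The real file's columns are not described in this post.
Row = Tuple[str, int, int]

# Illustrative stop list of common Classical Chinese function words only;
# the ~100 characters actually removed are not listed here.
FUNCTION_WORDS: Set[str] = {"之", "而", "於", "以", "與", "其", "者", "也"}


def load_rows(lines: Iterable[str]) -> List[Row]:
    """Step 2: parse lines of the assumed form char<TAB>chapter<TAB>position (no header)."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if fields:  # skip blank lines
            rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def remove_function_words(rows: Iterable[Row], stoplist: Set[str] = FUNCTION_WORDS) -> List[Row]:
    """Step 3: drop occurrences of characters on the stop list."""
    return [r for r in rows if r[0] not in stoplist]


def chapter_profiles(rows: Iterable[Row]) -> Tuple[List[int], List[List[float]]]:
    """Step 5, first direction: one relative-frequency vector per chapter."""
    counts: Dict[int, Counter] = {}
    for char, chapter, _ in rows:
        counts.setdefault(chapter, Counter())[char] += 1
    vocab = sorted({c for counter in counts.values() for c in counter})
    chapters = sorted(counts)
    vectors = []
    for ch in chapters:
        total = sum(counts[ch].values())
        vectors.append([counts[ch][c] / total for c in vocab])
    return chapters, vectors


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iterations: int = 100) -> List[int]:
    """Step 4: plain k-means, which needs no labelled categories.
    Uses a deterministic farthest-point start; assumes 1 <= k <= len(vectors)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # Update: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy input in the assumed format, not real data from the text.
    lines = ["佛\t1\t1", "之\t1\t2", "生\t1\t3", "佛\t1\t4",
             "佛\t2\t1", "生\t2\t2", "佛\t2\t3",
             "死\t3\t1", "死\t3\t2", "而\t3\t3", "生\t3\t4"]
    rows = remove_function_words(load_rows(lines))
    chapters, vectors = chapter_profiles(rows)
    print(dict(zip(chapters, kmeans(vectors, k=2))))  # {1: 0, 2: 0, 3: 1}
```

For the second direction, the same functions could be run on a hand-picked subset of rows before building the profiles, for example `[r for r in rows if r[1] in {1, 2}]`, where the chapter numbers are placeholders for sections chosen by manual reading. Relative frequencies are used instead of raw counts so that longer chapters do not dominate the distances.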