https://googleresearch.blogspot.com/200 ... o-you.html
"That's why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times."
Cost: US$150
https://www.ldc.upenn.edu/Catalog/Catal ... LDC2006T13
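To make the two thresholds in the quoted description concrete, here is a rough sketch of that kind of counting on a small token list. It is not Google's actual pipeline: the function name `count_ngrams`, its parameters, and the choice to replace rare words with an `<UNK>` placeholder (rather than dropping them) are my own assumptions for illustration.

```python
from collections import Counter

def count_ngrams(tokens, n=5, min_word_count=200, min_ngram_count=40, unk="<UNK>"):
    """Count n-grams, keeping only those seen at least min_ngram_count times.

    Hypothetical helper: words seen fewer than min_word_count times are
    replaced by the placeholder `unk` before counting (an assumption).
    """
    word_counts = Counter(tokens)
    # Map rare words to the placeholder so they no longer count as unique words.
    kept = [t if word_counts[t] >= min_word_count else unk for t in tokens]
    # Slide a window of length n over the token list and count each window.
    ngrams = Counter(tuple(kept[i:i + n]) for i in range(len(kept) - n + 1))
    # Keep only the n-grams that reach the frequency threshold.
    return {gram: c for gram, c in ngrams.items() if c >= min_ngram_count}

# Tiny example with much lower thresholds than the 200 / 40 quoted above.
tokens = ["a", "b", "x", "a", "b"]
print(count_ngrams(tokens, n=2, min_word_count=2, min_ngram_count=2))
```

On a corpus of a trillion words this would of course not fit in memory; the sketch only shows what the two cut-offs mean.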
Regards
sean