They're crawling as much publicly accessible source code as they can find, including archives (.tar.gz, .tar.bz2, .tar, and .zip), CVS repositories, and Subversion repositories. Google Code Search respects robots.txt, so there are a couple of ways you can block them from crawling your code.
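If you want to check a robots.txt file before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The paths, the example.com host, and the "ExampleCrawler" user-agent name are all made up for illustration; the actual user-agent token for Code Search is not given here, so the rules are written against the `*` wildcard, which applies to any crawler that honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a source directory and one archive
# for every crawler that honors the file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /src/
Disallow: /downloads/project.tar.gz
"""


def build_parser(text):
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleCrawler" is a placeholder name, not a real crawler's token.
source_ok = parser.can_fetch("ExampleCrawler", "http://example.com/src/main.c")
archive_ok = parser.can_fetch("ExampleCrawler", "http://example.com/downloads/project.tar.gz")
home_ok = parser.can_fetch("ExampleCrawler", "http://example.com/index.html")

print("source dir crawlable:", source_ok)
print("archive crawlable:", archive_ok)
print("home page crawlable:", home_ok)
```

This only tells you how a well-behaved crawler would read the file. It does not remove anything that has already been indexed.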
They do their best to determine the software license for code packages by looking for a license in the comments or in a separate license file. If they can't find a license, they indicate that the license is "Unknown." But the Code Search results can't tell you what patents may cover a piece of software.
Learn more about Google Code Search at Peter Zura's Two-Seventy-One Patent Blog and in the Code Search discussion group.
And, if you have not seen it already, Google Blog Search includes all blogs, not just those published through Blogger. It is continually updated, and you can search for blogs written not just in English, but also in French, Italian, German, Spanish, Korean, Brazilian Portuguese, and other languages.