Welcome to the MLBot FAQ (Frequently Asked Questions) page!

If the FAQ doesn't answer your questions, feel free to email us at:





FAQ Contents

  1. What is MLBot?

  2. What is MLBot looking for?

  3. What are you doing with the data you collect?

  4. What IP addresses does MLBot crawl from?

  5. Who are you guys?

  6. I found a bug in MLBot!

  7. How do I block MLBot?

  8. Is MLBot available for licensing?

  9. Latest news

  10. BBQ or Tex-Mex?

  11. I have a question your FAQ doesn't answer.



  1. What is MLBot?

    MLBot is a focused web crawler. Unlike traditional crawlers such as GoogleBot (Google), MSNBot (Microsoft) and Slurp (Yahoo), which crawl for breadth, a focused crawler is designed to be highly selective. Focused crawlers prioritize the links they crawl based on the probability that each link will lead to content of interest to the crawler. They are more complex to build, but in exchange they make better use of bandwidth and server resources.
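
    The prioritization described above can be sketched with a priority queue. This is only an illustration: the score_link heuristic, the MEDIA_HINTS list and the Frontier class are hypothetical names invented for this example, and MLBot's actual scoring is not described in this FAQ.

```python
import heapq
import itertools

# Hypothetical hint strings; a real focused crawler would likely use a
# trained model rather than simple substring checks.
MEDIA_HINTS = ("video", "watch", "podcast", ".mp3")


def score_link(url: str) -> float:
    """Toy relevance estimate in [0, 1]: fraction of hint strings found in the URL."""
    lowered = url.lower()
    hits = sum(1 for hint in MEDIA_HINTS if hint in lowered)
    return hits / len(MEDIA_HINTS)


class Frontier:
    """Queue of URLs to crawl; the highest-scoring URL is returned first."""

    def __init__(self) -> None:
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: earlier links win ties

    def add(self, url: str) -> None:
        if url in self._seen:
            return  # never queue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the best link first.
        heapq.heappush(self._heap, (-score_link(url), next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


frontier = Frontier()
for link in [
    "http://example.com/about",
    "http://example.com/podcast/episode1.mp3",
    "http://example.com/video/watch?id=7",
]:
    frontier.add(link)

# Links that look like media are crawled before the generic page.
order = [frontier.pop() for _ in range(len(frontier))]
```

    A breadth-first crawler would visit these links in the order it found them; the focused version visits the two media-looking links first and leaves the generic page for last.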

    Interested in learning more about focused web crawling? The following resources are a great starting point.

  2. What is MLBot looking for?

    MLBot looks for links to videos from sources like YouTube, CNN, Vimeo and ESPN as well as mp3 audio files. It directs the crawl to favor newer media over older media.

  3. What are you doing with the data you collect?

    Glad you asked! We are creating two products: Podly, a real-time media service, and Buzz Cruncher, an online media analytics tool.

    "Imagine a day when you would be in total control of creating your own TV channel lineup.

    Instead of subscribing to a service from a cable, satellite or phone company that might offer you hundreds of channels you'll never watch, you would be able to select what you want and watch it on your own schedule."

    -- CNET News - The Internet and the future of TV

    That describes Podly pretty well. We believe the future of TV is on the internet and when it makes that transition it should become something more than just a copy of how we use TV today. We must abandon the old model of network broadcasting that offers limited choices and embrace what the internet has to offer. Podly is our dream of how to make TV better and more useful to everyone.

    You should have your own channels customized to your interests. Would you like a channel about oil painting? Bicycle touring? How about cute puppies? Want something else? Just make a channel on any topic and Podly will fill it with interesting videos. It should be your choice-- not some network's choice.

    You shouldn't have to buy a digital video recorder to get control of your own schedule. We think that's crazy. The internet doesn't care about time slots. Live events like music concerts and sports will always be the exception but otherwise you should be able to watch at your own convenience.

    Podly is currently in private beta and is expected to be available in April 2010. Podly will be free (ad-supported), with additional features available by subscription.

    Buzz Cruncher is an analytics tool for media creators, hosts and web sites. It helps you find answers to these questions:

    Creators: How popular am I? Who and where is my audience? How does my popularity change by geographic region and time? What other creators are popular with the same audience as mine?

    Hosts: Who is linking to my media? Which titles are most popular?

    Web sites: Is media helping me attract and retain visitors? How do I compare to other web sites?

    Buzz Cruncher will be available in March 2010. Basic access will always be free. Upgraded reports will be available on a subscription basis. If there is enough interest, we'll make a public API available.

  4. What IP addresses does MLBot crawl from?

    MLBot crawls from the following IP addresses:

    66.219.58.34
    66.219.58.41
    66.219.58.42
    66.219.58.43
    66.219.58.44
    66.219.58.45

    71.41.201.34
    71.41.201.35
    71.41.201.36
    71.41.201.37
    71.41.201.38
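
    If you want to check whether a request in your server log came from one of these addresses, a small helper like the following may be useful. It is a sketch only: the function name is_mlbot_address is made up for this example, and the address list is simply copied from the list above.

```python
import ipaddress

# Addresses copied from the list above.
MLBOT_ADDRESSES = frozenset(
    ipaddress.ip_address(addr)
    for addr in (
        "66.219.58.34", "66.219.58.41", "66.219.58.42",
        "66.219.58.43", "66.219.58.44", "66.219.58.45",
        "71.41.201.34", "71.41.201.35", "71.41.201.36",
        "71.41.201.37", "71.41.201.38",
    )
)


def is_mlbot_address(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed MLBot addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in MLBOT_ADDRESSES
    except ValueError:
        # Not a valid IP address at all (for example, a hostname or "-").
        return False
```

    For example, is_mlbot_address("66.219.58.41") returns True, while is_mlbot_address("66.219.58.35") returns False because that address is not in the list.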

  5. Who are you guys?

    We're four guys and a cat! David Stafford, Jim Mischel, Joe Langeway, Ron Murray and Socks (he's the cat).

    Metadata Labs is a small startup in Austin, Texas, and our dream is to organize all the world's media and make it a lot more useful. Media on the internet is a bit of a mess today. It's disorganized and there's a painful lack of standards. We're working to clean it up and make it an easier and more enjoyable experience. Any media, any device, anywhere, anytime! We hope you'll like what we're building!

  6. I found a bug in MLBot!

    Ack! We try hard to make MLBot reliable and we definitely want to hear about anything that's causing you a problem. Please check the Latest news section for updates on recent bug fixes.

    The most common question we get is, "I'm seeing multiple HTTP requests for the same mp3 files in my logs. Why are you downloading them multiple times?"

    We respect your bandwidth and we do not download entire media files from your server. We access only small file segments that are likely to contain metadata (at the beginning and end of the file). Each separate HTTP request will appear in your server log, so a single file can produce several log entries. We're trying hard to keep bandwidth usage to a minimum, and it's better to make multiple small accesses than one big download.
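
    To illustrate why one file can show up several times in a log, here is a sketch of how a client could request only the start and end of a file using standard HTTP Range headers. Treat the details as assumptions: this FAQ does not say which mechanism or segment sizes MLBot uses, and the metadata_ranges function and its default sizes are hypothetical.

```python
from typing import List


def metadata_ranges(file_size: int, head: int = 4096, tail: int = 128) -> List[str]:
    """Return HTTP Range header values covering the start and end of a file.

    head and tail are illustrative sizes in bytes, not MLBot's real values.
    """
    if file_size <= 0:
        return []
    if file_size <= head + tail:
        # The file is small enough that one request covers everything.
        return [f"bytes=0-{file_size - 1}"]
    return [
        f"bytes=0-{head - 1}",                           # first `head` bytes
        f"bytes={file_size - tail}-{file_size - 1}",     # last `tail` bytes
    ]


# A 5,000,000-byte file yields two small requests instead of one big download.
ranges = metadata_ranges(5_000_000)
```

    Here the two requests together transfer 4,224 bytes of a 5,000,000-byte file, and each request would be logged separately by the server.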

    If you notice any unusual activity from MLBot you can reach us at:

    We want to hear from you and we will respond promptly. Your feedback has helped us to improve MLBot.

  7. How do I block MLBot?

    If you have a problem with or concern about MLBot, we would much prefer the chance to address it. If you need to block MLBot, though, it does respect the robots.txt exclusion standard. To block MLBot from some parts of your web site, you can use the following example:

    User-agent: MLBot
    Disallow: /upload_dir/
    Disallow: /draft_podcasts/

    In this example, MLBot will not crawl the /upload_dir/ and /draft_podcasts/ directories. Other parts of your web site will still be crawled.

    To block MLBot from your entire web site you can use this:

    User-agent: MLBot
    Disallow: /

    Please note that our web crawler caches robots.txt files, so it can take 24 hours for any changes you make to take effect.
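
    Before deploying rules like the ones above, you may want to check that they do what you expect. The sketch below uses Python's standard urllib.robotparser module; the allowed helper and the example.com URLs are made up for illustration, and the result reflects how Python's parser reads the rules, which may differ in edge cases from any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets shown above.
PARTIAL_BLOCK = """\
User-agent: MLBot
Disallow: /upload_dir/
Disallow: /draft_podcasts/
"""

FULL_BLOCK = """\
User-agent: MLBot
Disallow: /
"""


def allowed(robots_txt: str, url: str, agent: str = "MLBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocked_dir = allowed(PARTIAL_BLOCK, "http://example.com/upload_dir/show.mp3")
other_page = allowed(PARTIAL_BLOCK, "http://example.com/index.html")
whole_site = allowed(FULL_BLOCK, "http://example.com/index.html")
```

    With the partial rules, blocked_dir is False and other_page is True; with the full block, whole_site is False. Crawlers with a different user-agent name are unaffected by either rule set.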

    More information on robots.txt can be found at http://www.robotstxt.org

  8. Is MLBot available for licensing?

    No. We're 100% focused on Podly and Buzz Cruncher and doing anything else would take our attention away from getting these products done.

  9. Latest news

    • 2010/02/05 - Podly has been delayed until April. Our testers told us what we needed (if not wanted) to hear. The product is usable and functional, but the interface design is earning less-than-rave reviews like "boring", "feels too much like work", and "should be more like TV." We're grateful for your honest feedback. There is no quick fix for the current design, so we're scrapping it and starting over (just the interface design, not the entire product). It will set us back a few months. We're very encouraged by what we see in our first prototype of the next interface and we hope you'll agree it's worth the delay.

    • 2009/11/19 - Socks the Cat has been relieved of all responsibility for carrier pigeon communications effective immediately. In hindsight, it should have been obvious that putting a cat in charge of carrier pigeons would eventually lead to disaster. We regret the mistake and offer our deepest apologies to Acme Carrier Pigeon Corp.

    • 2009/09/21 - Fixed a bug where, under certain conditions on Apache web servers delivering an HTML directory listing, we would append an incorrect query string to the URL in our HTTP request. A recent crawler update attempted to minimize crawling of directory listings by avoiding the "Name", "Last Modified", "Size", and "Description" sort fields, which simply return the same directory listing in a different order. The crawler correctly removed these fields from the URL; however, any query string left over from a previously crawled URL would then be appended. The resulting URL was bizarre but still functional because Apache ignores the query string. It's fixed and we appreciate the bug report!
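
    For readers curious about the directory-listing fix, the intended behavior can be sketched as follows. This assumes the Apache 2.x style of sort links, which look like "?C=N;O=D" (column N, M, S or D; order A or D). The strip_sort_query function is hypothetical and is not MLBot's actual code.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches Apache directory-listing sort queries such as "C=N;O=D" or "C=M&O=A".
_SORT_QUERY = re.compile(r"^C=[NMSD][;&]O=[AD]$")


def strip_sort_query(url: str) -> str:
    """Drop an Apache sort query from `url`; leave every other URL unchanged.

    The result is built only from the URL passed in, so no query string
    from a previously crawled URL can leak into it.
    """
    parts = urlsplit(url)
    if _SORT_QUERY.match(parts.query):
        parts = parts._replace(query="")
    return urlunsplit(parts)


sorted_listing = strip_sort_query("http://example.com/files/?C=N;O=D")
normal_page = strip_sort_query("http://example.com/files/?page=2")
```

    Here sorted_listing becomes "http://example.com/files/", while normal_page keeps its "?page=2" query because it is not a sort link.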

  10. BBQ or Tex-Mex?

    Some of life's biggest questions have no answer, but if you're in Austin, drop by and we'll show you the best of both!

    Mesa Rosa - Austin's finest Tex-Mex
    Rudy's - Real Texas BBQ

  11. I have a question your FAQ doesn't answer.

    We'd love to hear from you! Feel free to write to us here: