February 10th, 2006
There isn’t much to introduce today, but I did want to give an update: our new features (Blog, Creative Commons Search and others) are coming soon. There is just a short delay while we try to roll over to our new datacenter so we can improve speed & scalability.
I’m recovering from a stint with the flu and trying to get caught up on everything, and I will post as soon as we release the new features.
How do you feel about RMS discounting Creative Commons? Did you know that this weekend we were able to build a test release of mozdex in GCJ (as discussed in our other blog entries)? Progress is being made.
Posted in Mozdex | 33 Comments »
February 5th, 2006
This weekend we will be rolling out a Creative Commons, Blog, Image and News Search. You will see the new tabs on the home page as well as direct access bookmarks to the feeds available externally.
We are also adding other vertical searches such as Gov sites (.gov domains only), Health (Department of Health) and some other features we have up our sleeves.
On top of these, we may have some sporadic downtime as we roll out updates that finally detect your language and default to it for results, so they will look better (and be potentially less “spammy”, since there is an obscene number of non-US sites that exist just to spam indexes for ads).
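As a rough illustration of the language defaulting described above, here is a minimal sketch that picks a result language from the browser's Accept-Language header. Using that header is an assumption (the post does not say how the language will be detected), and the function name and the set of supported languages are hypothetical.

```python
def preferred_language(accept_language, supported, default="en"):
    """Pick the best supported language from an Accept-Language header value."""
    candidates = []
    for position, part in enumerate(accept_language.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        # Compare on the primary subtag only, e.g. "en-US" -> "en".
        primary = tag.strip().lower().split("-")[0]
        candidates.append((-quality, position, primary))
    # Highest quality first; ties keep the order the browser sent.
    for neg_quality, _, primary in sorted(candidates):
        if neg_quality < 0 and primary in supported:
            return primary
    return default
```

A query server could call something like `preferred_language(header_value, {"en", "fr"})` once per request and use the result as the default language filter, falling back to the default when nothing matches.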
Keep your eyes peeled!
Posted in Mozdex | 14 Comments »
February 3rd, 2006
Today we implemented OSCache caching on our query servers. We have noticed an improvement in page rendering performance as well as lower overall CPU utilization on our query servers. Hopefully this is one step closer to building a highly available cache solution, so we can spend more of our processing power on optimizing & building our index.
OSCache is a simple caching mechanism that can be implemented through URL rules or JSP tag libraries. The cache allows us to set up a cluster pool that all query servers share to offload index reads, which are disk intensive. We are continuing to calibrate the TTL and rules to enhance performance and look forward to seeing how this holds up under load. More to come in a future update.
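To show what the TTL knob controls, here is a minimal sketch of a time-to-live cache in front of an expensive lookup. It is plain Python for illustration only, not the OSCache API, and the class and method names are hypothetical.

```python
import time


class TTLCache:
    """Keep computed values for ttl_seconds, then recompute on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}      # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]     # still fresh: skip the expensive index read
        value = compute(key)    # missing or expired: do the real work
        self._entries[key] = (now + self.ttl, value)
        return value
```

A longer TTL means fewer disk-intensive index reads but staler results, which is the trade-off being calibrated.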
For those who visit frequently: we added the ads back in. Please let us know what you think (ugly, too obtrusive or just right?).
Posted in Mozdex | 3 Comments »
February 2nd, 2006
I’ve been asked to do more updates, so I figured I would simply start a daily update, aka “Daily Wire”, to give the ins and outs of what is happening at mozdex.
Today we are busy building out the new servers. We are migrating to x86-64, testing & benchmarking JDK 1.5 and integrating some caching mechanisms to help offload the query servers. Anyone know of a good open source Java profiler? BTW, in initial testing we have gone from a lukewarm 9-11 queries per second on a 32-bit JVM (1.4) to over 100 queries per second on x86-64 / Java 1.5. We hope to continue to improve this so multi-term queries return results faster. We will also work on adding the result speed to our pages so we can track how fast they were processed.
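For anyone who wants to produce a comparable number, here is a minimal sketch of measuring queries per second over a batch of queries. The function name and the `run_query` callable are hypothetical stand-ins; this is not the benchmark harness behind the figures above.

```python
import time


def queries_per_second(run_query, queries, timer=time.perf_counter):
    """Run every query once and return the observed throughput."""
    start = timer()
    for query in queries:
        run_query(query)
    elapsed = timer() - start
    # Guard against a zero reading on very fast runs or coarse timers.
    return len(queries) / elapsed if elapsed > 0 else float("inf")
```

By the numbers in this post, going from 9-11 to 100 queries per second works out to roughly a 9x to 11x improvement.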
Interesting Stats:
* Over 100k OpenSearch (XML) queries per day (and growing quickly)
* “Linux” is still the #1 search term. I guess we have lots of Linux fans
* “Wholesale Products” is the #1 OpenSearch term.
Did you know you could search mozdex.com from a9.com? Visit a9.com and add mozdex as an OpenSearch feed!
Posted in Mozdex | 17 Comments »
February 1st, 2006
New datacenter coming up this month! We will be shooting for 400 million pages and implementing our search on some new hardware. We have adopted the new dual core Opteron platform, switched to the 64-bit Java 1.5 JVM and are implementing some new redundancy features to help streamline and provide high availability. We will also separate out some of our XML/RSS feeds and API systems onto their own network to provide better availability for our search partners & feeds.
The Internet Explorer 7 preview is now available for public consumption, and after my first trials I must admit I am pleasantly surprised. It offers tabbed browsing, page previews, tab history, fast rendering and amazing OpenSearch/RSS/XML support, which you can see by clicking on our XML feeds (bottom of search results) or by adding Mozdex.com (click the Add to IE 7 link from our support page) to your IE 7 search bar. It will be interesting to see how Firefox counters this release with Firefox 2.0 (as well as Opera 9).
Posted in Mozdex | No Comments »
January 30th, 2006
I have been thinking about new ways to build a marketing system that is affordable, secure, reliable and, most importantly, trustworthy, and I’m looking for your feedback and thoughts.
I want something that is advertiser driven, cost effective and of course profitable. Is CPM based advertising the most trustworthy? Is there a system that can be built that works without invading one’s privacy?
How about going back to the CPM base? Allow people to bid either per 1k impressions or per day/week/month of advertising, as well as on rank/priority of advertising (position)? I have a feeling the PPC market as we know it is destined for failure because of potential fraud, trust and privacy issues related to the data collected. Something along the lines of CPM advertising would be less risky in that it would take more brute force cheating to make a difference, and that could throw off flags that are easier to detect.
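To make the two bidding styles comparable, here is a small sketch of the arithmetic: what a per-1k-impressions bid costs, and what a flat per-day/week/month price works out to as an equivalent CPM. The function names and every figure used with them are made up for illustration; they are not mozdex prices.

```python
def cpm_cost(impressions, cpm_bid):
    """Cost of a campaign billed per 1,000 impressions."""
    return impressions * cpm_bid / 1000


def effective_cpm(flat_price, impressions):
    """Express a flat per-day/week/month price as an equivalent CPM."""
    return flat_price * 1000 / impressions
```

For example, 250,000 impressions at a 2.00 bid per 1k costs 500.00, and a flat 300.00 slot that ends up showing 150,000 impressions is equivalent to a 2.00 CPM.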
I have temporarily removed Google AdSense until I can come up with something that works. I have contemplated selling keywords through Adbrite or other systems, but again I’m not too keen on third-party privacy policies and what they do with the data they collect without my knowledge or consent. Advertising is a source of revenue we need to build a successful and sustainable search engine. What is your feedback on making this possible while sticking to our guns on being open and trusted?
Posted in Mozdex | 27 Comments »
January 23rd, 2006
I believe that the discussion of privacy today is more a freedom of speech issue than an actual stand for “privacy”. Searchers fear that looking up terms they are merely curious about will be flagged as something the government wouldn’t want them to search for.
Our government has already told us they are monitoring our phone calls and spying internally in the name of anti-terrorism. Why is your concern search term privacy when our government has already overstepped the lines of privacy, liberty and personal freedoms? Why are we cheerleading the likes of Google when they have made no public statement as to the protection, privacy and use of the data they collect? The real issue isn’t just corporate privacy and keeping your searches private, but government overstepping its boundaries and threatening the very freedoms we all enjoy.
I want statements, I want action, I want people to be held accountable for what they are expected to do. If Google collects personal data, I want to be able to delete it; if the government wants to see this data, then I want to make sure my freedom of speech protects me. We can’t assume anything anymore, and the problem with privacy is that we assume it is protected. We have to stand up for something, and I hope the giants, with all the power they have, can make a stand instead of simply riding the free publicity and pulling a stunt.
Mozdex collects no personal information. We hold our query lists for 30 days to track topN terms, and we analyze our logfiles with standard processes to track our growth and utilization. Because we use AdSense, there are cookies associated with it that are out of our control. Perhaps a new model of advertising/revenue generation, without all of the fraud tracking associated with cookies & AdSense, can remove that last barrier to a successful commercial search engine; that is what I hope to open up in further discussions.
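For the curious, the topN tracking over a 30-day window amounts to something like the sketch below. It assumes the query list is just (timestamp, query) pairs with no user identifiers; that format and the function name are hypothetical, not our actual log layout.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_terms(query_log, now, n=10, retention_days=30):
    """Count queries inside the retention window and return the n most common."""
    cutoff = now - timedelta(days=retention_days)
    counts = Counter(
        query.strip().lower()          # fold case so "Linux" and "linux" match
        for timestamp, query in query_log
        if timestamp >= cutoff         # anything older than the window is ignored
    )
    return counts.most_common(n)
```

Nothing in that calculation needs an IP address or cookie, only the query text and when it arrived.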
Please keep in mind that all search engines expose your queries in one way or another, as they will be visible to webmasters/publishers in referral logs, and some even offer services to track and monitor click-throughs and ranks on their own site based on query terms. Again, the right of privacy is something we have to stand for not just commercially but politically, and that takes action from us all.
What are your privacy concerns? How do non-US-based governments handle such concerns? How does one protect query privacy when the information is readily exchanged at the basic HTTP level that all browsers and webservers use to communicate with each other?
Posted in Mozdex | 8 Comments »
January 17th, 2006
We are working on implementing a 48 hour URL submission/inclusion process so that any URLs submitted will be crawled, indexed and searchable within 48 hours (or less). I expect to announce this feature fully over the next few days, but it should be implemented over the next few hours. Feel free to submit your sites/content pages and see how they get inserted. PS, we’re also working on sucking down sitemaps... more on that over the next few days.
Posted in Mozdex | 6 Comments »
January 12th, 2006
How should an “open” index deal with spam? Should we have an internal blacklist of obvious sites? Should we publish a blacklist? Should users be able to vote on, mark or submit links that look like spam? We are working on automated processes to control the obvious culprits, but what would you like to see? Would you like a login so you can click “Ignore” and never see certain links again? Would you like that ignore to block specific results, or to run a regex that filters things out based on site/url/description or other pieces of the index? Please let us know your thoughts on link spam and abuse, and drop a reply on how our index is improving (or getting worse) under our current spam cleanup processes.
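To make the regex option concrete, here is a minimal sketch of per-user ignore rules applied to a page of results. The result fields (site, url, description) follow the question above, but the dictionary layout, the rule format and the function name are all hypothetical.

```python
import re


def filter_results(results, ignore_rules):
    """Drop any result where a user's ignore rule matches the named field.

    results      -- list of dicts with keys such as "site", "url", "description"
    ignore_rules -- list of (field, regex) pairs saved by the user
    """
    compiled = [
        (field, re.compile(pattern, re.IGNORECASE))
        for field, pattern in ignore_rules
    ]
    kept = []
    for result in results:
        # A missing field is treated as empty text, so it never matches by accident.
        if any(rx.search(result.get(field, "")) for field, rx in compiled):
            continue
        kept.append(result)
    return kept
```

Blocking one specific result is then just a rule on the url field with an exact, escaped pattern, so both kinds of “Ignore” could share the same mechanism.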
Posted in Mozdex | 17 Comments »
January 6th, 2006
For those on the east coast who are available the weekend of January 14-15th, I plan to attend BarCamp and give a small presentation on Mozdex.com and Open Source Search.
Posted in Mozdex | 1 Comment »