11 Apache Technologies for the Enterprise
Now more than 10 years old, the Apache Software Foundation has become a repository of some of the most important open source technologies.

Hadoop
The Hadoop project attracts buzz because it mimics the style of distributed computing used internally at Google, using technologies cloned on the basis of academic papers published by Google engineers. Using the MapReduce style of programming, developers write data processing routines that fan out across clusters of computers and feed back their results. This turns out to be an efficient way of performing data-intensive tasks on commodity hardware.

HBase
A subproject of Hadoop, HBase is another clone of a Google technology, this one known as BigTable, that is used to manage very large database tables (up to billions of rows and millions of columns) with the data stored on clusters of commodity servers. This is structured data storage and analysis, just not according to relational database rules. HBase provides a BigTable-like solution that runs on top of Hadoop.

Cassandra
Cassandra is Facebook's contribution to the field of Big Data management and analysis. Originally invented to manage Facebook® user account data, the code was contributed to the Apache project in 2008 and is now maintained and refined by participants from many companies. Cassandra adopted some of the concepts from Google's BigTable as well as from the published details of Amazon.com's Dynamo distributed computing technologies.

CouchDB
CouchDB is another nonrelational database, designed for easy replication across many nodes and for data access via a REST API, meaning that documents and records are posted to and retrieved from the database over HTTP, the protocol of the Web. The BBC is one organization that has talked about using CouchDB in combination with Apache Tomcat to build a cost-effective content management system that can be replicated across data centers.
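To make the REST-style access described in the CouchDB entry above more concrete, here is a minimal Python sketch of the two basic requests: a PUT that stores a JSON document under an ID, and a GET that retrieves it. The server address, the database name "articles", and the document ID "story-1" are assumptions made for illustration. The sketch only builds the requests and does not send them, so it runs without a CouchDB server.

```python
import json
import urllib.request

# Assumed for illustration: a CouchDB server at this address
# and a database named "articles". Neither comes from the article.
BASE_URL = "http://localhost:5984"
DATABASE = "articles"


def build_put_document(doc_id, document):
    """Build (but do not send) the HTTP request that stores a JSON document."""
    url = f"{BASE_URL}/{DATABASE}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(doc_id):
    """Build (but do not send) the HTTP request that retrieves a document."""
    url = f"{BASE_URL}/{DATABASE}/{doc_id}"
    return urllib.request.Request(url, method="GET")


# Hypothetical document ID and content.
put_request = build_put_document("story-1", {"title": "Hello", "body": "First draft"})
get_request = build_get_document("story-1")

print(put_request.get_method(), put_request.full_url)
print(get_request.get_method(), get_request.full_url)

# To send a request to a running server, pass it to urllib.request.urlopen,
# for example: urllib.request.urlopen(put_request)
```

Because every document lives at its own URL, any HTTP client or proxy can read and write the database, which is part of what makes this style of database easy to place behind ordinary web infrastructure.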
Lucene and Solr
Lucene is an umbrella project for developing open source search software, including the Lucene Java library and the Lucene.NET port to C#. Solr is a high-performance search server built on Lucene Java that has been adopted by organizations such as MTV Networks for search applications on their Web sites.

Nutch
Lundy points to Nutch as an affordable alternative to purchasing a Google™ Search Appliance. Originally a subproject of Lucene (as Solr is now), Nutch was reclassified as an Apache top-level project this year in recognition of its growing maturity. Building on top of Lucene, Nutch adds facilities for crawling, parsing, and indexing web documents.

Tomcat
Tomcat is a free alternative to Java application servers, particularly for situations where basic Java Servlet and JavaServer Pages technologies are required and heavy-duty Java Enterprise Edition technologies would be overkill. Tomcat can also be used as the front end to more sophisticated back-end Java technologies.

Struts
Struts is a web application framework that extends the Java Servlet API to support a model-view-controller (MVC) programming model. In other words, it provides a mechanism for enforcing a clean separation between the presentation of an application (the user interface) and the logic behind the application, with the goal of simplifying maintenance of the code.

Geronimo
Apache Geronimo pulls together many open source Java alternatives to produce a fully certified Java Enterprise Edition 5 application server.

Axis2
Axis2 is part of the Apache Web Services project. It is a Web Services engine that supports SOAP and WSDL for distributed invocation of services via XML messaging, as well as REST. The primary implementation is in Java, although a port for C is also available.
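As a rough illustration of the "XML messaging" mentioned in the Axis2 entry above, the following Python sketch builds a SOAP 1.1 request envelope using only the standard library. It does not use Axis2 or its API. The service namespace, the operation name "GetQuote", and the "symbol" parameter are hypothetical and chosen only to show the shape of a message.

```python
import xml.etree.ElementTree as ET

# The SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an imaginary stock quote service.
SERVICE_NS = "http://example.com/stock"


def build_soap_request(operation, params):
    """Return a SOAP envelope (as a string) that invokes one remote operation."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    # The operation element and its children carry the call and its arguments.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameter names.
message = build_soap_request("GetQuote", {"symbol": "IBM"})
print(message)
```

In practice a web services engine generates and parses envelopes like this from a WSDL description, so application code rarely builds them by hand.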
Apache HTTP
While perhaps less glamorous than some of its distributed computing brethren, the Apache web server remains the workhorse of the web, powering some 70 percent of all Web sites and enabling many applications through its extensions for Perl and PHP programming.






