How New Tech Can Help Lawyers Rethink Their Jobs In The Big Data Age

By Derrick Harris. This article originally appeared on Gigaom, December 31, 2013.

The legal profession is inherently conservative when it comes to adopting new technologies and practices, but firms and lawyers that want to stand out in an evolving field might want to jump on the big data bandwagon sooner rather than later.

Law firms took a beating during the peak of the recession a few years ago — large firms, especially, laid off staff and scaled back significantly on hiring — and many argue the profession will never be the same. Clients worried about their own finances aren’t as keen on forking over huge hourly fees while teams of associates and partners work their cases. The business model of law is evolving pretty rapidly, from flat-rate fee structures and on-demand legal advice to the democratization of certain services via companies like LegalZoom.

One could argue the best law firms going forward will be lean and mean, eschewing unnecessary costs and cutting out institutional inefficiencies. They’ll do more — and better — work in less time with smaller staffs. But first they might need to reconsider legacy methods of building cases and forming legal strategies, and start thinking about all the data around them. If it’s digital, it can be analyzed, which might mean finding the right information a lot faster and for a lot less money than previously possible.

Here are three ways law firms and lawyers can get started rethinking their processes with big data now.

1. Automate, automate, automate

Thus far, the biggest area where big data is impacting the legal profession might be intelligent software for helping companies get a handle on electronic discovery. Companies often store years’ worth of emails, PDFs and every other type of document under the sun, which might be great for regulatory purposes but is a nightmare for the lawyers who have to sort through them when they’re turned over as part of the discovery process. But times are changing, and computers can handle a lot of the document-review grunt work that used to be handed off to associates or farmed out to contract attorneys.

Last year, for example, we wrote about a software vendor called Recommind that uses machine learning to do what it calls predictive coding, a process that saves firms time and money by helping lawyers sort through all those files to figure out which ones are relevant. (The company’s CTO, Jan Puzicha, also joined me on a panel at Structure Data last year to talk about the importance of keeping humans in the loop even when automating parts of the process with machine learning.) We’ve covered another company, PureDiscovery, which applies semantic analysis techniques to e-discovery documents in order to achieve largely the same result.
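
To make the idea concrete, here is a minimal sketch of how machine-learning relevance ranking works in general: train a classifier on a small set of lawyer-labeled documents, then score the unreviewed pile so the likeliest-relevant files surface first. This is a generic illustration in Python with scikit-learn, not Recommind’s or PureDiscovery’s actual system, and the sample documents are invented.

```python
# Minimal sketch of predictive-coding-style relevance ranking.
# Generic illustration only; sample data is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Documents a lawyer has already reviewed, labeled 1 = relevant, 0 = not.
labeled_docs = [
    ("Re: Q3 licensing terms for the disputed patent", 1),
    ("Lunch menu for the company picnic", 0),
    ("Draft settlement figures attached, please review", 1),
    ("IT notice: password reset required", 0),
]
texts, labels = zip(*labeled_docs)

# Turn text into TF-IDF features and fit a simple classifier.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression().fit(X, labels)

# Score the unreviewed pile; review the highest-probability documents first.
unreviewed = [
    "Forwarding the patent license negotiation notes",
    "Reminder: parking garage closed Friday",
]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {doc}")
```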

These companies are expanding their capabilities, too. Now, rather than just trying to determine the relevancy of any given document, they’re beginning to let users investigate the links between the people, topics, timelines and other information contained in those documents.
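
A rough sketch of that link-analysis idea, assuming a simple co-occurrence model: treat two names appearing in the same document as a connection, then ask who sits at the center of the resulting graph. The names and documents below are invented, and real e-discovery products work from much richer signals.

```python
# Sketch of link analysis over a document set: co-occurrence of two
# names in one document becomes a weighted edge. Invented sample data.
import itertools
import networkx as nx

docs_to_people = {
    "email_001": ["Alice", "Bob"],
    "email_002": ["Alice", "Carol", "Bob"],
    "memo_014":  ["Carol", "Dave"],
}

G = nx.Graph()
for doc, people in docs_to_people.items():
    for a, b in itertools.combinations(people, 2):
        # Weight an edge by how many documents connect the pair.
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Degree centrality highlights who is most connected across the corpus.
for person, score in sorted(nx.degree_centrality(G).items(),
                            key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```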

2. Don’t fear numbers

Lawsuits generate a lot of documents, but the data they contain isn’t valuable only for its use as evidence. Lex Machina is a startup that aims to give intellectual property attorneys statistical data that could help them make better decisions about their cases. Attorneys don’t have to rely solely on what Lex Machina CEO Josh Becker calls “anecdata” — you know, statements like “This judge is tough on defendants,” “Cases involving this type of claim are a slam dunk” or “This lawyer has the most experience on this technology” — because they can actually see numbers that speak to those exact issues.

It’s a potentially powerful tool (I received a demonstration of it in November at the company’s Menlo Park, Calif., headquarters) made all the more interesting because of its data source: all the filings available in the PACER federal court records database. The data was always there, but it was pretty much a collection of individual documents disconnected from the bigger picture. Now, thanks to machine learning, natural language processing and a variety of other big data techniques that have come of age in the past few years, it has taken the shape of sortable, analyzable information. Words have become part of a massive index and outcomes have become part of a collective intelligence.

(Lex Machina’s history is actually really interesting, involving a nonprofit project called the Stanford IP Law Clearinghouse and the recruitment of Stanford machine learning and NLP experts Andrew Ng and Christopher Manning to help build the technology. Its vice president of product, Karl Harris, is a former vice president of engineering at Flurry and a Stanford Law grad. IEEE Spectrum did a good profile on the company when it launched its commercial service in late October.)

Lex Machina supplies attorneys with nearly any type of information they could want about patent lawsuits — case outcomes, lawsuit durations, judges, disputed patents, parties, attorneys — via a rather intuitive web service. Results can be summarized over time (e.g., how frequently cases in front of a particular judge result in settlements, or how often a particular patent is litigated and the total amount of damages it has generated), or users can drill down as deep as they want — even right down to the pleadings and court filings.
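
For a flavor of what those summaries involve, here is a hypothetical sketch in Python with pandas: given structured case records (the rows below are invented), compute a settlement rate and a median case duration per judge. Lex Machina’s actual pipeline, which parses raw PACER filings, is of course far more involved.

```python
# Sketch of the kind of per-judge summary the article describes.
# Case records here are invented placeholders.
import pandas as pd

cases = pd.DataFrame([
    {"judge": "Judge A", "outcome": "settlement", "days": 310},
    {"judge": "Judge A", "outcome": "trial",      "days": 720},
    {"judge": "Judge B", "outcome": "settlement", "days": 150},
    {"judge": "Judge B", "outcome": "settlement", "days": 200},
])

summary = cases.groupby("judge").agg(
    settlement_rate=("outcome", lambda s: (s == "settlement").mean()),
    median_duration_days=("days", "median"),
)
print(summary)
```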

3. Get creative and get your hands dirty

Rather than wait around for more technology companies to create pricey products targeting law firms, however, lawyers could just get creative with the data they have available to them. The “big” part of big data gets a lot of attention, but for most industries and companies — law firms included — the variety part is probably the most important aspect. Data isn’t just about numbers anymore. Our Structure Data conference in March is focused on just this idea — that every document, social media post, photo, video, website, and pretty much anything else is now a source of data just waiting to be analyzed and turned into information.

For example, people do a lot of talking on social media today, so maybe a lawyer could use something like ScraperWiki to download a witness’s Twitter connections and activity (check out what I’ve done with it here, here and here). There are free tools like etcML (and paid services like AlchemyAPI) that can analyze any type of text file, be it tweets or email logs, to determine sentiment or extract key concepts.
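
etcML and AlchemyAPI each have their own interfaces, but the underlying idea can be sketched with open-source tools. Below is a minimal example using NLTK’s VADER sentiment analyzer; the tweets are invented.

```python
# Minimal sentiment-scoring sketch with NLTK's VADER analyzer.
# The tweets are invented examples.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

tweets = [
    "Can't believe they shipped that product, total disaster.",
    "Great meeting with the team today, feeling optimistic!",
]
for tweet in tweets:
    # "compound" ranges from -1 (most negative) to +1 (most positive).
    score = analyzer.polarity_scores(tweet)["compound"]
    print(f"{score:+.2f}  {tweet}")
```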

And even for more-traditional numerical data (say, for example, a record of car accidents and locations that might be relevant to a personal injury case) there is no shortage of easy tools available to help analyze and visualize it. Tools like import.io make it easy to actually extract data from websites (say, the changes in price for real estate listings over time) and turn it into tables.
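
import.io itself is point-and-click, but the same basic extraction can be sketched in a few lines of Python: pandas can parse any HTML table on a page straight into a data frame. The URL below is a placeholder, not a real listings page.

```python
# Sketch of pulling tabular data off a web page into a DataFrame.
# Requires lxml or html5lib installed; the URL is a placeholder.
import pandas as pd

tables = pd.read_html("https://example.com/listings.html")
listings = tables[0]    # first HTML table found on the page
print(listings.head())  # e.g., columns like address, price, date

# From here it is ordinary tabular data: sort, filter, or chart
# price changes over time with any plotting library.
```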

The best part is that many of the new tools for analyzing data are relatively inexpensive (often free) and designed for ease of use (there’s no WestLaw-like learning curve). The results might not produce a smoking gun — they might not even be admissible as evidence — but they’re a lot faster than conducting a deposition or requesting more documents just to investigate a hunch or find a new angle to pursue.

There are no more gilded professions, and law is undergoing the same type of shift as medicine, manufacturing and even mobile applications. The people paying the bills want work done well, fast and at the lowest possible cost, and they’ll look for the providers who can meet those expectations. Understanding how to use data is fast becoming a key capability of the providers who will win that business.