Abstract
Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a monthlong online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.
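The F-measure quoted above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the competition's scoring code: the function name `f_measure` and the example precision and recall values are assumptions for illustration, since the abstract reports only the combined scores.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (balanced F-measure).

    Both inputs are expected as fractions in [0, 1]. This is a hypothetical
    helper for illustration, not the competition's scoring code.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision + recall == 0.0:
        # Convention: no correct detections at all gives an F-measure of zero.
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the competition results.
    print(f"{100 * f_measure(0.80, 0.60):.2f} %")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-measure by maximizing recall at the expense of precision, or the reverse.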
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10032-016-0260-8/MediaObjects/10032_2016_260_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10032-016-0260-8/MediaObjects/10032_2016_260_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10032-016-0260-8/MediaObjects/10032_2016_260_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10032-016-0260-8/MediaObjects/10032_2016_260_Fig4_HTML.gif)
Notes
Embedded within this competition was a social science experiment to investigate different team formation mechanisms. Two treatments were implemented. In the first treatment, teams were formed through bilateral agreement between participants after communicating through a public forum or private direct messaging (termed the ‘free-form’ treatment). In the second treatment, teams were formed by a stable-matching algorithm using participants’ stated preferences (termed the ‘algorithm’ treatment). We found no significant differences in algorithm performance between the two treatments. The exact details of the social science experiment are beyond the scope of this paper. Some preliminary results can be found in a working paper at http://goo.gl/NjoWce.
The final ranking of all submissions is publicly available on the TopCoder Web site at https://community.topcoder.com/longcontest/stats/?&sr=1&nr=50&module=ViewOverview&rd=15027.
Acknowledgments
We are grateful for helpful comments provided by Ahmad Ahmad and the anonymous reviewers. This research was supported in part by the NASA Tournament Laboratory and the United States Patent and Trademark Office (USPTO).
Cite this article
Riedl, C., Zanibbi, R., Hearst, M.A. et al. Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms. IJDAR 19, 155–172 (2016). https://doi.org/10.1007/s10032-016-0260-8
DOI: https://doi.org/10.1007/s10032-016-0260-8